Building https://github.com/mthorrell/gbnet
- Unchecked, Cursor would have deleted all the code calculating gradients and Hessians, breaking everything.
- The first version completely skipped the obvious get_extra_state/set_extra_state PyTorch methods, creating problems. Possibly these are newer methods?
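For reference, get_extra_state/set_extra_state are real torch.nn.Module hooks (available since roughly PyTorch 1.11) that let non-tensor state ride along in state_dict(). A minimal sketch of the save/load round trip — BoosterWrapper and its booster_json field are hypothetical stand-ins, not gbnet's actual classes:

```python
import io
import torch

class BoosterWrapper(torch.nn.Module):
    """Toy module carrying non-tensor state, e.g. a booster serialized to JSON."""
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(2, 1)
        self.booster_json = ""  # stands in for a serialized GBDT model

    def get_extra_state(self):
        # Called by state_dict(); return anything picklable.
        return {"booster_json": self.booster_json}

    def set_extra_state(self, state):
        # Called by load_state_dict() with the object from get_extra_state().
        self.booster_json = state["booster_json"]

m = BoosterWrapper()
m.booster_json = '{"trees": []}'

buf = io.BytesIO()
torch.save(m.state_dict(), buf)  # extra state lands under the "_extra_state" key
buf.seek(0)

m2 = BoosterWrapper()
m2.load_state_dict(torch.load(buf))  # restores the booster JSON alongside the tensors
```

Without these hooks, anything that isn't a Parameter or buffer silently drops out of the checkpoint — which is exactly the failure mode described above.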
Based on questions from the audience, I released an update to GBNet that enables saving and loading models to disk.
One step closer to V1.0
Don't miss @horrellmt.bsky.social at #SciPy2025 for "GBNet: Gradient Boosting packages integrated into PyTorch." 🧠⚙️
Discover how GBNet bridges powerful boosting methods with deep learning workflows.
🔗 scipy2025.scipy.org
🎙️ Talk: “GBNet: Gradient Boosting Packages Integrated into PyTorch”
🗓️ Wed July 9, 11:25 AM (Room 315)
I'll be speaking about GBNet, an Open Source package I maintain.
Eg: When finding a mean, how much to weight the prior (call it Model 1) vs the sample average (Model 2)? Some Bayesian math gives the optimal answer given the # of obs.
A bit of GBNet & PyTorch empirically derives the same answer almost exactly.
Situation: you have several models with predictions
Q: Is there a data-driven way to combine them? And, for convenience, can I use XGBoost to find the right averaging coefficients?
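Setting gbnet and XGBoost aside, the core idea can be sketched in plain numpy (this is not gbnet's API, just the concept): stack each model's predictions as columns and solve for blending coefficients by least squares. The data here is synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
y = rng.normal(5.0, 1.0, n)              # targets
pred1 = np.full(n, 4.0)                  # Model 1: a fixed prior guess
pred2 = y + rng.normal(0.0, 0.5, n)      # Model 2: noisy per-point predictions

# Stack the models' predictions as columns and solve for blending weights.
X = np.column_stack([pred1, pred2])
w, *_ = np.linalg.lstsq(X, y, rcond=None)
blend = X @ w

mse = lambda p: np.mean((p - y) ** 2)
```

Because using either model alone is itself a linear combination, the least-squares blend can never do worse in-sample than the better of the two.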
Categorical splitting is one interesting feature to play with using GBNet. To scratch the surface, I fit a basic Word2Vec model using XGBoost for categorical splitting.
Beyond some usability improvements, the uncertainty estimation for the Forecasting module got merged in. Now GBNet forecasting is:
✅ Faster
✅ More accurate than Prophet
✅ Provides uncertainty estimates
✅ Supports changepoints
Adding conf intervals to the gbnet forecasting module, I can do a train/validation holdout for this and still be 3-4X faster.
Trying to get 80% interval coverage on the holdout:
New method: 76% avg
Prophet: 55% avg
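For context, coverage here is just the fraction of held-out points landing inside the predicted interval — numbers like 76% vs a nominal 80% come from a calculation like this (synthetic data, not the actual benchmark):

```python
import numpy as np

rng = np.random.default_rng(0)
y_test = rng.normal(0.0, 1.0, 1000)      # stand-in for holdout actuals

# A hypothetical 80% interval from a Gaussian model: mean +/- 1.2816 * sigma
lo, hi = -1.2816, 1.2816
coverage = np.mean((y_test >= lo) & (y_test <= hi))  # ~0.8 if well calibrated
```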
I asked GBNet for a second prediction output. I slapped on torch.nn.GaussianNLLLoss and out comes a variance estimate that is well calibrated.
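The two-output trick in sketch form — the softplus floor keeping the variance head positive is my choice for illustration, not necessarily what gbnet does internally:

```python
import torch

target = torch.randn(32)

# Pretend these are the model's two output heads: a mean and a raw variance score.
mean = torch.zeros(32, requires_grad=True)
raw_var = torch.zeros(32, requires_grad=True)

var = torch.nn.functional.softplus(raw_var) + 1e-6  # keep variance strictly positive
loss = torch.nn.GaussianNLLLoss()(mean, target, var)
loss.backward()  # gradients flow to both heads, so boosting can fit both
```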
Default forecast performance improved by 20% and achieved a 5X speedup. Using the random training, random horizon benchmark, now 9 of 9 example datasets have better performance with GBNet compared to Prophet.
Most promising method so far (see plot) asks GBDT to fit and find the changepoints. Another cool application of GBNet (see equation).
Unfortunately it's back to the drawing board. PyTorch changepoints fit too slowly and GBLinear, I now realize, can't actually turn them off. It does work though!
I wonder how well the LLMs will be able to one-shot it. According to the Prophet paper, it's just a broken-stick regression with a lasso penalty.
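The broken-stick setup in sketch form: one hinge feature max(0, t - c) per candidate changepoint, fit linearly. I drop the lasso penalty here for brevity — that penalty is what zeroes out most candidates so only a few changepoints stay active. The data is synthetic.

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.linspace(0.0, 10.0, 300)
# Ground truth: slope changes from 1.0 to 3.0 at t = 4 (a "broken stick")
y = t + 2.0 * np.maximum(t - 4.0, 0.0) + rng.normal(0.0, 0.2, t.size)

# One hinge feature per candidate changepoint; lasso would shrink most slopes to 0
cands = np.linspace(0.5, 9.5, 19)
X = np.column_stack([np.ones_like(t), t] + [np.maximum(t - c, 0.0) for c in cands])
w, *_ = np.linalg.lstsq(X, y, rcond=None)

fit_mse = np.mean((X @ w - y) ** 2)
```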
New features:
- ReadTheDocs website
- GBLinear
- GBLinear integration into forecasting model (10X faster, 10% improvement in predictions)
- Python class refactoring
I replaced PyTorch Linear with GBLinear (removing batchnorm) and...
1. improved accuracy (see table)
2. sped up fitting 10X by using fewer training rounds
3. improved worst-case performance (see plot)
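A gradient-boosted linear layer reduces to a short loop, sketched here in numpy on synthetic data (this mirrors the general gblinear idea, not gbnet's actual implementation): each round fits a linear "weak learner" to the current residuals — the negative gradient of squared loss — and adds a damped step into the weights.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w + rng.normal(0.0, 0.1, 200)

w = np.zeros(3)
lr = 0.3  # boosting learning rate (damping)
for _ in range(100):
    resid = y - X @ w                                 # negative gradient of squared loss
    step, *_ = np.linalg.lstsq(X, resid, rcond=None)  # linear weak learner on residuals
    w += lr * step                                    # accumulate into the layer weights
```

With squared loss the rounds converge geometrically to the ordinary least-squares solution, which is why relatively few training rounds suffice.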
Why do this?
Turned out great for my use-case because I was using a pure least-squares solver but I wanted Ridge Regression.
Nice to see a deep learning project that is easy to just jump into and start using.
Ordinal loss is complex and has viable alternatives. But, on a set of 19 ordinal datasets, ordinal regression using LightGBM came out on top. Maybe worth keeping in mind if your data is ordinal.
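For the curious, the standard cumulative-logit ordinal loss looks roughly like this — a sketch of the textbook formulation, not necessarily the exact loss used in the benchmark:

```python
import numpy as np

def ordinal_nll(f, y, thresholds):
    """Cumulative-logit NLL. Model: P(y <= k) = sigmoid(theta_k - f), so
    P(y = k) is the difference of adjacent cumulative probabilities.
    f: scores (n,); y: int labels in {0..K-1}; thresholds: sorted, length K-1."""
    th = np.concatenate(([-np.inf], np.asarray(thresholds, float), [np.inf]))
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    y = np.asarray(y)
    p = sigmoid(th[y + 1] - f) - sigmoid(th[y] - f)  # P(y = k | f)
    return -np.mean(np.log(p + 1e-12))

# Three classes, two thresholds: low scores -> class 0, high scores -> class 2
loss = ordinal_nll(np.array([0.0]), np.array([1]), np.array([-1.0, 1.0]))
```

Because gbnet only needs a differentiable loss on a scalar score, a loss like this slots in the same way the built-in ones do.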
Performance-wise, the LightGBM backend wins overall.
Over 500 trials:
LightGBM wins 236 (47%)
XGBoost wins 153 (31%)
Prophet wins 111 (22%)
To intuit this, consider the directions defined by the 2D columns of Beta (a 2 x 10 matrix) as the class-specific classification directions.
Plots are below. Which is the contrastive embedding? Which is the classification embedding?