Building https://github.com/mthorrell/gbnet
E.g.: when estimating a mean, how much should you weight the prior (call it Model 1) vs the sample average (Model 2)? Some Bayesian math gives the optimal answer given the number of observations.
A bit of GBNet & PyTorch empirically derives almost exactly the same answer.
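A minimal sketch of the experiment as I read it (hypothetical parameter values, plain PyTorch with a single learned blend weight standing in for GBNet's boosted modules): gradient descent on held-out prediction error recovers the Bayesian shrinkage weight n·τ² / (n·τ² + σ²).

```python
# Hedged sketch, not the exact GBNet setup: learn the prior-vs-sample-average blend
# weight by gradient descent and compare it to the Bayesian shrinkage weight.
import torch

torch.manual_seed(0)
n, sigma, tau, mu0 = 5, 1.0, 0.5, 0.0       # obs per dataset, noise sd, prior sd, prior mean
n_datasets = 20000

mu = mu0 + tau * torch.randn(n_datasets)     # true means drawn from the prior
y = mu[:, None] + sigma * torch.randn(n_datasets, n)
y_holdout = mu + sigma * torch.randn(n_datasets)

ybar = y.mean(dim=1)                         # Model 2: sample average
prior = torch.full_like(ybar, mu0)           # Model 1: the prior mean

w_raw = torch.zeros(1, requires_grad=True)   # blend weight, squashed to (0, 1)
opt = torch.optim.Adam([w_raw], lr=0.05)
for _ in range(500):
    opt.zero_grad()
    w = torch.sigmoid(w_raw)
    loss = ((w * ybar + (1 - w) * prior - y_holdout) ** 2).mean()
    loss.backward()
    opt.step()

learned = torch.sigmoid(w_raw).item()
optimal = n * tau**2 / (n * tau**2 + sigma**2)   # Bayesian weight on the sample average
print(f"learned weight {learned:.3f} vs Bayesian weight {optimal:.3f}")
```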
Situation: you have predictions from several models.
Q: Is there a data-driven way to combine them? And, for convenience, can I use XGBoost to find the right averaging coefficients?
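A sketch of the stacking objective in plain PyTorch (the post's twist is fitting the coefficients with XGBoost via GBNet; this just shows what "find the right averaging coefficients" means):

```python
# Hedged sketch of the stacking idea: learn convex combination weights for
# several models' predictions on held-out data.
import torch

def fit_blend_weights(preds, y, steps=1000, lr=0.05):
    """preds: (n_samples, n_models) validation predictions; y: (n_samples,) targets."""
    logits = torch.zeros(preds.shape[1], requires_grad=True)
    opt = torch.optim.Adam([logits], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        w = torch.softmax(logits, dim=0)          # weights are positive and sum to 1
        loss = ((preds @ w - y) ** 2).mean()
        loss.backward()
        opt.step()
    return torch.softmax(logits, dim=0).detach()

# Usage: preds = torch.stack([m1_val_preds, m2_val_preds], dim=1)
#        w = fit_blend_weights(preds, y_val)
```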
Categorical splitting is one interesting feature to play with using GBNet. To scratch the surface, I fit a basic Word2Vec model using XGBoost's categorical splitting.
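For reference, the enabling piece in plain XGBoost is splitting directly on a categorical token column without one-hot encoding. A toy sketch (toy data, not the Word2Vec setup from the post):

```python
# Hedged sketch of categorical splitting in XGBoost: the token column stays a
# pandas Categorical and the trees partition its levels directly.
import pandas as pd
import xgboost as xgb

df = pd.DataFrame({
    "token": pd.Categorical(["cat", "dog", "cat", "fish", "dog", "fish"]),
})
y = [1.0, 0.0, 1.0, 0.5, 0.0, 0.5]  # stand-in target, e.g. a context score

dtrain = xgb.DMatrix(df, label=y, enable_categorical=True)
booster = xgb.train({"tree_method": "hist", "max_cat_to_onehot": 1}, dtrain, num_boost_round=10)
print(booster.predict(dtrain))
```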
Beyond some usability improvements, uncertainty estimation for the Forecasting module got merged in. The GBNet forecasting module now:
✅ Runs faster
✅ Is more accurate than Prophet
✅ Provides uncertainty estimates
✅ Supports changepoints
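A usage sketch of the module (the import path, class name, and column conventions here are assumptions from memory; the repo README is the source of truth):

```python
# Hypothetical usage sketch -- names are assumptions, check the gbnet README
# for the exact forecasting API. The workflow is Prophet-like: a dataframe with
# a 'ds' timestamp column and a 'y' target.
import pandas as pd
from gbnet.models import forecasting   # assumed import path

df = pd.read_csv("my_timeseries.csv")   # columns: ds, y
train, test = df.iloc[:-30], df.iloc[-30:]

model = forecasting.Forecast()           # assumed estimator name
model.fit(train[["ds"]], train["y"])
pred = model.predict(test[["ds"]])       # assumed to include uncertainty columns
```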
Adding confidence intervals to the gbnet forecasting module: I can do a train/validation holdout for this and still be 3-4X faster.
Aiming for 80% coverage of the test data:
New method: 76% average coverage
Prophet: 55% average coverage
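"Coverage" here is prediction-interval coverage: the fraction of held-out points that land inside the nominal 80% interval. A quick sketch of that calculation:

```python
# Hedged sketch of how the coverage numbers above are typically computed.
import numpy as np

def empirical_coverage(y_true, lower, upper):
    """Fraction of held-out points inside the [lower, upper] interval."""
    y_true, lower, upper = map(np.asarray, (y_true, lower, upper))
    return np.mean((y_true >= lower) & (y_true <= upper))

# e.g. empirical_coverage(y_test, pred_lower_80, pred_upper_80)
```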
I asked GBNet for a second prediction output. I slapped on torch.nn.GaussianNLLLoss and out comes a variance estimate that is well calibrated.
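A minimal sketch of the two-output idea in plain PyTorch (in the post the two outputs come from a GBNet boosted module rather than nn.Linear):

```python
# Hedged sketch: one head predicts the mean, the other a positive variance,
# trained with torch.nn.GaussianNLLLoss.
import torch
import torch.nn as nn

class MeanVarHead(nn.Module):
    def __init__(self, d_in):
        super().__init__()
        self.net = nn.Linear(d_in, 2)   # output 0: mean, output 1: raw variance

    def forward(self, x):
        out = self.net(x)
        mean = out[:, 0]
        var = nn.functional.softplus(out[:, 1]) + 1e-6   # keep variance positive
        return mean, var

model = MeanVarHead(d_in=10)
criterion = nn.GaussianNLLLoss()
x, y = torch.randn(64, 10), torch.randn(64)
mean, var = model(x)
loss = criterion(mean, y, var)           # NLL of y under N(mean, var)
loss.backward()
```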
Default forecast accuracy improved by 20%, along with a 5X speedup. Using the random-training, random-horizon benchmark, 9 of 9 example datasets now perform better with GBNet than with Prophet.
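My reading of that benchmark, as a sketch (the repo's actual benchmark code may differ): pick a random training cutoff and a random forecast horizon, score, repeat.

```python
# Hedged sketch of a "random training cutoff, random horizon" benchmark loop.
import numpy as np

def random_cutoff_benchmark(df, fit_predict, n_trials=20, min_train=100, max_horizon=60, seed=0):
    """df: time-ordered frame with 'ds' and 'y'; fit_predict(train, test) -> predictions."""
    rng = np.random.default_rng(seed)
    errors = []
    for _ in range(n_trials):
        cutoff = rng.integers(min_train, len(df) - max_horizon)
        horizon = rng.integers(1, max_horizon + 1)
        train, test = df.iloc[:cutoff], df.iloc[cutoff:cutoff + horizon]
        pred = fit_predict(train, test)
        errors.append(np.mean(np.abs(np.asarray(pred) - test["y"].to_numpy())))
    return float(np.mean(errors))
```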
The most promising method so far (see plot) asks a GBDT to both fit and find the changepoints. Another cool application of GBNet (see equation).
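The equation in the post is an image and isn't reproduced here; one plausible formulation of "let the GBDT find the changepoints" (not necessarily the post's exact equation) is a linear trend in $t$ plus a boosted piecewise-constant function of $t$, so the trees' split points play the role of changepoints:

$$
\hat{y}(t) \;=\; \beta_0 + \beta_1 t \;+\; \sum_{m=1}^{M} f_m(t),
\qquad f_m \in \{\text{trees that split only on } t\}.
$$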
Unfortunately it's back to the drawing board. PyTorch changepoints fit too slowly and GBLinear, I now realize, can't actually turn them off. It does work though!
I replaced PyTorch Linear with GBLinear (removing batchnorm) and...
1. improved accuracy (see table)
2. sped up fitting 10X by using fewer training rounds
3. improved worst-case performance (see plot)
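A before/after sketch of the swap described above (the GBLinear import path and constructor are assumptions from memory; the gbnet repo is the source of truth for the real signature):

```python
# Hedged sketch of swapping nn.Linear for a gradient-boosted linear layer.
import torch.nn as nn

# Before: a plain linear head, previously wrapped in batchnorm (now removed).
head = nn.Linear(8, 1)

# After: a boosted linear layer in its place, roughly:
#   from gbnet import gblinear                   # assumed module name
#   head = gblinear.GBLinear(n_rows, 8, 1)       # assumed signature
# The rest of the forward pass stays the same; fewer training rounds are needed.
```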
Training curves on the left (note the log scales). A sample from the dataset on the right... far from a crazy dataset.
This turned out great for my use case: I was using a pure least-squares solver, but I wanted ridge regression.
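For context, the distinction in question (a quick numpy sketch, nothing GBNet-specific): ridge just adds a scaled identity to the normal equations before solving.

```python
# Hedged sketch: ordinary least squares vs the ridge solution.
import numpy as np

def ols(X, y):
    return np.linalg.solve(X.T @ X, X.T @ y)

def ridge(X, y, lam=1.0):
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
```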
Ordinal loss is complex and has viable alternatives. But on a set of 19 ordinal datasets, ordinal regression using LightGBM came out on top. Maybe worth keeping in mind if your data is ordinal.
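For the curious, a sketch of a cumulative-logit ordinal loss in plain PyTorch; in the experiment above the score f(x) comes from LightGBM through GBNet, but here it's an nn.Linear so the loss runs on its own:

```python
# Hedged sketch of an ordinal (cumulative-logit) loss with learned ordered cutpoints.
import torch
import torch.nn as nn

class OrdinalLogit(nn.Module):
    def __init__(self, d_in, n_classes):
        super().__init__()
        self.score = nn.Linear(d_in, 1)
        # Ordered cutpoints theta_1 < ... < theta_{K-1}, built from positive gaps.
        self.raw_gaps = nn.Parameter(torch.zeros(n_classes - 1))

    def forward(self, x):
        theta = torch.cumsum(nn.functional.softplus(self.raw_gaps) + 1e-3, dim=0)
        return self.score(x).squeeze(-1), theta

def ordinal_nll(f, theta, y):
    # P(Y <= k) = sigmoid(theta_k - f); class prob is a difference of adjacent CDFs.
    cdf = torch.sigmoid(theta.unsqueeze(0) - f.unsqueeze(1))                 # (n, K-1)
    ones = torch.ones(f.shape[0], 1)
    zeros = torch.zeros(f.shape[0], 1)
    probs = torch.cat([cdf, ones], dim=1) - torch.cat([zeros, cdf], dim=1)   # (n, K)
    return -torch.log(probs.gather(1, y.unsqueeze(1)).clamp_min(1e-9)).mean()

model = OrdinalLogit(d_in=5, n_classes=4)
x, y = torch.randn(32, 5), torch.randint(0, 4, (32,))
f, theta = model(x)
loss = ordinal_nll(f, theta, y)
loss.backward()
```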
Plots are below. Which is the contrastive embedding? Which is the classification embedding?