Building https://github.com/mthorrell/gbnet
- Unchecked, Cursor would have deleted all the code calculating gradients and Hessians, breaking everything.
- The first version completely skipped the obvious get_extra_state/set_extra_state PyTorch methods, creating problems. Possibly these are newer methods?
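For reference, get_extra_state/set_extra_state are real torch.nn.Module hooks (available since roughly PyTorch 1.11) that let non-tensor state ride along in state_dict(). A minimal sketch of the save/load round trip — BoosterWrapper and its booster_json field are hypothetical stand-ins, not gbnet's actual classes:

```python
import io
import torch

class BoosterWrapper(torch.nn.Module):
    """Toy module carrying non-tensor state, e.g. a booster serialized to JSON."""
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(2, 1)
        self.booster_json = ""  # stands in for a serialized GBDT model

    def get_extra_state(self):
        # Called by state_dict(); return anything picklable.
        return {"booster_json": self.booster_json}

    def set_extra_state(self, state):
        # Called by load_state_dict() with the object from get_extra_state().
        self.booster_json = state["booster_json"]

m = BoosterWrapper()
m.booster_json = '{"trees": []}'

buf = io.BytesIO()
torch.save(m.state_dict(), buf)  # extra state lands under the "_extra_state" key
buf.seek(0)

m2 = BoosterWrapper()
m2.load_state_dict(torch.load(buf))  # restores the booster JSON alongside the tensors
```

Without these hooks, anything that isn't a Parameter or buffer silently drops out of the checkpoint — which is exactly the failure mode described above.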
Based on questions from the audience, I released an update to GBNet that enables saving and loading models to disk.
One step closer to V1.0
Don't miss @horrellmt.bsky.social at #SciPy2025 for "GBNet: Gradient Boosting packages integrated into PyTorch." 🧠⚙️
Discover how GBNet bridges powerful boosting methods with deep learning workflows.
🔗 scipy2025.scipy.org
🎙️ Talk: “GBNet: Gradient Boosting Packages Integrated into PyTorch”
🗓️ Wed July 9, 11:25 AM (Room 315)
I'll be speaking about GBNet, an Open Source package I maintain.
Eg: When finding a mean, how much to weight the prior (call it Model 1) vs the sample average (Model 2)? Some Bayesian math gives the optimal answer given the # of obs.
A bit of GBNet & PyTorch empirically derives the same answer almost exactly.
Situation: you have several models with predictions
Q: Is there a data-driven way to combine them? And, for convenience, can I use XGBoost to find the right averaging coefficients?
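Setting gbnet and XGBoost aside, the core idea can be sketched in plain numpy (this is not gbnet's API, just the concept): stack each model's predictions as columns and solve for blending coefficients by least squares. The data here is synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
y = rng.normal(5.0, 1.0, n)              # targets
pred1 = np.full(n, 4.0)                  # Model 1: a fixed prior guess
pred2 = y + rng.normal(0.0, 0.5, n)      # Model 2: noisy per-point predictions

# Stack the models' predictions as columns and solve for blending weights.
X = np.column_stack([pred1, pred2])
w, *_ = np.linalg.lstsq(X, y, rcond=None)
blend = X @ w

mse = lambda p: np.mean((p - y) ** 2)
```

Because using either model alone is itself a linear combination, the least-squares blend can never do worse in-sample than the better of the two.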
Categorical splitting is one interesting feature to play with using GBNet. To scratch the surface, I fit a basic Word2Vec model using XGBoost for categorical splitting.
Beyond some usability improvements, the uncertainty estimation for the Forecasting module got merged in. Now GBNet forecasting is:
✅ Faster
✅ More accurate than Prophet
✅ Provides uncertainty estimates
✅ Supports changepoints
Adding conf intervals to the gbnet forecasting module, I can do a train/validation holdout for this and still be 3-4X faster.
Trying to get 80% interval coverage on the holdout:
New method: 76% avg
Prophet: 55% avg
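For context, coverage here is just the fraction of held-out points landing inside the predicted interval — numbers like 76% vs a nominal 80% come from a calculation like this (synthetic data, not the actual benchmark):

```python
import numpy as np

rng = np.random.default_rng(0)
y_test = rng.normal(0.0, 1.0, 1000)      # stand-in for holdout actuals

# A hypothetical 80% interval from a Gaussian model: mean +/- 1.2816 * sigma
lo, hi = -1.2816, 1.2816
coverage = np.mean((y_test >= lo) & (y_test <= hi))  # ~0.8 if well calibrated
```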
I asked GBNet for a second prediction output. I slapped on torch.nn.GaussianNLLLoss and out comes a variance estimate that is well calibrated.
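The two-output trick in sketch form — the softplus floor keeping the variance head positive is my choice for illustration, not necessarily what gbnet does internally:

```python
import torch

target = torch.randn(32)

# Pretend these are the model's two output heads: a mean and a raw variance score.
mean = torch.zeros(32, requires_grad=True)
raw_var = torch.zeros(32, requires_grad=True)

var = torch.nn.functional.softplus(raw_var) + 1e-6  # keep variance strictly positive
loss = torch.nn.GaussianNLLLoss()(mean, target, var)
loss.backward()  # gradients flow to both heads, so boosting can fit both
```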
Default forecast performance improved by 20% and achieved a 5X speedup. Using the random training, random horizon benchmark, now 9 of 9 example datasets have better performance with GBNet compared to Prophet.
Most promising method so far (see plot) asks GBDT to fit and find the changepoints. Another cool application of GBNet (see equation).
Unfortunately it's back to the drawing board. PyTorch changepoints fit too slowly and GBLinear, I now realize, can't actually turn them off. It does work though!
I wonder how well the LLMs will be able to one-shot it. According to the Prophet paper, it's just a broken-stick regression with a lasso penalty.
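The broken-stick setup in sketch form: one hinge feature max(0, t - c) per candidate changepoint, fit linearly. I drop the lasso penalty here for brevity — that penalty is what zeroes out most candidates so only a few changepoints stay active. The data is synthetic.

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.linspace(0.0, 10.0, 300)
# Ground truth: slope changes from 1.0 to 3.0 at t = 4 (a "broken stick")
y = t + 2.0 * np.maximum(t - 4.0, 0.0) + rng.normal(0.0, 0.2, t.size)

# One hinge feature per candidate changepoint; lasso would shrink most slopes to 0
cands = np.linspace(0.5, 9.5, 19)
X = np.column_stack([np.ones_like(t), t] + [np.maximum(t - c, 0.0) for c in cands])
w, *_ = np.linalg.lstsq(X, y, rcond=None)

fit_mse = np.mean((X @ w - y) ** 2)
```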
New features:
- ReadTheDocs website
- GBLinear
- GBLinear integration into forecasting model (10X faster, 10% improvement in predictions)
- Python class refactoring
I replaced PyTorch Linear with GBLinear (removing batchnorm) and...
1. improved accuracy (see table)
2. sped up fitting 10X by using fewer training rounds
3. improved worst-case performance (see plot)
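A gradient-boosted linear layer reduces to a short loop, sketched here in numpy on synthetic data (this mirrors the general gblinear idea, not gbnet's actual implementation): each round fits a linear "weak learner" to the current residuals — the negative gradient of squared loss — and adds a damped step into the weights.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w + rng.normal(0.0, 0.1, 200)

w = np.zeros(3)
lr = 0.3  # boosting learning rate (damping)
for _ in range(100):
    resid = y - X @ w                                 # negative gradient of squared loss
    step, *_ = np.linalg.lstsq(X, resid, rcond=None)  # linear weak learner on residuals
    w += lr * step                                    # accumulate into the layer weights
```

With squared loss the rounds converge geometrically to the ordinary least-squares solution, which is why relatively few training rounds suffice.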
Why do this?
Turned out great for my use-case because I was using a pure least-squares solver but I wanted Ridge Regression.
Nice to see a deep learning project that is easy to just jump into and start using.
Ordinal loss is complex and has viable alternatives. But, on a set of 19 ordinal datasets, ordinal regression using LightGBM came out on top. Maybe worth keeping in mind if your data is ordinal.
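For the curious, the standard cumulative-logit ordinal loss looks roughly like this — a sketch of the textbook formulation, not necessarily the exact loss used in the benchmark:

```python
import numpy as np

def ordinal_nll(f, y, thresholds):
    """Cumulative-logit NLL. Model: P(y <= k) = sigmoid(theta_k - f), so
    P(y = k) is the difference of adjacent cumulative probabilities.
    f: scores (n,); y: int labels in {0..K-1}; thresholds: sorted, length K-1."""
    th = np.concatenate(([-np.inf], np.asarray(thresholds, float), [np.inf]))
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    y = np.asarray(y)
    p = sigmoid(th[y + 1] - f) - sigmoid(th[y] - f)  # P(y = k | f)
    return -np.mean(np.log(p + 1e-12))

# Three classes, two thresholds: low scores -> class 0, high scores -> class 2
loss = ordinal_nll(np.array([0.0]), np.array([1]), np.array([-1.0, 1.0]))
```

Because gbnet only needs a differentiable loss on a scalar score, a loss like this slots in the same way the built-in ones do.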
Performance-wise, the LightGBM backend wins overall.
Over 500 trials:
LightGBM wins 236 (47%)
XGBoost wins 153 (31%)
Prophet wins 111 (22%)
To intuit this, consider the directions defined by the 2D columns of Beta (a 2 x 10 matrix) as the class-specific classification directions.
Plots are below. Which is the contrastive embedding? Which is the classification embedding?