David Holzmüller
@dholzmueller.bsky.social
Postdoc in machine learning with Francis Bach & @GaelVaroquaux: neural networks, tabular data, uncertainty, active learning, atomistic ML, learning theory.
https://dholzmueller.github.io
I got 3rd out of 691 in a tabular Kaggle competition – with only neural networks! 🥉

My solution is short (48 LOC) and relatively general-purpose – I used skrub to preprocess string and date columns, and pytabkit to create an ensemble of RealMLP and TabM models. Link below👇
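For anyone curious what such a pipeline can look like, here is a minimal sketch in the same spirit (not my actual solution; it assumes pytabkit exposes RealMLP_TD_Classifier and TabM_D_Classifier as scikit-learn estimators):

```python
from skrub import TableVectorizer                     # handles string/date columns
from sklearn.ensemble import VotingClassifier
from sklearn.pipeline import make_pipeline
from pytabkit import RealMLP_TD_Classifier, TabM_D_Classifier  # assumed names

ensemble = VotingClassifier(
    estimators=[("realmlp", RealMLP_TD_Classifier()),
                ("tabm", TabM_D_Classifier())],
    voting="soft",                                    # average predicted probabilities
)
model = make_pipeline(TableVectorizer(), ensemble)
# Usage: model.fit(X_train, y_train); model.predict_proba(X_test)
```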
July 29, 2025 at 11:10 AM
🚨 ICLR poster in 1.5 hours, presented by @danielmusekamp.bsky.social:
Can active learning help to generate better datasets for neural PDE solvers?
We introduce a new benchmark to find out!
Featuring 6 PDEs, 6 AL methods, 3 architectures, and many ablations: transferability, speed, etc.!
April 24, 2025 at 12:38 AM
A new tabular classification benchmark provides another independent evaluation of our RealMLP. RealMLP is the best classical DL model there, although some other recent baselines are missing. However, TabPFN is better on small datasets, and boosted trees are better on larger ones.
March 4, 2025 at 2:25 PM
In case anyone is wondering about the name RealMLP, it is motivated by the “Real MVP” meme (which probably also inspired the RealNVP method). 6/6
January 16, 2025 at 12:05 PM
When more baselines are included, RealMLP’s average rank improves slightly, making it the top-performing method overall: fifth place on binary classification, first on multi-class classification, and second on regression. 3/
January 16, 2025 at 12:05 PM
The first independent evaluation of our RealMLP is here!

On a recent 300-dataset benchmark with many baselines, RealMLP takes a shared first place overall. 🔥

Importantly, RealMLP is also relatively CPU-friendly, unlike other SOTA DL models (including TabPFNv2 and TabM). 🧵 1/
January 16, 2025 at 12:05 PM
We also have results for LightGBM with our tuned default hyperparameters (LGBM-TD), but they are somewhat similar and the behavior might depend on the “subsample” hyperparameter (which is related to bagging). 4/
November 29, 2024 at 11:38 AM
The result? Refitting is a bit better, but only if you fit an ensemble during refitting.
However, it’s slower, you don’t get validation scores for the refitted models, the result might change with more folds, and tuning hyperparameters on the CV scores may favor bagging. 2/
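For concreteness, here is a minimal sketch of the two strategies after k-fold CV, using a stand-in scikit-learn model and toy data (the thread’s results are for boosted trees; everything here is illustrative):

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, train_test_split

X, y = make_classification(n_samples=500, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
base = LogisticRegression(max_iter=1000)              # stand-in for a boosted-tree model

# Bagging: keep the k fold models and average their predicted probabilities.
fold_models = [clone(base).fit(X_tr[idx], y_tr[idx])
               for idx, _ in KFold(n_splits=5, shuffle=True, random_state=0).split(X_tr)]
p_bag = np.mean([m.predict_proba(X_te) for m in fold_models], axis=0)

# Refitting: retrain on all training data (a single model here; the point above
# is that refitting only helped when refitting an ensemble, e.g. over seeds).
p_refit = clone(base).fit(X_tr, y_tr).predict_proba(X_te)
```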
November 29, 2024 at 11:38 AM
A reason for the different sensitivities may also be that val metrics that are more similar to the train loss are more likely to decrease monotonically, and therefore have less risk of stopping too early.

For regression with MSE we found little sensitivity to the patience. 3/
November 27, 2024 at 10:27 AM
For early stopping on boosted trees, using accuracy as the val metric requires high patience.
Brier loss yields similar test accuracy for high patience but is less sensitive to patience.
Cross-entropy (the train metric) is even less sensitive but not as good for test accuracy. 2/
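As a concrete illustration, here is a minimal sketch of early stopping on a Brier-score validation metric with LightGBM’s scikit-learn interface (the patience value and data are illustrative):

```python
import numpy as np
import lightgbm as lgb
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

def brier(y_true, y_pred):
    # Binary case: y_pred holds positive-class probabilities;
    # lower Brier score is better, hence is_higher_better=False.
    return "brier", np.mean((y_pred - y_true) ** 2), False

X, y = make_classification(n_samples=2000, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)
model = lgb.LGBMClassifier(n_estimators=1000)
model.fit(X_tr, y_tr, eval_set=[(X_val, y_val)], eval_metric=brier,
          callbacks=[lgb.early_stopping(stopping_rounds=100)])  # patience = 100 rounds
```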
November 27, 2024 at 10:27 AM
Finally, there are some limitations, partly due to the cost of running all of the benchmarks. 14/
November 18, 2024 at 2:15 PM
For training, we use AdamW with a multi-cycle learning rate schedule. Since it makes early stopping more difficult, we always train for the full 256 epochs and revert to the best epoch afterwards. Unfortunately, this makes RealMLP quite a bit slower on average. 13/
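A minimal PyTorch sketch of this pattern, training the full budget while tracking the best epoch (the schedule shape, cycle count, and toy model are assumptions, not the paper’s exact setup):

```python
import copy, math, torch

X, y = torch.randn(256, 10), torch.randn(256, 1)        # toy data; model is a stand-in
model = torch.nn.Linear(10, 1)
opt = torch.optim.AdamW(model.parameters(), lr=1e-2, weight_decay=1e-2)
n_epochs, n_cycles = 256, 4                             # 4 cycles is an assumption
sched = torch.optim.lr_scheduler.LambdaLR(
    opt, lambda e: 0.5 * (1 + math.cos(math.pi * ((e * n_cycles / n_epochs) % 1.0))))

best_val, best_state = float("inf"), None
for epoch in range(n_epochs):                           # always train the full budget
    opt.zero_grad()
    loss = torch.nn.functional.mse_loss(model(X), y)
    loss.backward()
    opt.step()
    sched.step()
    val = loss.item()                                   # stand-in for a real val metric
    if val < best_val:                                  # track instead of stopping early
        best_val, best_state = val, copy.deepcopy(model.state_dict())
model.load_state_dict(best_state)                       # revert to the best epoch
```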
November 18, 2024 at 2:15 PM
For classification, using label smoothing in the cross-entropy loss improves the results for classification error, but hurts other metrics like AUROC (see below) or cross-entropy itself. This discrepancy is inconvenient, and I hope it can be resolved in future research. 12/
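In PyTorch, label smoothing is a one-liner (the smoothing value here is illustrative):

```python
import torch
# Cross-entropy with label smoothing; 0.1 is an illustrative value.
loss_fn = torch.nn.CrossEntropyLoss(label_smoothing=0.1)
```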
November 18, 2024 at 2:15 PM
To encourage feature selection, we introduce a diagonal weight layer, which we call a scaling layer, after the embedding layer. Luckily, we found that it is much more effective with a much larger layer-wise learning rate (96x for RealTabR-D). 11/
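A minimal sketch of such a layer, with the larger layer-wise learning rate set via optimizer parameter groups (sizes and base learning rate are illustrative):

```python
import torch

class ScalingLayer(torch.nn.Module):
    """Diagonal weight layer: one learnable scale per (embedded) feature."""
    def __init__(self, n_features):
        super().__init__()
        self.scale = torch.nn.Parameter(torch.ones(n_features))

    def forward(self, x):
        return x * self.scale

model = torch.nn.Sequential(ScalingLayer(32), torch.nn.Linear(32, 1))
base_lr = 1e-3                                            # illustrative base lr
opt = torch.optim.AdamW([
    {"params": model[0].parameters(), "lr": 96 * base_lr},  # 96x, as for RealTabR-D
    {"params": model[1].parameters(), "lr": base_lr},
])
```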
November 18, 2024 at 2:15 PM
Architecturally, we modify numerical embedding layers (arxiv.org/abs/2203.05556) by introducing first-layer biases and a DenseNet-style skip connection, which yields good results even at (CPU-friendly) small embedding sizes. 10/
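A rough sketch of the idea: a per-feature embedding with a first-layer bias, with the raw feature value concatenated to its embedding DenseNet-style. This follows the description only in spirit, not the paper’s exact layer:

```python
import torch

class NumEmbedding(torch.nn.Module):
    """Per-feature embedding with a first-layer bias; the raw value is
    concatenated to its embedding (DenseNet-style skip). Sketch only."""
    def __init__(self, n_features, d_embed):
        super().__init__()
        self.weight = torch.nn.Parameter(torch.randn(n_features, d_embed - 1))
        self.bias = torch.nn.Parameter(torch.zeros(n_features, d_embed - 1))

    def forward(self, x):                                 # x: (batch, n_features)
        emb = x.unsqueeze(-1) * self.weight + self.bias   # (batch, n_features, d-1)
        return torch.cat([x.unsqueeze(-1), emb], dim=-1)  # append raw value: (..., d)
```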
November 18, 2024 at 2:15 PM
We introduce robust scaling + smooth clipping (RS+SC), an outlier-robust preprocessing method combining quantile-based rescaling and soft clipping to (-3, 3).
It is more robust than a StandardScaler but preserves more distributional information than a QuantileTransformer. 9/
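A NumPy sketch of this kind of transform (the quantile choices and the exact clipping function are assumptions in the spirit of the description, not the paper’s code):

```python
import numpy as np

def robust_scale_smooth_clip(x):
    """RS+SC-style transform, applied column-wise."""
    med = np.median(x, axis=0)
    q25, q75 = np.percentile(x, [25, 75], axis=0)
    scale = np.where(q75 - q25 > 0, q75 - q25, 1.0)   # quantile-based rescaling
    z = (x - med) / scale
    return z / np.sqrt(1.0 + (z / 3.0) ** 2)          # smooth, monotone clip to (-3, 3)
```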
November 18, 2024 at 2:15 PM
So, what is new in RealMLP?
We add many things in different areas like architecture, preprocessing, training/hyperparameters, regularization, and initialization.
We provide multiple ablations, and I want to highlight some of the new things below. 8/
November 18, 2024 at 2:15 PM
Depending on the benchmark and the metrics/aggregation, RealMLP is sometimes a bit better than boosted trees and sometimes a bit worse.
Taking the best TD model on each dataset (Best-TD) typically has a better time-accuracy trade-off than 50 steps of random-search HPO. 7/
November 18, 2024 at 2:15 PM
For boosted trees, our tuned defaults (TD) outperform the library defaults (D) in our standard metrics, though they do not match hyperparameter optimization (HPO) on meta-test, and the results are more mixed on other metrics/benchmarks. 6/
November 18, 2024 at 2:15 PM
To test whether our “bag of tricks” transfers to other architectures, we apply some of the tricks to the retrieval-based TabR-S-D, with much less tuning than for RealMLP-TD.
The resulting RealTabR-D performs much better than TabR-S-D with the default parameters from the original paper. 5/
November 18, 2024 at 2:15 PM
RealMLP can be used with tuned defaults (TD) or hyperparameter optimization (HPO).
We tuned defaults and our “bag of tricks” only on meta-train. Still, RealMLP outperforms the MLP-PLR baseline with numerical embeddings on all benchmarks. 4/
November 18, 2024 at 2:15 PM
Our new methods and default parameters are tuned on a meta-train benchmark and then also evaluated on
- a disjoint meta-test benchmark including large and high-dimensional datasets
- the smaller Grinsztajn et al. benchmark (with more baselines). 3/
November 18, 2024 at 2:15 PM
Coauthors: Léo Grinsztajn (@leogrin.bsky.social) and Ingo Steinwart
Paper: arxiv.org/abs/2407.04491
Code: github.com/dholzmueller...

Our library is pip-installable and contains easy-to-use and configurable scikit-learn interfaces (including baselines). 2/
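Basic usage looks like this (a sketch assuming the pip package name pytabkit and its RealMLP_TD_Classifier estimator, with toy data):

```python
# pip install pytabkit
from pytabkit import RealMLP_TD_Classifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = RealMLP_TD_Classifier()   # tuned defaults (TD); no hyperparameter tuning needed
clf.fit(X_tr, y_tr)
print(clf.score(X_te, y_te))
```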
November 18, 2024 at 2:15 PM
Can deep learning finally compete with boosted trees on tabular data? 🌲
In our NeurIPS 2024 paper, we introduce RealMLP, an NN with improvements in all areas and meta-learned default parameters.
Some insights about RealMLP and other models on large benchmarks (>200 datasets): 🧵
November 18, 2024 at 2:15 PM
The video and slides of my talk are online (link is in the quoted tweet). 📽️ https://twitter.com/DHolzmueller/status/1831385303405281347
November 19, 2024 at 11:35 AM