Nicholas Lourie
@nicholaslourie.bsky.social
Better empirical methods for deep learning & NLP. PhD at NYU. Advised by He He and @kyunghyuncho.bsky.social. Prev: @ai2.bsky.social.

I build things. 🤖
And for anyone at #COLM2025: if you're curious, come chat at our poster! We're presenting at Poster 67 in Poster Session 4 this afternoon!
October 8, 2025 at 8:18 PM
Deep learning is an empirical science; its progress depends on empirical tools.

We hope these tools help you make progress in your research! You can get them with just a `pip install opda`.

paper: arxiv.org/abs/2510.027...
code: github.com/nicholaslour...
docs: nicholaslourie.github.io/opda/

🧵9/9
Hyperparameter Loss Surfaces Are Simple Near their Optima
arxiv.org
October 8, 2025 at 8:14 PM
The noisy quadratic emerges across a range of architectures, tasks, and modalities, including language modeling, supervised finetuning, and ImageNet pretraining.

In all these scenarios, our theory displays an excellent fit! 👇

See the paper for even more!

🧵8/9
October 8, 2025 at 8:14 PM
The noisy quadratic distribution (Q) has 4 parameters, corresponding to properties of the loss surface like the *best possible performance* or the *effective number of hyperparameters*. Using the noisy quadratic, you can construct confidence intervals for these quantities.

🧵7/9
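To give the flavor of such intervals, here's a toy sketch (this is not the opda API, and the paper's actual method differs: its noisy-quadratic fit handles the noise that this simple power-law fit ignores): fit the tail of the scores, then bootstrap.

```python
import numpy as np
from scipy import optimize

rng = np.random.default_rng(0)

# Stand-in scores from random search over 4 hyperparameters (toy, noiseless).
x = rng.uniform(-1.0, 1.0, size=(1000, 4))
ys = 0.10 + 0.5 * np.sum(x**2, axis=1)

def fit_tail(ys, k=200):
    """Fit F(y) = ((y - a) / (tau - a))**c to the k best scores by MLE.

    Toy stand-in for the paper's noisy-quadratic fit: a estimates the best
    possible score, 2 * c the effective number of hyperparameters.
    """
    tail = np.sort(ys)[:k]
    tau = tail[-1]
    def nll(params):
        a, c = params
        if a >= tail[0] or c <= 0:
            return np.inf
        return -np.sum(np.log(c) + (c - 1) * np.log(tail - a)
                       - c * np.log(tau - a))
    a_hat, c_hat = optimize.minimize(
        nll, x0=[tail[0] - 0.05, 1.5], method="Nelder-Mead").x
    return a_hat, 2 * c_hat

# Percentile bootstrap confidence intervals for both quantities.
boots = np.array([fit_tail(rng.choice(ys, size=len(ys))) for _ in range(200)])
lo, hi = np.percentile(boots, [2.5, 97.5], axis=0)
print(f"best possible score:   [{lo[0]:.3f}, {hi[0]:.3f}]")  # truth: 0.10
print(f"effective hyperparams: [{lo[1]:.2f}, {hi[1]:.2f}]")  # truth: 4
```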
October 8, 2025 at 8:14 PM
The score distribution's tail converges to a new distribution: the noisy quadratic.

If you find where the noisy quadratic matches the score distribution, then you've found where the simple structure starts, the region we call the *asymptotic regime*.

🧵6/9
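Roughly, locating that regime could look like this (toy surface and thresholds, Monte Carlo in place of the paper's closed forms): shrink the threshold until the tail of the scores matches a noisy quadratic.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 5_000

# Toy "true" surface: quadratic near the optimum plus a higher-order term
# that only matters far away (illustrative values, not from the paper).
r2 = np.sum(rng.uniform(-1.0, 1.0, size=(n, 2))**2, axis=1)
ys = 0.10 + 0.5 * r2 + 0.5 * r2**2 + rng.normal(0.0, 0.02, size=n)

# Reference: the same surface *without* the higher-order term, i.e. a noisy
# quadratic (sampled by Monte Carlo; the paper derives its distribution).
r2_q = np.sum(rng.uniform(-1.0, 1.0, size=(n, 2))**2, axis=1)
ys_q = 0.10 + 0.5 * r2_q + rng.normal(0.0, 0.02, size=n)

# Scan thresholds: the tails should agree at small scores (the asymptotic
# regime) and drift apart once the higher-order structure kicks in.
for tau in [0.15, 0.3, 1.0]:
    p = stats.ks_2samp(ys[ys <= tau], ys_q[ys_q <= tau]).pvalue
    print(f"scores <= {tau:.2f}: KS p-value = {p:.3g}")
```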
October 8, 2025 at 8:14 PM
But how do you find the region where that simple structure holds? With a familiar tool: random search!

When you sample hyperparameters and evaluate them, you get a validation score. That process defines the *score distribution* from random search, and we prove a novel limit theorem about it.

🧵5/9
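In code, the score distribution is just this, with `evaluate` a hypothetical stand-in for training and validating a model:

```python
import numpy as np

rng = np.random.default_rng(0)

def evaluate(x):
    """Hypothetical stand-in for training + validating at hyperparameters x."""
    return 0.10 + 0.5 * np.sum(x**2) + rng.normal(0.0, 0.02)

# Random search: sample hyperparameters at random, evaluate each once.
xs = rng.uniform(-2.0, 2.0, size=(200, 2))
ys = np.array([evaluate(x) for x in xs])

# The scores are i.i.d. draws from the *score distribution* of random search.
print(f"best score found: {ys.min():.3f}")
print(f"P(score <= 0.5) ~ {np.mean(ys <= 0.5):.2f}")  # empirical CDF at 0.5
```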
October 8, 2025 at 8:14 PM
The problem is: the validation score isn't a deterministic function of the hyperparameters. Train the same model twice and you'll get two different scores!

Luckily, the noise is simple: normally distributed with constant variance. You see this empirically if you retrain a model many times. 👇

🧵4/9
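Here's what that check might look like, with simulated scores standing in for real retraining runs:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Stand-in for retraining the same model 50 times: in practice these would
# be validation scores from repeated runs with different random seeds.
scores = 0.85 + rng.normal(0.0, 0.01, size=50)

stat, pvalue = stats.normaltest(scores)  # D'Agostino-Pearson normality test
print(f"mean = {scores.mean():.4f}, std = {scores.std(ddof=1):.4f}")
print(f"normality test p-value = {pvalue:.3f}")  # large p: consistent w/ normal
```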
October 8, 2025 at 8:14 PM
You get quadratic structure from a Taylor expansion about the optimum. As search progresses, the hyperparameters you care about get closer to the optimum, and the Taylor expansion becomes a better approximation.

🧵3/9
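A one-dimensional illustration (toy loss, not from the paper): the second-order Taylor expansion's error vanishes rapidly as you approach the optimum.

```python
import numpy as np

def f(x):
    return np.cosh(x)        # toy "loss" with its optimum at x = 0

def quadratic(x):
    return 1.0 + 0.5 * x**2  # 2nd-order Taylor expansion of f about x = 0
                             # (the gradient term vanishes at the optimum)

for x in [1.0, 0.5, 0.1, 0.01]:
    print(f"x = {x:5.2f}: |f - quadratic| = {abs(f(x) - quadratic(x)):.2e}")
# The gap shrinks like O(x^4): near the optimum, the quadratic dominates.
```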
October 8, 2025 at 8:14 PM
Hyperparameter loss surfaces are complex, but near the optimum (near the hyperparameters that matter most) their structure becomes surprisingly simple: *quadratic* with *additive normal noise*.

🧵2/9
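As a minimal sketch of that model (every name and number here is a hypothetical toy, not the paper's code):

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_score(x, x_opt, H, best, sigma=0.02):
    """Toy validation score near the optimum: quadratic plus normal noise.

    Assumes the surface is best + 1/2 (x - x_opt)^T H (x - x_opt) and that
    each training run adds N(0, sigma^2) noise with constant variance.
    """
    d = x - x_opt
    return best + 0.5 * d @ H @ d + rng.normal(0.0, sigma)

x_opt = np.array([0.3, -1.2])  # optimal hyperparameters (toy values)
H = np.diag([2.0, 0.5])        # curvature (Hessian) at the optimum
print(toy_score(np.array([0.35, -1.1]), x_opt, H, best=0.10))
```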
October 8, 2025 at 8:14 PM