Nicholas Lourie
@nicholaslourie.bsky.social
Better empirical methods for deep learning & NLP. PhD at NYU. Advised by He He and @kyunghyuncho.bsky.social. Prev: @ai2.bsky.social.

I build things. 🤖
And for anyone at #COLM2025: if you're curious, come chat at our poster! We're presenting at Poster 67 in Poster Session 4 this afternoon!
October 8, 2025 at 8:18 PM
Deep learning is an empirical science; its progress depends on empirical tools.

We hope these tools help you make progress in your research! You can get them with just a `pip install opda`.

paper: arxiv.org/abs/2510.027...
code: github.com/nicholaslour...
docs: nicholaslourie.github.io/opda/

🧵9/9
Hyperparameter Loss Surfaces Are Simple Near their Optima
arxiv.org
October 8, 2025 at 8:14 PM
The noisy quadratic emerges across a range of architectures, tasks, and modalities, including language modeling, supervised finetuning, and ImageNet pretraining.

In all these scenarios, our theory displays an excellent fit! 👇

See the paper for even more!

🧵8/9
October 8, 2025 at 8:14 PM
The noisy quadratic distribution (Q) has 4 parameters, corresponding to properties of the loss surface like the *best possible performance* or the *effective number of hyperparameters*. Using the noisy quadratic, you can construct confidence intervals for these quantities.

🧵7/9
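To give the flavor of such intervals, here's a toy sketch (this is not the opda API, and the paper's actual method differs: its noisy-quadratic fit handles the noise that this simple power-law fit ignores): fit the tail of the scores, then bootstrap.

```python
import numpy as np
from scipy import optimize

rng = np.random.default_rng(0)

# Stand-in scores from random search over 4 hyperparameters (toy, noiseless).
x = rng.uniform(-1.0, 1.0, size=(1000, 4))
ys = 0.10 + 0.5 * np.sum(x**2, axis=1)

def fit_tail(ys, k=200):
    """Fit F(y) = ((y - a) / (tau - a))**c to the k best scores by MLE.

    Toy stand-in for the paper's noisy-quadratic fit: a estimates the best
    possible score, 2 * c the effective number of hyperparameters.
    """
    tail = np.sort(ys)[:k]
    tau = tail[-1]
    def nll(params):
        a, c = params
        if a >= tail[0] or c <= 0:
            return np.inf
        return -np.sum(np.log(c) + (c - 1) * np.log(tail - a)
                       - c * np.log(tau - a))
    a_hat, c_hat = optimize.minimize(
        nll, x0=[tail[0] - 0.05, 1.5], method="Nelder-Mead").x
    return a_hat, 2 * c_hat

# Percentile bootstrap confidence intervals for both quantities.
boots = np.array([fit_tail(rng.choice(ys, size=len(ys))) for _ in range(200)])
lo, hi = np.percentile(boots, [2.5, 97.5], axis=0)
print(f"best possible score:   [{lo[0]:.3f}, {hi[0]:.3f}]")  # truth: 0.10
print(f"effective hyperparams: [{lo[1]:.2f}, {hi[1]:.2f}]")  # truth: 4
```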
October 8, 2025 at 8:14 PM
The score distribution's tail converges to a new distribution: the noisy quadratic.

If you find where the noisy quadratic matches the score distribution, then you've found where the simple structure starts, the region we call the *asymptotic regime*.

🧵6/9
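Roughly, locating that regime could look like this (toy surface and thresholds, Monte Carlo in place of the paper's closed forms): shrink the threshold until the tail of the scores matches a noisy quadratic.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 5_000

# Toy "true" surface: quadratic near the optimum plus a higher-order term
# that only matters far away (illustrative values, not from the paper).
r2 = np.sum(rng.uniform(-1.0, 1.0, size=(n, 2))**2, axis=1)
ys = 0.10 + 0.5 * r2 + 0.5 * r2**2 + rng.normal(0.0, 0.02, size=n)

# Reference: the same surface *without* the higher-order term, i.e. a noisy
# quadratic (sampled by Monte Carlo; the paper derives its distribution).
r2_q = np.sum(rng.uniform(-1.0, 1.0, size=(n, 2))**2, axis=1)
ys_q = 0.10 + 0.5 * r2_q + rng.normal(0.0, 0.02, size=n)

# Scan thresholds: the tails should agree at small scores (the asymptotic
# regime) and drift apart once the higher-order structure kicks in.
for tau in [0.15, 0.3, 1.0]:
    p = stats.ks_2samp(ys[ys <= tau], ys_q[ys_q <= tau]).pvalue
    print(f"scores <= {tau:.2f}: KS p-value = {p:.3g}")
```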
October 8, 2025 at 8:14 PM
But how do you find the region where that simple structure holds? With a familiar tool: random search!

When you sample hyperparameters and evaluate them, you get a validation score. That process defines the *score distribution* from random search, and we prove a novel limit theorem about it.

🧵5/9
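In code, the score distribution is just this, with `evaluate` a hypothetical stand-in for training and validating a model:

```python
import numpy as np

rng = np.random.default_rng(0)

def evaluate(x):
    """Hypothetical stand-in for training + validating at hyperparameters x."""
    return 0.10 + 0.5 * np.sum(x**2) + rng.normal(0.0, 0.02)

# Random search: sample hyperparameters at random, evaluate each once.
xs = rng.uniform(-2.0, 2.0, size=(200, 2))
ys = np.array([evaluate(x) for x in xs])

# The scores are i.i.d. draws from the *score distribution* of random search.
print(f"best score found: {ys.min():.3f}")
print(f"P(score <= 0.5) ~ {np.mean(ys <= 0.5):.2f}")  # empirical CDF at 0.5
```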
October 8, 2025 at 8:14 PM
The problem is: the validation score isn't a deterministic function of the hyperparameters. Train the same model twice and you'll get two different scores!

Luckily, the noise is simple: normally distributed with constant variance. You see this empirically if you retrain a model many times. 👇

🧵4/9
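Here's what that check might look like, with simulated scores standing in for real retraining runs:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Stand-in for retraining the same model 50 times: in practice these would
# be validation scores from repeated runs with different random seeds.
scores = 0.85 + rng.normal(0.0, 0.01, size=50)

stat, pvalue = stats.normaltest(scores)  # D'Agostino-Pearson normality test
print(f"mean = {scores.mean():.4f}, std = {scores.std(ddof=1):.4f}")
print(f"normality test p-value = {pvalue:.3f}")  # large p: consistent w/ normal
```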
October 8, 2025 at 8:14 PM
You get quadratic structure from a Taylor expansion about the optimum. As search progresses, the hyperparameters you care about get closer to the optimum, and the Taylor expansion becomes a better approximation.

🧵3/9
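A one-dimensional illustration (toy loss, not from the paper): the second-order Taylor expansion's error vanishes rapidly as you approach the optimum.

```python
import numpy as np

def f(x):
    return np.cosh(x)        # toy "loss" with its optimum at x = 0

def quadratic(x):
    return 1.0 + 0.5 * x**2  # 2nd-order Taylor expansion of f about x = 0
                             # (the gradient term vanishes at the optimum)

for x in [1.0, 0.5, 0.1, 0.01]:
    print(f"x = {x:5.2f}: |f - quadratic| = {abs(f(x) - quadratic(x)):.2e}")
# The gap shrinks like O(x^4): near the optimum, the quadratic dominates.
```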
October 8, 2025 at 8:14 PM
Hyperparameter loss surfaces are complex, but near the optimum (near the hyperparameters that matter most) their structure becomes surprisingly simple: *quadratic* with *additive normal noise*.

🧵2/9
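As a minimal sketch of that model (every name and number here is a hypothetical toy, not the paper's code):

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_score(x, x_opt, H, best, sigma=0.02):
    """Toy validation score near the optimum: quadratic plus normal noise.

    Assumes the surface is best + 1/2 (x - x_opt)^T H (x - x_opt) and that
    each training run adds N(0, sigma^2) noise with constant variance.
    """
    d = x - x_opt
    return best + 0.5 * d @ H @ d + rng.normal(0.0, sigma)

x_opt = np.array([0.3, -1.2])  # optimal hyperparameters (toy values)
H = np.diag([2.0, 0.5])        # curvature (Hessian) at the optimum
print(toy_score(np.array([0.35, -1.1]), x_opt, H, best=0.10))
```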
October 8, 2025 at 8:14 PM