Andrew Lampinen
@lampinen.bsky.social
Interested in cognition and artificial intelligence. Research Scientist at Google DeepMind. Previously cognitive science at Stanford. Posts are mine.
lampinen.github.io
I'm not sure I fully understand this point; part of our argument here (as well as in some of our past work: arxiv.org/abs/2505.00661) is that models *can* readily produce the reversals when the information is in context; they just *don't* unless there is some problem to solve or other cue to do so.
On the generalization of language models from in-context learning and finetuning: a controlled study
Large language models exhibit exciting capabilities, yet can show surprisingly narrow generalization from finetuning. E.g. they can fail to generalize to simple reversals of relations they are trained...
arxiv.org
September 23, 2025 at 11:10 PM
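A toy illustration of that in-context contrast (my own minimal Python example, not code from the paper; the fact and prompts are made up, and any LM API could be substituted): the same reversed question is easy when the forward fact sits in the prompt, but otherwise has to come from the weights.

# Toy sketch of the parametric vs. in-context reversal probe (hypothetical
# example; the fact and prompt formats are invented for illustration).
forward_fact = "Alice Smith's mother is Carol Jones."
reversed_question = "Who is Carol Jones's daughter?"

# Parametric-only probe: the model has to answer from its finetuned weights.
parametric_prompt = f"Question: {reversed_question}\nAnswer:"

# In-context probe: the forward fact is reinstated into the prompt, so the
# model only has to read the context "backwards" when the question demands it.
in_context_prompt = f"{forward_fact}\nQuestion: {reversed_question}\nAnswer:"

print(parametric_prompt)
print(in_context_prompt)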
Hahaha much appreciated
September 22, 2025 at 9:47 PM
Even comparing my own work in different areas, it's harder to be both timely and as thorough with LM work, especially given the scale of the experiments
September 22, 2025 at 7:46 PM
I was gonna say, I feel attacked by this tweet 😅
September 22, 2025 at 7:44 PM
Check out the paper if you’re interested! arxiv.org/abs/2509.16189
And thanks to my awesome collaborators: @martinengelcke.bsky.social, Effie Li, @arslanchaudhry.bsky.social and James McClelland. 9/9
Latent learning: episodic memory complements parametric learning by enabling flexible reuse of experiences
When do machine learning systems fail to generalize, and what mechanisms could improve their generalization? Here, we draw inspiration from cognitive science to argue that one weakness of machine lear...
arxiv.org
September 22, 2025 at 4:21 AM
We think this work sheds light on why retrieval offers distinct benefits beyond just training models more, and provides a different perspective on why episodic memory and parametric learning are complementary, which we hope will be of interest to both AI and cognitive science. 8/
September 22, 2025 at 4:21 AM
In the paper, we explore many more settings & nuances — including RL and BC versions of maze navigation experiments based on the original experiments on latent learning in rats, the effects of associative cues, the importance of within-episode ICL, and ablations. 7/
September 22, 2025 at 4:21 AM
We show that even when models generalize well from parametric learning in standard (nontrivial) evaluations, there are selective, consistent failures of latent learning. Only models with retrieval generalize well on the key tests of latent learning. 6/
September 22, 2025 at 4:21 AM
To illustrate this point, we explore latent learning across a wide range of benchmarks (from codebook translation to BC and RL navigation) — and compare baseline language models or agents to those equipped with oracle retrieval. 5/
September 22, 2025 at 4:21 AM
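For anyone skimming the thread, here is a rough Python sketch of the oracle-retrieval comparison as I describe it above (a hedged reconstruction, not the paper's actual evaluation code; model_generate, gold_episode, query, and answer are placeholder names):

# Hedged sketch of an oracle-retrieval evaluation loop, reconstructed from the
# thread rather than taken from the paper. `model_generate` stands in for any
# finetuned model or agent being evaluated.
from typing import Callable, Dict, List

def evaluate(model_generate: Callable[[str], str],
             test_items: List[Dict[str, str]],
             use_oracle_retrieval: bool) -> float:
    """Score test queries with or without the relevant episode reinstated in context."""
    correct = 0
    for item in test_items:
        if use_oracle_retrieval:
            # Oracle retrieval: prepend the exact training episode that contains
            # the latent information this query depends on.
            prompt = item["gold_episode"] + "\n" + item["query"]
        else:
            # Parametric baseline: the model must rely on whatever finetuning
            # stored in its weights.
            prompt = item["query"]
        correct += int(model_generate(prompt).strip() == item["answer"])
    return correct / len(test_items)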
But models can readily use latent information in their context. We therefore suggest that natural intelligence solves the latent learning problem via the complementary strengths of episodic memory: reinstating experiences into context makes latent information accessible. 4/
September 22, 2025 at 4:21 AM
we argue that parametric learning methods are too tied to the explicit training task, and fail to effectively encode latent information relevant to possible future tasks, and we suggest that this explains a wide range of findings, from navigation to the reversal curse. 3/
September 22, 2025 at 4:21 AM
We take inspiration from classic experiments on latent learning in animals, where the animals learn about information that is not useful at present, but that might be useful later — for example, learning the location of useful resources in passing. By contrast, 2/
September 22, 2025 at 4:21 AM
Thanks! Yes, I'm interested in which constraints most strongly push against this: 1) efficiency of acting (current FHE is slow), 2) efficiency of learning (simplicity bias), 3) maybe relatedly, the probability of learning a la arxiv.org/abs/1805.08522, or 4) some combination thereof
Deep learning generalizes because the parameter-function map is biased towards simple functions
Deep neural networks (DNNs) generalize remarkably well without explicit regularization even in the strongly over-parametrized regime where classical learning theory would instead predict that they wou...
arxiv.org
August 6, 2025 at 2:59 AM
When we've compared these in past work (e.g. Supplement Fig. A.13 here: proceedings.neurips.cc/paper/2020/h...), we've seen pretty similar results between the two. Haven't run it in exactly this setting though. There are also some arguments that 1/2
August 5, 2025 at 8:18 PM
even though both are linearly decodable and equally predictive. Katherine's paper studies some instances more thoroughly in simple settings. My sense though is that the magnitude of these effects is quite a bit smaller than the base bias, so probably not a huge issue if datasets aren't tiny. 2/2
August 5, 2025 at 6:28 PM
I don't know of any reviews unfortunately! Fig. 16 in our TMLR paper (openreview.net/forum?id=aY2...) shows an instance — training classifiers on the penultimate reps to decode a label predicted by both easy and hard features; at high predictivity the classifier prefers the easy feature, even 1/2
August 5, 2025 at 6:28 PM
Thanks, glad you like it!
August 5, 2025 at 5:49 PM
just by dimensionality arguments (input dim 64 << first rep 256), even before training, *any* function of the inputs will likely be computable from that rep with a sufficiently complex nonlinear decoder — even features like XOR that the model is *incapable* of computing at the first layer. 2/2
August 5, 2025 at 4:30 PM
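A minimal numerical sketch of that dimensionality argument (my own toy example, assuming a frozen random ReLU layer and a small sklearn MLP as the nonlinear decoder; not code from any of the papers linked above): with 64-dimensional binary inputs and a 256-unit untrained first layer, a feature like the XOR of two input bits, which the layer itself never computes, is typically still recoverable from the representation.

# Toy sketch of the dimensionality argument (assumptions: frozen random ReLU
# layer, MLP decoder; invented for illustration, not from the papers above).
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
n, d_in, d_rep = 8000, 64, 256

X = rng.integers(0, 2, size=(n, d_in)).astype(np.float32)   # binary inputs
y = X[:, 0].astype(int) ^ X[:, 1].astype(int)               # XOR of two input bits

W = rng.normal(scale=1 / np.sqrt(d_in), size=(d_in, d_rep)) # random, never trained
H = np.maximum(X @ W, 0.0)                                  # untrained ReLU representation

# Nonlinear decoder trained on the frozen representation; held-out accuracy is
# typically far above chance even though the first layer never "computed" XOR.
decoder = MLPClassifier(hidden_layer_sizes=(128,), max_iter=500, random_state=0)
decoder.fit(H[:6000], y[:6000])
print("held-out XOR decoding accuracy:", decoder.score(H[6000:], y[6000:]))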