Andrew Lampinen
@lampinen.bsky.social
Interested in cognition and artificial intelligence. Research Scientist at Google DeepMind. Previously cognitive science at Stanford. Posts are mine.
lampinen.github.io
Apologies for being quiet on here lately — been focusing on the more important things in life :)
November 9, 2025 at 11:34 PM
We show that even when models generalize well from parametric learning in standard (nontrivial) evaluations, there are selective, consistent failures of latent learning. Only models with retrieval generalize well on the key tests of latent learning. 6/
September 22, 2025 at 4:21 AM
To illustrate this point, we explore latent learning across a wide range of benchmarks (from codebook translation to BC and RL navigation) — and compare baseline language models or agents to those equipped with oracle retrieval. 5/
September 22, 2025 at 4:21 AM
But models can readily use latent information in their context. We therefore suggest that natural intelligence solves the latent learning problem via the complementary strengths of episodic memory: reinstating experiences into context makes latent information accessible. 4/
September 22, 2025 at 4:21 AM
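A minimal sketch of the idea above in code (illustrative only; the `Episode`, `oracle_retrieve`, and `build_prompt` names are placeholders, not anything from the paper): an episodic store keeps whole experiences, and retrieval simply reinstates the relevant one into the prompt, so latent details that were never a training target become available in context.

```python
from dataclasses import dataclass

@dataclass
class Episode:
    """One stored experience: the task being performed, plus incidental (latent) details."""
    task: str
    observation: str

# A tiny episodic store of past experiences.
memory = [
    Episode(task="navigate to the kitchen",
            observation="passed a blue door on the left, then a staircase"),
    Episode(task="fetch the keys",
            observation="the keys were on the table next to a red lamp"),
]

def oracle_retrieve(query: str) -> Episode:
    """Oracle retrieval: return the episode known to be relevant to the query."""
    return next(ep for ep in memory if ep.task in query)

def build_prompt(query: str) -> str:
    """Reinstate the retrieved episode into context ahead of the new task."""
    episode = oracle_retrieve(query)
    return f"Recalled experience: {episode.observation}\nNew task: {query}"

# The blue door was never a training target, but once the episode is back
# in context, a model can condition on it to solve the new task.
print(build_prompt("navigate to the kitchen while avoiding the blue door"))
```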
We argue that parametric learning methods are too tied to the explicit training task and fail to effectively encode latent information relevant to possible future tasks. We suggest this explains a wide range of findings, from navigation to the reversal curse. 3/
September 22, 2025 at 4:21 AM
When we've compared these in past work (e.g. Supplement Fig. A.13 here: proceedings.neurips.cc/paper/2020/h...), we've seen pretty similar results between the two. Haven't run it in exactly this setting, though. There are also some arguments that 1/2
August 5, 2025 at 8:18 PM
I don't know of any reviews unfortunately! Fig. 16 in our TMLR paper (openreview.net/forum?id=aY2...) shows an instance — training classifiers on the penultimate reps to decode a label predicted by both easy and hard features; at high predictivity the classifier prefers the easy feature, even 1/2
August 5, 2025 at 6:28 PM
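A toy version of that probing logic (a sketch with simulated representations standing in for a trained network's penultimate layer; not the paper's code): fit a linear probe on a label that both features predict during training, then test on inputs where the two features disagree to see which one the probe actually relies on.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 2000

# Two binary features; during training they are redundant (both predict the label).
easy = rng.integers(0, 2, n)
hard = easy.copy()          # perfectly correlated with the label at train time
label = easy

# Simulated "penultimate layer": the easy feature occupies a high-variance direction,
# the hard feature a low-variance one (standing in for the representational bias).
reps = np.column_stack([
    3.0 * (easy - 0.5) + 0.1 * rng.standard_normal(n),   # easy-feature direction
    0.3 * (hard - 0.5) + 0.1 * rng.standard_normal(n),   # hard-feature direction
])

probe = LogisticRegression().fit(reps, label)

# At test time the two features disagree, revealing which one the probe prefers.
easy_t = rng.integers(0, 2, 500)
hard_t = 1 - easy_t
reps_t = np.column_stack([
    3.0 * (easy_t - 0.5) + 0.1 * rng.standard_normal(500),
    0.3 * (hard_t - 0.5) + 0.1 * rng.standard_normal(500),
])
pred = probe.predict(reps_t)
print("agreement with easy feature:", (pred == easy_t).mean())
print("agreement with hard feature:", (pred == hard_t).mean())
```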
We also present a worst-case study I find conceptually interesting: homomorphic encryption. It’s possible to do systematic computation over representations whose content is always encrypted, and thus difficult to decode by design!
August 5, 2025 at 2:36 PM
We briefly discuss (some of) the origins of these biases — they are driven by both learning dynamics and the fact that there are in some sense a larger variety of “natural” ways to represent a nonlinear feature.
August 5, 2025 at 2:36 PM
These biases can have dramatic downstream effects that lead to unexpected conclusions from analyses. For example, RSA may identify two models computing the same complex task as much less representationally similar to each other than either of them is to a model computing a much simpler task (right panel)!
August 5, 2025 at 2:36 PM
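For readers less familiar with RSA, a bare-bones version of the comparison looks like the sketch below (illustrative only, using correlation-distance RDMs and a Spearman comparison as one common choice; the commentary's analyses are more careful):

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

def rdm(representations: np.ndarray) -> np.ndarray:
    """Representational dissimilarity matrix (condensed): pairwise distances between stimuli."""
    return pdist(representations, metric="correlation")

def rsa_similarity(reps_a: np.ndarray, reps_b: np.ndarray) -> float:
    """Second-order similarity: Spearman correlation between the two models' RDMs."""
    return spearmanr(rdm(reps_a), rdm(reps_b)).correlation

# Toy example: two models' representations of the same 50 stimuli.
rng = np.random.default_rng(0)
stimuli = rng.standard_normal((50, 10))
model_a = stimuli @ rng.standard_normal((10, 32))   # one readout of the stimuli
model_b = stimuli @ rng.standard_normal((10, 32))   # a different readout
print("RSA similarity:", rsa_similarity(model_a, model_b))
```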
Representations were systematically biased towards certain kinds of features. For example, a model reliably computing both easy (linear) and hard (nonlinear) features has 55% of its representational variance explained by the easy feature and only 5% by the hard one, with similar biases in the top PCs and in individual units.
August 5, 2025 at 2:36 PM
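A rough sketch of the kind of measurement behind numbers like those (a simplified version, not the commentary's exact analysis): regress the representation onto each feature separately and ask what fraction of the total representational variance the fitted component accounts for.

```python
import numpy as np

def variance_explained_by_feature(reps: np.ndarray, feature: np.ndarray) -> float:
    """Fraction of total representational variance captured by a linear fit to one feature."""
    reps = reps - reps.mean(axis=0)
    f = (feature - feature.mean()).reshape(-1, 1)
    # Least-squares fit of every representation dimension to the feature.
    coefs, *_ = np.linalg.lstsq(f, reps, rcond=None)
    fitted = f @ coefs
    return fitted.var(axis=0).sum() / reps.var(axis=0).sum()

# Toy representations where an "easy" feature dominates a "hard" one.
rng = np.random.default_rng(0)
n = 1000
easy = rng.integers(0, 2, n).astype(float)
hard = rng.integers(0, 2, n).astype(float)
reps = np.column_stack([
    3.0 * easy + 0.1 * rng.standard_normal(n),
    0.5 * hard + 0.1 * rng.standard_normal(n),
    0.1 * rng.standard_normal(n),
])
print("easy:", variance_explained_by_feature(reps, easy))
print("hard:", variance_explained_by_feature(reps, hard))
```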
We constructed controlled datasets with many input features, and trained deep learning models to compute functions of those features (e.g. linear ones like identifying a feature, or nonlinear ones like XOR). We then analyzed the patterns of representational activity they learned.
August 5, 2025 at 2:36 PM
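To make that setup concrete, a minimal version of such a dataset might look like the following (illustrative only; the feature counts and sizes are arbitrary, not the actual experimental configuration):

```python
import numpy as np

rng = np.random.default_rng(0)
n_examples, n_features = 4096, 8

# Inputs: independent binary features.
X = rng.integers(0, 2, size=(n_examples, n_features)).astype(np.float32)

# An "easy" (linear) target: just read out feature 0.
y_easy = X[:, 0]

# A "hard" (nonlinear) target: XOR of features 1 and 2.
y_hard = np.logical_xor(X[:, 1], X[:, 2]).astype(np.float32)

# A model trained to output both targets must compute both features;
# the question is how much representational variance each one gets.
targets = np.column_stack([y_easy, y_hard])
print(X.shape, targets.shape)
```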
In neuroscience, we often try to understand systems by analyzing their representations — using tools like regression or RSA. But are these analyses biased towards discovering a subset of what a system represents? If you're interested in this question, check out our new commentary! Thread:
August 5, 2025 at 2:36 PM
Looking forward to attending CogSci this week! I'll be giving a talk (see below) at the Reasoning Across Minds and Machines workshop on Wednesday at 10:25 AM, and will be around most of the week — feel free to reach out if you'd like to meet up!
July 28, 2025 at 6:07 PM
We argued this was necessary even for formal domains like mathematics, which are fundamentally about the insight behind the logic — as mathematicians have long pointed out.
July 21, 2025 at 10:20 PM
Augmented finetuning (green bars above) substantially outperforms the other methods. We then test on a larger dataset of thousands of documents generated from an underlying semantic structure, and still see strong benefits from ICL and ICL-augmented finetuning (below). 5/
May 2, 2025 at 5:02 PM
But it’s not just reversals; ICL consistently generalizes better than finetuning in areas like syllogistic deduction too. Motivated by these findings, we propose a method to improve finetuning generalization: prompt language models to augment the data using ICL! 4/
May 2, 2025 at 5:02 PM
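Schematically, the augmentation step amounts to something like the sketch below (the `generate` argument and the prompt wording are placeholders for whatever sampling interface and instructions you actually use; this is not the paper's exact pipeline):

```python
def augment_with_icl(documents, generate):
    """Use a model's own in-context inferences to expand a finetuning set.

    `documents` is a list of training texts; `generate` is any text-in, text-out
    sampling function (a stand-in for a real LM interface).
    """
    augmented = list(documents)
    for doc in documents:
        # Prompt the model, with the document in context, to spell out
        # inferences (e.g. rephrasings, reversed relations) that follow from it.
        prompt = (
            f"{doc}\n\n"
            "List statements that follow from the text above, "
            "including rephrasings and reversed relations:"
        )
        augmented.append(generate(prompt))
    return augmented

# The augmented set is then used for ordinary finetuning, so inferences the
# model can already make in context get written into its weights.
```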
Across various datasets, we find that ICL generalizes much better to certain kinds of tests than finetuning. For example, in the setting of the Reversal Curse dataset, just putting the whole dataset in context achieves almost 100% generalization! 3/
May 2, 2025 at 5:02 PM
We use controlled experiments to explore the generalization of ICL and finetuning in data-matched settings: if we have some documents containing new knowledge, does the LM generalize better from finetuning on them, or from just putting all of them in context? 2/
May 2, 2025 at 5:02 PM
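In schematic form, the data-matched comparison is roughly the following (again, `finetune` and `generate` are stand-ins for whatever training and sampling interfaces you have, not a specific library's API):

```python
def compare_icl_vs_finetuning(documents, test_questions, base_model,
                              finetune, generate):
    """Data-matched comparison: same documents, two ways of exposing the model to them.

    `finetune(model, texts)` returns a tuned model; `generate(model, prompt)` samples
    an answer. Both are placeholders for real interfaces.
    """
    # Condition 1: in-context learning -- all documents go in the prompt.
    context = "\n\n".join(documents)
    icl_answers = [generate(base_model, f"{context}\n\nQ: {q}\nA:")
                   for q in test_questions]

    # Condition 2: finetuning -- the documents are trained into the weights,
    # and the test prompt contains only the question.
    tuned_model = finetune(base_model, documents)
    ft_answers = [generate(tuned_model, f"Q: {q}\nA:") for q in test_questions]

    return icl_answers, ft_answers
```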
I trained models with varying levels of spurious-feature predictivity in the training data. If the spurious feature is absent, the model fails to generalize. But it turns out there's a "sweet spot" where the model learns the spurious feature, then uses it to learn the harder one, and thus generalizes!
May 1, 2025 at 12:32 AM
So I made a simpler analog: a toy dataset where a transformer is trained to predict a hard relational feature (whether two cued tokens match, like syntactic agreement) and a simpler spurious feature (the identities of the cued tokens). At test time, the spurious feature is always useless.
May 1, 2025 at 12:32 AM
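A sketch of that kind of toy dataset (an illustrative construction only, with a binary predictive/non-predictive switch where the actual experiments vary the degree of predictivity):

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, seq_len = 20, 6
CUE = 0  # special token marking the two positions to compare

def make_example(spurious_predictive: bool):
    """One sequence; the 'hard' label is whether the two cued tokens match."""
    match = int(rng.integers(0, 2))
    if spurious_predictive:
        # Spurious shortcut available: cued identities correlate with the label.
        a, b = (1, 1) if match else (2, 3)
    else:
        # Shortcut removed: identities are uninformative, only the relation matters.
        a = int(rng.integers(1, vocab_size))
        b = a if match else (a % (vocab_size - 1)) + 1
    tokens = rng.integers(1, vocab_size, size=seq_len)
    tokens[1], tokens[4] = a, b              # the two cued positions
    tokens = np.insert(tokens, [1, 4], CUE)  # cue marker before each cued token
    return tokens, match

train = [make_example(spurious_predictive=True) for _ in range(5000)]
test = [make_example(spurious_predictive=False) for _ in range(1000)]
```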
This is loosely inspired by the idea that, in a complex sentence with multiple noun-verb relations, a system could use either the syntax (a difficult, relational feature), or the semantics (easy, but spurious and unreliable) to figure out which nouns go with which verbs.
May 1, 2025 at 12:32 AM
and deriving reductive explanations from our models, using tools like studying how data properties alter the behaviors and mechanisms models learn (as a kind of rational analysis) and studying model computations; these can build towards formal, normative theories.
February 28, 2025 at 5:14 PM
Finally, we turn to how complex naturalistic experimental paradigms and opaque models can lead to conceptual understanding, through experiments that augment the multidimensional variation of natural stimuli with theory-driven parametric manipulations…
February 28, 2025 at 5:14 PM
We then move to practicalities of building generalizable models that scale to naturalistic settings — and draw concrete recommendations from AI, including the benefits of frictionless reproducibility and dynamic, cumulative benchmarks.
February 28, 2025 at 5:14 PM