micha heilbron
@mheilbron.bsky.social
Assistant Professor of Cognitive AI @UvA Amsterdam
language and vision in brains & machines
cognitive science 🤝 AI 🤝 cognitive neuroscience
michaheilbron.github.io
archive.ph/smEj0 (or, unpaywalled 🤫)
November 7, 2025 at 10:32 AM
omg. what journal? name and shame
September 19, 2025 at 12:34 PM
huh! if these effects are similar and consistent, I think it should work, but the q. is how do you get a vector representation for novel pseudowords? We currently use lexicosemantic word vectors, which are undefined for novel words.

so how to represent the novel words? v. interesting test case
September 19, 2025 at 12:32 PM
@nicolecrust.bsky.social might be of interest
September 18, 2025 at 11:52 AM
Together, our results support a classic idea: cognitive limitations can be a powerful inductive bias for learning

Yet they also reveal a curious distinction: a model with more human-like *constraints* is not necessarily more human-like in its predictions
August 18, 2025 at 12:40 PM
This paradox (better language models yielding worse behavioural predictions) could not be accounted for by existing explanations: the mechanism appears distinct from those linked to superhuman training scale or memorisation
August 18, 2025 at 12:40 PM
However, we then used these models to predict human behaviour

Strikingly, the very same models that were demonstrably better at the language task were worse at predicting human reading behaviour
August 18, 2025 at 12:40 PM
The benefit was robust

Fleeting memory models achieved better next-token prediction (lower loss) and better syntactic knowledge (higher accuracy) on the BLiMP benchmark

This was consistent across seeds and for both the 10M and 100M training sets
August 18, 2025 at 12:40 PM
But we noticed this naive decay was too strong

Human memory has a brief 'echoic' buffer that perfectly preserves the immediate past. When we added this (a short window of perfect retention before the decay), the pattern flipped

Now, fleeting memory *helped* (lower loss)
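
(A hedged sketch of how that buffer could look as a decay matrix: weights stay at 1.0 for the most recent few tokens and fall off as a power law beyond them. The window size `buffer_len`, the exponent `alpha`, and the exact functional form here are illustrative assumptions, not the paper's settings.)

```python
# Illustrative sketch only: power-law decay with a short window of perfect
# retention (an 'echoic buffer') before the decay starts. `buffer_len` and
# `alpha` are assumed values for demonstration.
import torch

def echoic_decay(seq_len, alpha=1.0, buffer_len=4):
    """(seq_len, seq_len) decay weights: 1.0 inside the buffer, power-law beyond it."""
    pos = torch.arange(seq_len)
    dist = (pos[:, None] - pos[None, :]).clamp(min=0)   # query-to-key distance
    beyond = (dist - buffer_len).clamp(min=0).float()    # 0 everywhere inside the buffer
    return (beyond + 1.0) ** (-alpha)
```

Such a matrix can stand in for a plain power-law decay wherever it is applied to the attention scores (as in the attention sketch under the 'fleeting memory transformer' post further down this feed).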
August 18, 2025 at 12:40 PM
Our first attempt, a "naive" memory decay starting from the most recent word, actually *impaired* language learning. Models with this decay had higher validation loss, and this worsened (even higher loss) as the decay became stronger
August 18, 2025 at 12:40 PM
To test this in a modern context, we propose the ‘fleeting memory transformer’

We applied a power-law memory decay to the self-attention scores, simulating how access to past words fades over time, and ran controlled experiments on the developmentally realistic BabyLM corpus
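
(For intuition, a minimal sketch of how such a decay could be folded into causal self-attention. This is not the released implementation: the exponent `alpha`, the log-space application of the decay, and the renormalisation through softmax are all illustrative assumptions.)

```python
# Illustrative sketch only, not the paper's code. Assumes the power-law decay
# down-weights each past position's unnormalised attention weight by
# (distance + 1) ** -alpha before the softmax renormalises.
import torch
import torch.nn.functional as F

def fleeting_attention(q, k, v, alpha=1.0):
    """Causal self-attention with a power-law memory decay over past tokens.

    q, k, v: (batch, seq_len, d) tensors; alpha: assumed decay exponent.
    """
    b, n, d = q.shape
    scores = q @ k.transpose(-2, -1) / d ** 0.5              # (b, n, n)

    pos = torch.arange(n)
    dist = (pos[:, None] - pos[None, :]).clamp(min=0)         # query-to-key distance
    decay = (dist.float() + 1.0) ** (-alpha)                   # 1.0 now, fading with distance

    causal = torch.tril(torch.ones(n, n)).bool()
    scores = (scores + decay.log()).masked_fill(~causal, float("-inf"))
    return F.softmax(scores, dim=-1) @ v                       # (b, n, d)

# Toy usage: 2 sequences of 8 tokens with 16-dimensional heads
q = k = v = torch.randn(2, 8, 16)
out = fleeting_attention(q, k, v, alpha=0.5)
```

Adding log(decay) to the scores is one way to multiply the unnormalised attention weights by the decay and renormalise; applying the decay after the softmax, without renormalisation, would be another reasonable reading of the post.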
August 18, 2025 at 12:40 PM
However, this appears difficult to reconcile with the success of transformers, which can learn language very effectively despite lacking working-memory limitations or other recency biases

Would the blessing of fleeting memory still hold in transformer language models?
August 18, 2025 at 12:40 PM
A core idea in cognitive science is that the fleetingness of working memory isn't a flaw

It may actually help language learning by forcing a focus on the recent past and providing an incentive to discover abstract structure rather than surface details
August 18, 2025 at 12:40 PM
On Wednesday, Maithe van Noort will present a poster on “Compositional Meaning in Vision-Language Models and the Brain”

First results from a much larger project on visual and linguistic meaning in brains and machines, with many collaborators. More to come!

t.ly/TWsyT
August 12, 2025 at 11:14 AM
On Friday, during a contributed talk (and a poster), @wiegerscheurer will present the project he spearheaded: “A hierarchy of spatial predictions across human visual cortex during natural vision” 

(Full preprint soon)

t.ly/fTJqy
August 12, 2025 at 11:14 AM
i’m all in the “this is a neat way to help explain things” camp fwiw :)
May 23, 2025 at 3:53 PM
Our findings, together with some other recent studies, suggest the brain may use a similar strategy — constantly predicting higher-level features — to efficiently learn robust visual representations of (and from!) the natural world
May 23, 2025 at 11:39 AM