micha heilbron
@mheilbron.bsky.social
Assistant Professor of Cognitive AI @UvA Amsterdam
language and vision in brains & machines
cognitive science 🤝 AI 🤝 cognitive neuroscience
michaheilbron.github.io
However, we then used these models to predict human behaviour

Strikingly, these same models that were demonstrably better at the language task were *worse* at predicting human reading behaviour
August 18, 2025 at 12:40 PM
The benefit was robust

Fleeting memory models achieved better next-token prediction (lower loss) and better syntactic knowledge (higher accuracy) on the BLiMP benchmark

This was consistent across seeds and for both 10M and 100M training sets
August 18, 2025 at 12:40 PM
But we noticed this naive decay was too strong

Human memory has a brief 'echoic' buffer that perfectly preserves the immediate past. When we added this (a short window of perfect retention before the decay), the pattern flipped

Now, fleeting memory *helped* (lower loss)
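In weight terms, the combined scheme looks something like this: full retention inside a short echoic window, then power-law decay beyond it. A minimal numpy sketch; the `window` and `alpha` values here are illustrative assumptions, not the trained model's settings:

```python
import numpy as np

def fleeting_memory_weights(max_dist, window=2, alpha=1.0):
    """Memory weight at each distance d into the past: perfect retention
    (weight 1) inside a short 'echoic' window, power-law decay beyond it."""
    d = np.arange(max_dist)
    w = np.ones(max_dist)
    far = d > window
    w[far] = (1.0 + d[far] - window) ** -alpha  # decay resumes past the window
    return w

w = fleeting_memory_weights(6, window=2, alpha=1.0)
# distances 0..2 are perfectly retained; weights then fall off as a power law
```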
August 18, 2025 at 12:40 PM
Our first attempt, a "naive" memory decay starting from the most recent word, actually *impaired* language learning. Models with this decay had higher validation loss, which worsened further as the decay became stronger
August 18, 2025 at 12:40 PM
To test this in a modern context, we propose the ‘fleeting memory transformer’

We applied a power-law memory decay to the self-attention scores, simulating how access to past words fades over time, and ran controlled experiments on the developmentally realistic BabyLM corpus
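The decay can be sketched as an additive bias on the pre-softmax attention scores, so that attention to a token at distance d is scaled by (1 + d)^(-alpha). This is an illustrative numpy sketch; `alpha` and this exact parameterisation are assumptions, not the paper's implementation:

```python
import numpy as np

def fleeting_memory_bias(seq_len, alpha=1.0):
    """Additive pre-softmax attention bias implementing a power-law memory decay."""
    pos = np.arange(seq_len)
    dist = np.clip(pos[:, None] - pos[None, :], 0, None)  # distance to past tokens
    bias = -alpha * np.log1p(dist.astype(float))          # log((1 + d)**-alpha)
    bias[pos[None, :] > pos[:, None]] = -np.inf           # causal mask: no future tokens
    return bias

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# with uniform scores, the attention weights just follow the decay
weights = softmax(np.zeros((4, 4)) + fleeting_memory_bias(4))
```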
August 18, 2025 at 12:40 PM
CCN has arrived here in Amsterdam!

Come find me to meet or catch up

Some highlights from students and collaborators:
August 12, 2025 at 11:14 AM
Remarkably, these prediction effects appeared independent of recent experience with the specific images presented

This suggests they rely on long-term, ingrained priors about the statistical structure of the visual world, rather than on recent exposure to these specific images
May 23, 2025 at 11:39 AM
In V1, this sensitivity to unpredictability – presumably a neural marker of prediction error – was stronger in superficial cortical layers

This aligns with hierarchical predictive coding models that postulate that prediction errors are computed in superficial layers
May 23, 2025 at 11:39 AM
Like @martinavinck.bsky.social et al., we found a striking dissociation:

Neurons were most sensitive to the *predictability* of high-level visual features (red line), even in areas like V1 that are themselves most sensitive to low-level visual *features* (blue line)

& this dissociation was found across visual cortex
May 23, 2025 at 11:39 AM
To understand *what* the visual system is predicting, we quantified unpredictability at multiple levels of abstraction (using CNN layers as a proxy)

This can dissociate predictability of low-level features (e.g. lines/edges) versus higher-level features (e.g., textures, objects)
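One simple way to operationalise this: score each CNN layer by how far the features of the actual patch sit from the features of the prediction made from its surround. The cosine-distance metric and the layer names below are illustrative assumptions, not the paper's exact pipeline:

```python
import numpy as np

def layerwise_unpredictability(feats_actual, feats_predicted):
    """Per-layer unpredictability: cosine distance between CNN features of the
    actual patch and of the prediction from its surround.
    Both arguments map layer name -> feature vector."""
    scores = {}
    for layer, a in feats_actual.items():
        p = feats_predicted[layer]
        cos = float(a @ p / (np.linalg.norm(a) * np.linalg.norm(p) + 1e-12))
        scores[layer] = 1.0 - cos
    return scores

# toy check: low-level features predicted perfectly, high-level ones not at all
scores = layerwise_unpredictability(
    {"conv1": np.array([1.0, 0.0]), "conv5": np.array([1.0, 0.0])},
    {"conv1": np.array([1.0, 0.0]), "conv5": np.array([0.0, 1.0])},
)
```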
May 23, 2025 at 11:39 AM
Across visual cortex, we found clear predictability effects: less predictable image patches evoked stronger neural responses

This aligns with core predictive processing ideas (prediction error) and the established literature using controlled designs (expectation suppression)
May 23, 2025 at 11:39 AM
Building on @martinavinck.bsky.social et al., we used deep generative models to quantify how predictable receptive field (RF) patches in images are from their surroundings

We then related these scores to spiking from Allen Institute Neuropixels, controlling for low-level features
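The basic recipe: hide the receptive-field patch, let a generative model reconstruct it from the surround, and score the reconstruction error. In this sketch `inpaint_fn` is a stand-in for the (unspecified) deep generative model, and mean squared error is an assumed metric:

```python
import numpy as np

def patch_unpredictability(image, patch, inpaint_fn, n_samples=4):
    """Score how unpredictable an RF patch is from its surround: mask the
    patch, sample reconstructions from a model of the surround, and take the
    mean squared error against the true patch."""
    target = image[patch]
    masked = image.copy()
    masked[patch] = 0.0  # hide the patch from the model
    errs = [np.mean((inpaint_fn(masked, patch) - target) ** 2)
            for _ in range(n_samples)]
    return float(np.mean(errs))

img = np.ones((8, 8))
rf = (slice(2, 4), slice(2, 4))
perfect = patch_unpredictability(img, rf, lambda m, p: img[p])          # ideal model
blind = patch_unpredictability(img, rf, lambda m, p: np.zeros((2, 2)))  # no prediction
```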
May 23, 2025 at 11:39 AM
en route to Scotland, visiting the universities of Edinburgh and Glasgow tomorrow & Friday.

do reach out if you are around and want to talk language models, brains, or anything in between!
March 19, 2025 at 5:15 PM
lovely and slightly surreal to be back at Queen Square after all those years, but had a blast talking about generative AI and the predictive brain at the FIL today @peterkok.bsky.social @clarepress.bsky.social
March 19, 2025 at 3:12 PM
If we group variables into "low-level" vs "cognitive", we find a striking dissociation: for skipping, low-level variables account for much more unique variation than lexical processing-based explanations; for reading times, it is exactly the other way around
October 2, 2023 at 1:02 PM
Using set theory, we can then precisely quantify how much unique and shared variation is accounted for by each type of explanation
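The set-theoretic partitioning boils down to differences of R² values from nested regression fits: unique(A) = R²(full) − R²(B), and shared = R²(A) + R²(B) − R²(full). A minimal sketch with illustrative variable names:

```python
import numpy as np

def partition_r2(y, X_low, X_cog):
    """Variance partitioning over two predictor sets via nested
    least-squares fits (commonality analysis for two sets)."""
    def r2(X):
        X1 = np.column_stack([np.ones(len(y)), X])  # add intercept
        beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
        return 1.0 - np.var(y - X1 @ beta) / np.var(y)
    r_low, r_cog = r2(X_low), r2(X_cog)
    r_full = r2(np.column_stack([X_low, X_cog]))
    return {"unique_low": r_full - r_cog,
            "unique_cog": r_full - r_low,
            "shared": r_low + r_cog - r_full}

# toy check: two independent predictors should each explain ~half the
# variance uniquely, with ~no shared variance
rng = np.random.default_rng(0)
x1, x2 = rng.standard_normal(2000), rng.standard_normal(2000)
parts = partition_r2(x1 + x2, x1[:, None], x2[:, None])
```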
October 2, 2023 at 1:02 PM
We investigated this in three large corpora of natural reading (>1M words).

For each word in the text, we model how much information (in bits) was available at the previous fixation location, from both prediction, p(word | context), and preview, p(word | preview)
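Information in bits from a probability estimate is just the negative base-2 log (word surprisal); the function name here is illustrative:

```python
import math

def information_bits(p):
    """Information (in bits) a word carries given a probability estimate p,
    e.g. p(word | context) for prediction or p(word | preview) for preview."""
    return -math.log2(p)

# a word assigned probability 1/8 by the context carries 3 bits
bits = information_bits(1 / 8)
```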
October 2, 2023 at 1:01 PM