Ken Shirakawa
@kencan7749.bsky.social
Ph.D. candidate at Kyoto University and ATR / Brain decoding / fMRI / neuroAI / neuroscience
What about the Generator (diffusion model)?
We fed it true image features instead of predicted ones.
The outputs were semantically similar to the true images, but perceptually quite different.
It seems the Generator relies mainly on semantic features, with less focus on perceptual fidelity.
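A minimal sketch of this ablation, assuming stand-in callables for the CLIP encoder, the Translator, and the diffusion Generator (these are hypothetical placeholders, not the actual code):

```python
import numpy as np

def generator_ablation(generator, translator, clip_encoder, test_image, test_brain):
    """Feed the Generator ground-truth vs. brain-predicted features and compare outputs."""
    true_feats = clip_encoder(test_image)    # oracle latent features from the stimulus itself
    pred_feats = translator(test_brain)      # features decoded from fMRI activity
    recon_oracle = generator(true_feats)     # upper bound: Generator gets perfect input
    recon_decoded = generator(pred_feats)    # the usual brain-to-image reconstruction
    return recon_oracle, recon_decoded

# Toy stand-ins so the sketch runs; shapes are purely illustrative.
rng = np.random.default_rng(0)
oracle, decoded = generator_ablation(
    generator=lambda f: rng.normal(size=(64, 64, 3)),   # pretend diffusion output
    translator=lambda b: rng.normal(size=512),
    clip_encoder=lambda img: rng.normal(size=512),
    test_image=None, test_brain=None,
)
# If the oracle reconstruction matches the stimulus only semantically, the Generator
# itself discards perceptual detail regardless of how good the Translator is.
```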
June 13, 2025 at 9:21 AM
Given the overlap between training/test sets, can the Translator predict test stimuli effectively?

Careful identification analyses revealed a fundamental limitation in generalizing beyond the training distribution.

The Translator, though trained as a regressor, behaves more like a classifier.
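For reference, a minimal sketch of a generic pairwise identification analysis (not the exact analysis code); `pred` and `true` are random stand-ins for predicted and true latent features, and chance level is 0.5:

```python
import numpy as np

def pairwise_identification(pred, true):
    """Pairwise identification accuracy.

    For each sample, count how often its predicted feature vector correlates more
    strongly with its own true feature than with another sample's true feature.
    pred, true: (n_samples, n_features) arrays.
    """
    p = (pred - pred.mean(1, keepdims=True)) / pred.std(1, keepdims=True)
    t = (true - true.mean(1, keepdims=True)) / true.std(1, keepdims=True)
    corr = p @ t.T / pred.shape[1]            # corr[i, j] = corr(pred_i, true_j)
    own = np.diag(corr)[:, None]              # correlation with the matching stimulus
    wins = (own > corr).sum(axis=1)           # distractors beaten (diagonal never counts)
    return wins / (corr.shape[0] - 1)         # per-sample accuracy in [0, 1]

# Example with random vectors: accuracy should hover around chance (~0.5).
rng = np.random.default_rng(0)
pred = rng.normal(size=(100, 512))
true = rng.normal(size=(100, 512))
print(pairwise_identification(pred, true).mean())
```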
June 13, 2025 at 9:21 AM
We first checked the latent features. UMAP visualization of NSD’s CLIP features revealed (A):

- distinct clusters (~40)
- substantial overlap between training and test sets

NSD test images were also perceptually similar to training images (B), unlike in the carefully curated Deeprecon dataset (C).
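A minimal sketch of that visualization with umap-learn, using random arrays in place of the actual CLIP features of NSD training and test images:

```python
import numpy as np
import umap                      # umap-learn
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
train_feats = rng.normal(size=(5000, 512))   # stand-in for CLIP features of training images
test_feats  = rng.normal(size=(500, 512))    # stand-in for CLIP features of test images

# Fit UMAP on the pooled features so both sets live in the same 2-D embedding.
emb = umap.UMAP(n_neighbors=15, min_dist=0.1, random_state=0).fit_transform(
    np.vstack([train_feats, test_feats]))
train_emb, test_emb = emb[:len(train_feats)], emb[len(train_feats):]

plt.scatter(*train_emb.T, s=2, alpha=0.3, label="NSD train")
plt.scatter(*test_emb.T,  s=4, alpha=0.8, label="NSD test")
plt.legend(); plt.title("Train/test overlap of CLIP features")
plt.show()
```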
June 13, 2025 at 9:20 AM
To better understand what was happening, we decomposed these methods into a Translator–Generator pipeline.

The Translator maps brain activity to latent features, and the Generator converts those features into images.

We analyzed each component in detail.
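A minimal sketch of that decomposition, with ridge regression standing in for the Translator and a placeholder Generator; the real methods use their own regressors and diffusion generators, so this only illustrates the two-stage structure:

```python
import numpy as np
from sklearn.linear_model import Ridge

class TranslatorGenerator:
    """Two-stage pipeline: brain activity -> latent features -> image."""
    def __init__(self, generator):
        self.translator = Ridge(alpha=1.0)   # Translator: regression to latent (e.g. CLIP) features
        self.generator = generator           # Generator: latent features -> image (e.g. diffusion)

    def fit(self, brain, feats):
        self.translator.fit(brain, feats)
        return self

    def reconstruct(self, brain):
        pred_feats = self.translator.predict(brain)   # stage 1: analyze prediction quality here
        return self.generator(pred_feats)             # stage 2: analyze generation fidelity here

# Toy usage with random data; shapes are illustrative only.
rng = np.random.default_rng(0)
pipe = TranslatorGenerator(generator=lambda f: f).fit(
    rng.normal(size=(800, 1000)),   # fMRI patterns
    rng.normal(size=(800, 512)),    # matching latent features
)
recons = pipe.reconstruct(rng.normal(size=(10, 1000)))
```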
June 13, 2025 at 9:19 AM
We tested whether these methods generalize beyond NSD.
They worked well on NSD (A), but performance dropped severely on Deeprecon (B).
The latest MindEye2 even generated images of training-set categories unrelated to the test stimuli.
So what’s behind this generalization failure?
June 13, 2025 at 9:18 AM