Prior works (e.g., Miyawaki+ 2008, Shen+ 2019) pursued the goal of reconstructing arbitrary visual experiences from brain activity.
Recent studies report realistic reconstructions from NSD using CLIP + diffusion models.
But—do they truly achieve this goal?
We tested these methods across datasets: they worked well on NSD (A), but performance dropped severely on Deeprecon (B).
The latest MindEye2 even generated training-set categories unrelated to the test stimuli.
So what’s behind this generalization failure?
These methods share a two-stage design: the Translator maps brain activity to latent features, and the Generator converts those features into images.
We analyzed each component in detail.
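For the mechanics, here is a minimal sketch of such a two-stage pipeline on toy data; the ridge-regression Translator, the array shapes, and the `generator.generate` placeholder are illustrative assumptions, not the exact setup of any of these methods.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

# Toy stand-ins: 1,000 training samples, 5,000 voxels, 512-D latent features
# (e.g., CLIP image embeddings). Real pipelines fit this per subject.
X_train = rng.standard_normal((1000, 5000))  # fMRI activity patterns
Z_train = rng.standard_normal((1000, 512))   # latent features of seen images
X_test = rng.standard_normal((50, 5000))     # held-out brain activity

# Translator: regularized linear regression from voxels to latent features.
translator = Ridge(alpha=100.0)
translator.fit(X_train, Z_train)
z_pred = translator.predict(X_test)          # (50, 512) predicted features

# Generator: a pretrained image generator conditioned on these features
# (a CLIP-conditioned diffusion model in recent NSD studies). Placeholder:
# recon_images = generator.generate(z_pred)
```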
First, the data: the NSD stimulus set itself shows (A):
- distinct clusters (~40)
- substantial overlap between the training and test sets
NSD test images were also perceptually similar to training images (B), unlike in the carefully curated Deeprecon (C). One way to probe this is sketched below.
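A minimal sketch of one way to probe such cluster structure and train/test overlap, assuming CLIP-style feature vectors; k-means with k = 40 and the toy shapes are illustrative assumptions, not the paper's exact analysis.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Toy stand-ins for semantic features (e.g., CLIP embeddings) of NSD images.
feat_train = rng.standard_normal((8000, 512))
feat_test = rng.standard_normal((1000, 512))

# Cluster the training features (k = 40 mirrors the ~40 clusters above).
kmeans = KMeans(n_clusters=40, n_init=10, random_state=0).fit(feat_train)

# Overlap check: distance from each test feature to its nearest training
# cluster center. Small distances mean the test set sits inside regions
# the training set already covers.
nearest = kmeans.transform(feat_test).min(axis=1)
print("median distance to nearest training cluster:", np.median(nearest))
```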
Next, the Translator: careful identification analyses revealed a fundamental limitation in generalizing beyond the training distribution.
Though a regressor, the Translator behaves more like a classifier: it selects among training-set clusters rather than predicting genuinely novel features.
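For concreteness, a minimal sketch of a pairwise identification analysis of this general kind, on toy data; the Pearson-correlation metric and the sample sizes are illustrative assumptions rather than the paper's exact procedure.

```python
import numpy as np

def identification_accuracy(pred, true):
    """Pairwise identification: how often is each prediction more correlated
    with its own stimulus's true features than with another candidate's?"""
    # Row-normalize so a dot product gives the Pearson correlation.
    p = (pred - pred.mean(1, keepdims=True)) / pred.std(1, keepdims=True)
    t = (true - true.mean(1, keepdims=True)) / true.std(1, keepdims=True)
    corr = p @ t.T / pred.shape[1]        # corr[i, j]: pred_i vs. true_j
    wins = corr.diagonal()[:, None] > corr
    n = len(corr)
    return wins.sum() / (n * (n - 1))     # exclude the self-comparison

rng = np.random.default_rng(0)
z_true = rng.standard_normal((50, 512))
z_pred = z_true + rng.standard_normal((50, 512))  # noisy toy "predictions"
print(identification_accuracy(z_pred, z_true))    # chance level is 0.5
```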
Then the Generator: we fed it true image features instead of predicted ones.
The outputs were semantically similar—but perceptually quite different.
It seems the Generator relies mainly on semantic features, with less focus on perceptual fidelity.
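A minimal sketch of the logic of this check, using per-image pixel correlation as a crude stand-in for a perceptual metric; the toy arrays and the metric choice are illustrative assumptions, and the real analysis runs true features through the pretrained Generator and also scores semantic similarity (e.g., via CLIP).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins: original images vs. Generator outputs given *true* features.
# The fake "reconstructions" share no pixel structure with the originals,
# mimicking outputs that match in meaning but not in appearance.
originals = rng.random((10, 64, 64, 3))
recons = rng.random((10, 64, 64, 3))

def pixel_corr(a, b):
    """Low-level similarity: correlation of flattened pixels, per image."""
    a = a.reshape(len(a), -1)
    b = b.reshape(len(b), -1)
    a = a - a.mean(1, keepdims=True)
    b = b - b.mean(1, keepdims=True)
    return (a * b).sum(1) / (np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1))

# In the pattern described above, semantic metrics stay high while low-level
# metrics like this stay near zero, even with true features as input; that
# upper-bounds the perceptual fidelity the whole pipeline can achieve.
print(pixel_corr(recons, originals).mean())
```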
This deviates fundamentally from genuine visual reconstruction, which aims to recover arbitrary visual experiences.
Visualization has value in itself, but it’s crucial to recognize the huge gap between visualization and reconstruction.
If this thread sparked your interest, please take a look at our paper!
Huge thanks to co-authors, and especially to Prof. Kamitani (@ykamit.bsky.social), for their invaluable support throughout this work!