Sushrut Thorat
@sushrutthorat.bsky.social
Recurrent computations and lifelong learning.
Postdoc at IKW-UOS@DE with @timkietzmann.bsky.social
Prev. Donders@NL, CIMeC@IT, IIT-B@IN
💪 yup these are fun results!
November 19, 2025 at 9:05 AM
Your focus on illusions as a pathway forward is cool! Will keep thinking about it 😇
November 19, 2025 at 6:08 AM
It could be that these effects aren't as strong because ImageNet itself does not have enough instances of occlusion, etc., compared to arxiv.org/abs/2111.07898, to instill strong segmentation priors into these RNNs (ofc they are all quite texture-biased and reliant on the backgrounds).
Category-orthogonal object features guide information processing in recurrent neural networks trained for object categorization
Recurrent neural networks (RNNs) have been shown to perform better than feedforward architectures in visual object categorization tasks, especially in challenging conditions such as cluttered images. ...
arxiv.org
November 18, 2025 at 9:53 PM
Neon-color spreading: hard to be sure, although there's more positive signal in the center, BUT the class prediction stays "wire"...
November 18, 2025 at 9:48 PM
I have a feeling the reason we do not see an illusory contour per se is because the feedforward drive is super strong currently. Still, the predictions are changing in accordance with what you wanted (although what's up with axe?). I can try and pull up the closest neighbors later if I get time.
November 18, 2025 at 9:42 PM
I did a quick check with the BLT_VS trained on Ecoset (github.com/KietzmannLab...).

Visualizing the feedback at the second "LGN" layer and printing the predicted class. The feedback doesn't seem to show the illusory contour, but the class, interestingly, changes from guitar to lamp to axe??
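A minimal sketch of this kind of probe (PyTorch), assuming a hypothetical recurrent model object `blt_vs` that exposes its layers as named submodules and returns per-timestep logits; the submodule name "lgn2_feedback" and the output format are placeholders, not the actual BLT_VS API:

```python
import torch

feedback_maps = []

def save_feedback(module, inputs, output):
    # stash a detached copy of the feedback activation at this layer
    feedback_maps.append(output.detach().cpu())

# hypothetical submodule name; the real BLT_VS layer names will differ
hook = blt_vs.get_submodule("lgn2_feedback").register_forward_hook(save_feedback)

with torch.no_grad():
    # `image` is an assumed preprocessed input tensor;
    # assumed output: one (1, n_classes) logits tensor per timestep
    logits_per_step = blt_vs(image)

hook.remove()

for t, logits in enumerate(logits_per_step):
    print(f"t={t}: predicted class index {logits.argmax(dim=-1).item()}")

# feedback_maps[t] can then be visualized, e.g. mean over channels as a heatmap
```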
November 18, 2025 at 9:42 PM
esp. because the "confidence increase due to feedback" picture painted here - bsky.app/profile/tahe... - is eerily similar to what we expected (and found) in the BLTs in arxiv.org/abs/2111.07898

Very curious!
November 18, 2025 at 8:26 PM
GPN-B/R have no efference copies during training, so the GPN representations would be invariant to saccades, by definition.

GPN-S/RS have efference copies during training; by definition, their representations won't be invariant. Perhaps equivariant close to the output, as the saccade vector signals features.
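One way to see the distinction, as a hedged sketch: feed the same glimpse with different saccade vectors and compare the resulting states. `gpn`, `encoder`, and `glimpse` here are hypothetical stand-ins, not the repo's API.

```python
import torch
import torch.nn.functional as F

z = encoder(glimpse)                     # assumed: fixed glimpse embedding
s1, s2 = torch.randn(2), torch.randn(2)  # two different saccade vectors

h1, h2 = gpn(z, s1), gpn(z, s2)          # saccade-conditioned states
h0 = gpn(z, torch.zeros(2))              # no efference copy, as in GPN-B/R

# invariance: h1 ≈ h2 no matter the saccade (similarity near 1);
# equivariance: h1 and h2 differ systematically with (s1 - s2)
print(F.cosine_similarity(h1, h2, dim=-1))
```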
November 18, 2025 at 4:01 PM
This framework is different from SimCLR: it is not building de novo representations for glimpses; instead, it is learning to map between pre-existing representations (e.g. RN50-SimCLR embeddings), given the saccades.
November 18, 2025 at 3:05 PM
Actually, in general, I'm finding it hard to think about in-/equi-variance here. Curious what you're thinking.
November 18, 2025 at 2:59 PM
The objective we employ induces something like equivariance: it tries to pull action/saccade-conditioned glimpses closer (but not overlapping, by definition) to create a co-occurrence-aware space. Invariance, hmm, I can't easily see how that comes into play.
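Roughly, the training step looks like the following sketch, assuming a frozen glimpse embedder `encoder` (e.g. an RN50-SimCLR backbone) and a hypothetical predictor `gpn` that maps the current embedding plus a saccade vector to a prediction of the next glimpse's embedding; the actual loss and interfaces in the paper may differ.

```python
import torch
import torch.nn.functional as F

def glimpse_prediction_loss(gpn, encoder, glimpse_t, glimpse_t1, saccade):
    with torch.no_grad():
        z_t = encoder(glimpse_t)    # pre-existing embedding of the current glimpse
        z_t1 = encoder(glimpse_t1)  # target: embedding of the next glimpse
    z_pred = gpn(z_t, saccade)      # saccade-conditioned prediction
    # pull the prediction toward the actual next-glimpse embedding
    return 1 - F.cosine_similarity(z_pred, z_t1, dim=-1).mean()
```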
November 18, 2025 at 2:46 PM
Code to reproduce the results can be found at: github.com/KietzmannLab...
GitHub - KietzmannLab/GPN: Official implementation and analysis of the Glimpse Prediction Networks.
github.com
November 18, 2025 at 12:37 PM
We are curious to hear what you make of the results and conclusions. If your favorite model of scene representations in visual cortex is not included, please let us know. Note that the best model, GPN-R-SimCLR, only explains half of the shared variance across subjects; we have a long way to go! ✨ 14/14
November 18, 2025 at 12:37 PM
Open Qs:
1) What's the format of the GPN scene repr.? (see Fig. S3 for the case of GPN-RS)
2) How does the human visual cortex acquire its scene repr.? A chicken-and-egg issue w.r.t. eye movements and the repr. they rely on.
3) Eye movements in video-SSL: will they help? (related to vJEPA) 13/14
November 18, 2025 at 12:37 PM
In sum, predicting the high-level embeddings of the next glimpse in human-like glimpse sequences leads to the emergence of scene repr. that are SOTA models of scene repr. in human visual cortex => a self-supervised, language-free way of getting at human (primate?) scene representations! 12/14
November 18, 2025 at 12:37 PM
GPN alignment beyond VVC (NSD streams ROIs): better than input embeddings/GSNs in all ROIs; better than SOTA in parietal/midparietal; on par with SOTA in midventral/midlateral/lateral; worse only in early visual cortex (expected due to low-level features). 11/14
November 18, 2025 at 12:37 PM
GPN-R-SimCLR isn't just a SOTA model of ventral scene representations; it also largely subsumes the variance explained by all the other models (variance partitioning; green-edged squares: unique variance)! Universality? Language-based codes <= co-occurrence of visual scene parts? 10/14
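For readers unfamiliar with variance partitioning, here is a hedged two-model sketch using cross-validated ridge regression; `X_gpn`, `X_other`, and `y` are assumed feature matrices and a voxel response vector, and the paper's actual pipeline may differ.

```python
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import cross_val_predict

def cv_r2(X, y):
    # cross-validated R^2 of a ridge model predicting y from X
    pred = cross_val_predict(RidgeCV(alphas=np.logspace(-2, 4, 7)), X, y, cv=5)
    return 1 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2)

r2_gpn = cv_r2(X_gpn, y)
r2_other = cv_r2(X_other, y)
r2_joint = cv_r2(np.hstack([X_gpn, X_other]), y)

unique_gpn = r2_joint - r2_other        # variance only GPN explains
unique_other = r2_joint - r2_gpn        # variance only the other model explains
shared = r2_gpn + r2_other - r2_joint   # variance both explain
```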
November 18, 2025 at 12:37 PM
Equating the architecture and dataset, but switching the objective from glimpse prediction to caption embedding (MPNet; sGSN) or multi-class object prediction (cGSN), reduces the alignment. Furthermore, no related/SOTA model (36 tested; Table S5) outperforms GPN-R-SimCLR => a new SOTA model! 9/14
November 18, 2025 at 12:37 PM
We assess RDM alignment with the 'ventral' visual cortex (VVC) RDMs across GPN variants (and glimpse-embedding backbones). GPN representations align better than the input glimpse embeddings (dotted black lines) => GPN contextualization & integration creates VVC-aligned scene repr.! 8/14
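The core of such an RDM comparison, as a minimal sketch: `model_feats` and `brain_resps` are assumed (n_images, n_dims) arrays, and the exact distance metric and correlation used in the paper may differ.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

# condensed representational dissimilarity matrices (upper triangles)
model_rdm = pdist(model_feats, metric="correlation")
brain_rdm = pdist(brain_resps, metric="correlation")

# rank correlation between the two RDMs = model-brain alignment
alignment, _ = spearmanr(model_rdm, brain_rdm)
print(f"model-brain RDM alignment: {alignment:.3f}")
```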
November 18, 2025 at 12:37 PM