Sushrut Thorat
@sushrutthorat.bsky.social
Recurrent computations and lifelong learning.
Postdoc at IKW-UOS@DE with @timkietzmann.bsky.social
Prev. Donders@NL, CIMeC@IT, IIT-B@IN
💪 yup these are fun results!
November 19, 2025 at 9:05 AM
Your focus on illusions as a pathway forward is cool! Will keep thinking about it 😇
November 19, 2025 at 6:08 AM
It could be that these effects aren't as strong because ImageNet itself does not have enough instances of occlusion, etc., compared to arxiv.org/abs/2111.07898, to instill strong segmentation priors into these RNNs (ofc they are all quite texture-biased and reliant on the backgrounds).
Category-orthogonal object features guide information processing in recurrent neural networks trained for object categorization
Recurrent neural networks (RNNs) have been shown to perform better than feedforward architectures in visual object categorization tasks, especially in challenging conditions such as cluttered images. ...
arxiv.org
November 18, 2025 at 9:53 PM
Neon-color spreading: hard to be sure, although there's more positive signal in the center, BUT the class prediction stays "wire"...
November 18, 2025 at 9:48 PM
I have a feeling the reason we do not see an illusory contour per se is because the feedforward drive is super strong currently. Still, the predictions are changing in accordance with what you wanted (although what's up with axe?). I can try and pull up the closest neighbors later if I get time.
November 18, 2025 at 9:42 PM
I did a quick check with the BLT_VS trained on Ecoset (github.com/KietzmannLab...).

Visualizing the feedback at the second "LGN" layer and printing the predicted class. The feedback doesn't seem to show the illusory contour, but the class, interestingly, changes from guitar to lamp to axe??
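A minimal sketch of this kind of probe (PyTorch), assuming a hypothetical recurrent model object `blt_vs` that exposes its layers as named submodules and returns per-timestep logits; the submodule name "lgn2_feedback" and the output format are placeholders, not the actual BLT_VS API:

```python
import torch

feedback_maps = []

def save_feedback(module, inputs, output):
    # stash a detached copy of the feedback activation at this layer
    feedback_maps.append(output.detach().cpu())

# hypothetical submodule name; the real BLT_VS layer names will differ
hook = blt_vs.get_submodule("lgn2_feedback").register_forward_hook(save_feedback)

with torch.no_grad():
    # `image` is an assumed preprocessed input tensor;
    # assumed output: one (1, n_classes) logits tensor per timestep
    logits_per_step = blt_vs(image)

hook.remove()

for t, logits in enumerate(logits_per_step):
    print(f"t={t}: predicted class index {logits.argmax(dim=-1).item()}")

# feedback_maps[t] can then be visualized, e.g. mean over channels as a heatmap
```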
November 18, 2025 at 9:42 PM
esp. because the "confidence increase due to feedback" picture painted here - bsky.app/profile/tahe... - is eerily similar to what we expected (and found) in the BLTs in arxiv.org/abs/2111.07898

Very curious!
November 18, 2025 at 8:26 PM
GPN-B/R have no efference copies during training, so the GPN representations would be invariant to saccades, by definition.

GPN-S/RS have efference copies during training; by definition, their representations won't be invariant. Perhaps equivariant close to the output, as the saccade vector signals features.
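One way to see the distinction, as a hedged sketch: feed the same glimpse with different saccade vectors and compare the resulting states. `gpn`, `encoder`, and `glimpse` here are hypothetical stand-ins, not the repo's API.

```python
import torch
import torch.nn.functional as F

z = encoder(glimpse)                     # assumed: fixed glimpse embedding
s1, s2 = torch.randn(2), torch.randn(2)  # two different saccade vectors

h1, h2 = gpn(z, s1), gpn(z, s2)          # saccade-conditioned states
h0 = gpn(z, torch.zeros(2))              # no efference copy, as in GPN-B/R

# invariance: h1 ≈ h2 no matter the saccade (similarity near 1);
# equivariance: h1 and h2 differ systematically with (s1 - s2)
print(F.cosine_similarity(h1, h2, dim=-1))
```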
November 18, 2025 at 4:01 PM
This framework is different from SimCLR: it is not building de novo representations for glimpses; instead, it is learning to map between pre-existing representations (e.g. RN50-SimCLR embeddings), given the saccades.
November 18, 2025 at 3:05 PM
Actually, in general, I'm finding it hard to think about in-/equi-variance here. Curious what you're thinking.
November 18, 2025 at 2:59 PM
The objective we employ induces something like equivariance: it tries to pull action/saccade-conditioned glimpses closer (but not overlapping, by definition) to create a co-occurrence-aware space. Invariance, hmm, I can't easily see how that comes into play.
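Roughly, the training step looks like the following sketch, assuming a frozen glimpse embedder `encoder` (e.g. an RN50-SimCLR backbone) and a hypothetical predictor `gpn` that maps the current embedding plus a saccade vector to a prediction of the next glimpse's embedding; the actual loss and interfaces in the paper may differ.

```python
import torch
import torch.nn.functional as F

def glimpse_prediction_loss(gpn, encoder, glimpse_t, glimpse_t1, saccade):
    with torch.no_grad():
        z_t = encoder(glimpse_t)    # pre-existing embedding of the current glimpse
        z_t1 = encoder(glimpse_t1)  # target: embedding of the next glimpse
    z_pred = gpn(z_t, saccade)      # saccade-conditioned prediction
    # pull the prediction toward the actual next-glimpse embedding
    return 1 - F.cosine_similarity(z_pred, z_t1, dim=-1).mean()
```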
November 18, 2025 at 2:46 PM
Code to reproduce the results can be found at: github.com/KietzmannLab...
GitHub - KietzmannLab/GPN: Official implementation and analysis of the Glimpse Prediction Networks.
github.com
November 18, 2025 at 12:37 PM
We are curious to hear what you make of the results and conclusions. If your favorite model of scene representations in visual cortex is not included, please let us know. Note that the best model, GPN-R-SimCLR, only explains half of the shared variance across subjects; we have a long way to go! ✨ 14/14
November 18, 2025 at 12:37 PM
Open Qs:
1) What's the format of the GPN scene repr.? (see Fig. S3 for the case of GPN-RS)
2) How does the human visual cortex acquire its scene repr.? A chicken-and-egg issue w.r.t. eye movements and the repr. they rely on.
3) Eye movements in video-SSL: will they help? (related to vJEPA) 13/14
November 18, 2025 at 12:37 PM
In sum, predicting the high-level embeddings of the next glimpse in human-like glimpse sequences leads to the emergence of scene repr. that are SOTA models of scene repr. in human visual cortex => a self-supervised, language-free way of getting at human (primate?) scene representations! 12/14
November 18, 2025 at 12:37 PM
GPN alignment beyond VVC (NSD streams ROIs): better than input embeddings/GSNs in all ROIs; better than SOTA in parietal/midparietal; on par with SOTA in midventral/midlateral/lateral; worse only in early visual cortex (expected due to low-level features). 11/14
November 18, 2025 at 12:37 PM
GPN-R-SimCLR isn't just a SOTA model of ventral scene representations; it also largely subsumes the variance explained by all the other models (variance partitioning; green-edged squares: unique variance)! Universality? Language-based codes <= co-occurrence of visual scene parts? 10/14
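For readers unfamiliar with variance partitioning, here is a hedged two-model sketch using cross-validated ridge regression; `X_gpn`, `X_other`, and `y` are assumed feature matrices and a voxel response vector, and the paper's actual pipeline may differ.

```python
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import cross_val_predict

def cv_r2(X, y):
    # cross-validated R^2 of a ridge model predicting y from X
    pred = cross_val_predict(RidgeCV(alphas=np.logspace(-2, 4, 7)), X, y, cv=5)
    return 1 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2)

r2_gpn = cv_r2(X_gpn, y)
r2_other = cv_r2(X_other, y)
r2_joint = cv_r2(np.hstack([X_gpn, X_other]), y)

unique_gpn = r2_joint - r2_other        # variance only GPN explains
unique_other = r2_joint - r2_gpn        # variance only the other model explains
shared = r2_gpn + r2_other - r2_joint   # variance both explain
```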
November 18, 2025 at 12:37 PM
Equating the architecture and dataset, but switching the objective from glimpse prediction to caption embedding (MPNet; sGSN) or multi-class object prediction (cGSN), reduces the alignment. Furthermore, no related/SOTA model (36 tested; Table S5) outperforms GPN-R-SimCLR => a new SOTA model! 9/14
November 18, 2025 at 12:37 PM
We assess RDM alignment with the 'ventral' visual cortex (VVC) RDMs across GPN variants (and glimpse-embedding backbones). GPN representations align better than the input glimpse embeddings (dotted black lines) => GPN contextualization & integration creates VVC-aligned scene repr.! 8/14
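The core of such an RDM comparison, as a minimal sketch: `model_feats` and `brain_resps` are assumed (n_images, n_dims) arrays, and the exact distance metric and correlation used in the paper may differ.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

# condensed representational dissimilarity matrices (upper triangles)
model_rdm = pdist(model_feats, metric="correlation")
brain_rdm = pdist(brain_resps, metric="correlation")

# rank correlation between the two RDMs = model-brain alignment
alignment, _ = spearmanr(model_rdm, brain_rdm)
print(f"model-brain RDM alignment: {alignment:.3f}")
```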
November 18, 2025 at 12:37 PM