Sushrut Thorat
@sushrutthorat.bsky.social
Recurrent computations and lifelong learning.
Postdoc at IKW-UOS@DE with @timkietzmann.bsky.social
Prev. Donders@NL, CIMeC@IT, IIT-B@IN
Neon color spreading: hard to be sure, although there's more positive signal in the center, BUT the class prediction stays "wire"...
November 18, 2025 at 9:48 PM
I did a quick check with the BLT_VS trained on Ecoset (github.com/KietzmannLab...).

Visualizing the feedback at the second "LGN" layer and printing the predicted class: the feedback doesn't seem to show the illusory contour, but the class, interestingly, changes from guitar to lamp to axe??
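A minimal PyTorch sketch of this kind of probe (the loader and the layer name below are hypothetical placeholders, not the actual BLT_VS API):

```python
import torch

# Hypothetical loader; the real BLT_VS checkpoint loading will differ.
model = load_blt_vs_ecoset()
model.eval()

feedback_acts = []  # probed layer's activation, one entry per recurrent timestep

def hook(module, inputs, output):
    feedback_acts.append(output.detach().cpu())

# "lgn_2" is a placeholder; inspect model.named_modules() for the real layer name.
handle = dict(model.named_modules())["lgn_2"].register_forward_hook(hook)

image = torch.randn(1, 3, 224, 224)  # stand-in for the illusion image

with torch.no_grad():
    logits = model(image)  # assuming a (timesteps, n_classes) readout

for t, step_logits in enumerate(logits):
    print(f"t={t}: predicted class index {step_logits.argmax().item()}")

handle.remove()
```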
November 18, 2025 at 9:42 PM
esp. because the "confidence increase due to feedback" picture painted here - bsky.app/profile/tahe... - is eerily similar to what we expected (and found) in the BLTs in arxiv.org/abs/2111.07898

Very curious!
November 18, 2025 at 8:26 PM
GPN alignment beyond VVC (NSD streams ROIs): better than input embeddings/GSNs in all ROIs; better than SOTA in parietal/midparietal; on par with SOTA in midventral/midlateral/lateral; worse only in early visual cortex (expected due to low-level features). 11/14
November 18, 2025 at 12:37 PM
GPN-R-SimCLR isn't just a SOTA model of ventral scene representations; it also largely subsumes the variance explained by all the other models (variance partitioning; green-edged squares: unique variance)! Universality? Language-based codes <= co-occurrence of visual scene parts? 10/14
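A rough sketch of the variance-partitioning logic for two predictors, where unique variance = full-model R² minus the reduced-model R² (random placeholder data and names; the paper partitions over more models):

```python
import numpy as np

def r2(X, y):
    """R^2 of an ordinary least-squares fit of y on the columns of X."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return 1 - (y - X @ beta).var() / y.var()

n = 1000  # number of (vectorized) RDM entries; random stand-ins below
gpn_rdm, other_rdm, brain_rdm = (np.random.randn(n) for _ in range(3))

full         = r2(np.column_stack([gpn_rdm, other_rdm]), brain_rdm)
unique_gpn   = full - r2(other_rdm[:, None], brain_rdm)  # variance only GPN explains
unique_other = full - r2(gpn_rdm[:, None], brain_rdm)    # variance only the other model explains
shared       = full - unique_gpn - unique_other
print(unique_gpn, unique_other, shared)
```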
November 18, 2025 at 12:37 PM
Equating the architecture and dataset, but switching objective from glimpse prediction to caption embedding (MPNet; sGSN) or multi-class object prediction (cGSN), reduces the alignment. Furthermore, no related/SOTA model (36 tested; Table S5) outperforms GPN-R-SimCLR => a new SOTA model! 9/14
November 18, 2025 at 12:37 PM
We assess RDM alignment with the 'ventral' visual cortex (VVC) RDMs, across GPN variants (and glimpse embedding backbones). GPN representations align better than the input glimpse embeddings (dotted black lines) => GPN contextualization & integration creates VVC-aligned scene repr.! 8/14
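A minimal sketch of the RSA step, assuming correlation-distance RDMs and Spearman alignment (random stand-ins below, not NSD data):

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

def rdm(features):
    """Condensed RDM: pairwise correlation distances across items (rows)."""
    return pdist(features, metric="correlation")

model_feats = np.random.randn(100, 512)  # e.g. GPN hidden states, one row per image
input_feats = np.random.randn(100, 512)  # e.g. the raw glimpse-embedding baseline
brain_rdm   = pdist(np.random.randn(100, 300), metric="correlation")  # stand-in VVC RDM

# RSA alignment = Spearman correlation between the model RDM and the brain RDM.
print("GPN   :", spearmanr(rdm(model_feats), brain_rdm).correlation)
print("input :", spearmanr(rdm(input_feats), brain_rdm).correlation)
```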
November 18, 2025 at 12:37 PM
Glimpse embeddings are contextualized (wrt their relations) and integrated, resulting in a "scene representation". Do GPN repr. align with natural scene repr. in human visual cortex? We turn to the Natural Scenes Dataset (NSD) and Representational Similarity Analysis (RSA). 7/14
November 18, 2025 at 12:37 PM
GPN predictions align with embedding of the next-glimpse (given saccade) > other glimpses from the same scene > glimpses from other scenes => co-occurrence (+ spatial arrangement, with S) learning. With R, prediction loss decreases over glimpses => integration. 6/14
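A sketch of how that ordering can be checked with cosine similarities (random stand-in embeddings; names are illustrative):

```python
import torch
import torch.nn.functional as F

def mean_cosine(pred, embeddings):
    """Mean cosine similarity between a predicted embedding (dim,) and a set (n, dim)."""
    return F.cosine_similarity(pred.unsqueeze(0), embeddings, dim=-1).mean().item()

d = 512
pred         = torch.randn(d)      # GPN's predicted next-glimpse embedding
next_glimpse = torch.randn(1, d)   # true next-glimpse embedding
same_scene   = torch.randn(8, d)   # other glimpses from the same scene
other_scenes = torch.randn(64, d)  # glimpses from different scenes

print("next glimpse:", mean_cosine(pred, next_glimpse))
print("same scene  :", mean_cosine(pred, same_scene))
print("other scenes:", mean_cosine(pred, other_scenes))
# Reported ordering: next glimpse > same scene > other scenes.
```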
November 18, 2025 at 12:37 PM
Glimpse Prediction Networks (GPNs) take a high-level glimpse embedding and, optionally, the planned saccade S as inputs, and predict the high-level next-glimpse embedding, optionally using recurrence (R) to carry state across glimpses. Glimpses = COCO scene crops around DeepGaze3 fixations. 5/14
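A minimal sketch of this setup, assuming a GRU-based recurrent core (sizes, layer choices, and names are assumptions, not the paper's architecture):

```python
import torch
import torch.nn as nn

class GPN(nn.Module):
    """Sketch of a recurrent Glimpse Prediction Network with saccade input."""
    def __init__(self, emb_dim=512, saccade_dim=2, hidden_dim=512):
        super().__init__()
        self.rnn = nn.GRUCell(emb_dim + saccade_dim, hidden_dim)  # R: state across glimpses
        self.readout = nn.Linear(hidden_dim, emb_dim)             # predicts next-glimpse embedding

    def forward(self, glimpse_embs, saccades):
        # glimpse_embs: (T, B, emb_dim) high-level glimpse embeddings
        # saccades:     (T, B, saccade_dim) planned saccade vectors (S)
        h = torch.zeros(glimpse_embs.size(1), self.rnn.hidden_size)
        preds = []
        for emb, sac in zip(glimpse_embs, saccades):
            h = self.rnn(torch.cat([emb, sac], dim=-1), h)
            preds.append(self.readout(h))   # prediction of the next glimpse's embedding
        return torch.stack(preds)           # (T, B, emb_dim); target is glimpse_embs shifted by one
```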
November 18, 2025 at 12:37 PM
A hard ARC problem from Fig. 1 of www.nature.com/articles/s41...

Am I the only one who thinks that, in the test solution, the “overtaken” dots could be red?
October 29, 2025 at 8:49 AM
Thanks for engaging :)
In Geirhos's cue-conflict images, the texture doesn't have to be only high-freq. The gram matrices are aligned across all layers - in later layers the RF sizes are huge, so the correlations needn't reflect only small-scale variation, as seen in my post.
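For reference, the gram matrix of a conv feature map is just its spatially pooled channel-by-channel correlation structure; a minimal sketch:

```python
import torch

def gram_matrix(feats):
    """Gram matrix of conv features (B, C, H, W): channel-by-channel inner
    products pooled over space. In deep layers each spatial unit has a large
    receptive field, so these statistics can capture large-scale structure,
    not just high-frequency texture."""
    b, c, h, w = feats.shape
    f = feats.reshape(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)  # (B, C, C)
```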
October 7, 2025 at 3:40 PM
hmm,
1. The way they quantify "texture" is based solely on high-freq components. But there are also low-freq components that don't carry meaningful information about shape either and could influence classification (suppl. fig. from an upcoming rev. of arxiv.org/abs/2507.03168)
October 7, 2025 at 11:55 AM
I was wondering whether the AlexNet in Geirhos's and our checks is the same as yours. Indeed, both the channel counts and the RF sizes have been changed. Larger RFs in particular might well help with shape bias: when the original RFs are used, the shape bias seems to drop to ~0.4. 7/
July 10, 2025 at 7:51 AM
Won't this inflate the reported shape and texture accuracies, and change the shape bias, compared to, say, what Geirhos reports (e.g. in proceedings.neurips.cc/paper_files/...)?

AlexNet seems to have a shape bias of barely ~0.3, whereas your Fig. 2 suggests a shape bias of 0.5! 6/
July 10, 2025 at 7:51 AM
So why are our results different?

I looked into the way shape bias was computed in your paper. I have a few questions:

"We selected the class with the highest probability from this subset and mapped it to one of the corresponding 16 categories." -> so the accuracy was not computed 1000-way? 3/
July 10, 2025 at 7:51 AM
As you might've seen, we too recently found that developmental considerations massively help with shape bias. However, more than visual acuity or color, contrast sensitivity was found to be key - bsky.app/profile/sush... In fact, color+blur doesn't get us above 0.5! 2/
July 10, 2025 at 7:51 AM
If this early experience is so critical for humans to acquire an entire, useful, ability (or bias), might it be useful for computer vision systems? It just so happens that current neural networks lack this bias—when shown cue-conflict images, their inference is texture-based. Can we help? 2/7
July 8, 2025 at 3:09 PM
I don't think one would think much of the blurry, underdeveloped vision that babies have. But apparently, if you miss a few months of vision after birth, you acquire configural processing deficits (you cannot readily distinguish faces based on relative positioning of the nose, lips, and eyes)! 1/7
July 8, 2025 at 3:09 PM
Woah this is insane! @tessamdekker.bsky.social this might be of interest!
June 29, 2025 at 3:33 PM
Was Neel’s response to these two tweets.
June 4, 2025 at 7:34 PM
100% on Jupyter notebooks, esp. ones which just work with Google Colab! I'm not yet sold on notebook-pubs or even on the Anthropic web releases. It's so much fun reading a hard copy detailing the core messages of the work, and then going to play with the data/model if interested.
June 4, 2025 at 4:36 PM
This is a great example for the utility of preprints.
"and it wasn’t just peer reviewed, it was peer tested." - RELEASE YOUR DATA & CODE!!!
June 4, 2025 at 4:32 PM
... (Claude/Gemini do the same)
January 16, 2025 at 10:49 AM
This is an interesting paper from CCN this year. Curious to see it fleshed out - 2024.ccneuro.org/pdf/595_Pape...

"Euclidean coordinates are the wrong prior for models of primate vision"
December 2, 2024 at 8:06 PM