Thomas Fel
@thomasfel.bsky.social
Explainability, Computer Vision, Neuro-AI. 🪴 Kempner Fellow @Harvard.
Prev. PhD @Brown, @Google, @GoPro. Crêpe lover.

📍 Boston | 🔗 thomasfel.me
Pinned
🕳️🐇Into the Rabbit Hull – Part II

Continuing our interpretation of DINOv2, the second part of our study concerns the *geometry of concepts* and the synthesis of our findings toward a new representational *phenomenology*:

the Minkowski Representation Hypothesis
The Bau lab is on fire! 😍
November 6, 2025 at 2:13 PM
Reposted by Thomas Fel
Interested in doing a PhD at the intersection of human and machine cognition? ✨ I'm recruiting students for Fall 2026! ✨

Topics of interest include pragmatics, metacognition, reasoning, & interpretability (in humans and AI).

Check out JHU's mentoring program (due 11/15) for help with your SoP 👇
The Department of Cognitive Science @jhu.edu is seeking motivated students interested in joining our interdisciplinary PhD program! Applications due 1 Dec

Our PhD students also run an application mentoring program for prospective students. Mentoring requests due November 15.

tinyurl.com/2nrn4jf9
November 4, 2025 at 2:44 PM
Reposted by Thomas Fel
Pleased to share new work with @sflippl.bsky.social @eberleoliver.bsky.social @thomasmcgee.bsky.social & undergrad interns at the Institute for Pure and Applied Mathematics, UCLA.

Algorithmic Primitives and Compositional Geometry of Reasoning in Language Models
www.arxiv.org/pdf/2510.15987

🧵1/n
October 27, 2025 at 6:13 PM
Reposted by Thomas Fel
🧠 Thrilled to share our NeuroView with Ellie Pavlick!

"From Prediction to Understanding: Will AI Foundation Models Transform Brain Science?"

AI foundation models are coming to neuroscience—if scaling laws hold, predictive power will be unprecedented.

But is that enough?

Thread 🧵👇
October 24, 2025 at 11:22 AM
Reposted by Thomas Fel
This is so cool. When you look at representational geometry, it seems intuitive that models are combining convex regions of "concepts", but I wouldn't have expected that this is PROVABLY true for attention or that there was such a rich theory for this kind of geometry.
🕳️🐇Into the Rabbit Hull – Part II

Continuing our interpretation of DINOv2, the second part of our study concerns the *geometry of concepts* and the synthesis of our findings toward a new representational *phenomenology*:

the Minkowski Representation Hypothesis
October 16, 2025 at 6:33 PM
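A quick way to see the "provably true for attention" point above: softmax weights are nonnegative and sum to one, so every attention output is a convex combination of the value vectors and therefore lies in their convex hull. A minimal numpy check (an illustration of that single fact only, not the paper's construction):

```python
import numpy as np

rng = np.random.default_rng(0)
q = rng.standard_normal(8)         # one query
K = rng.standard_normal((5, 8))    # five keys
V = rng.standard_normal((5, 8))    # five value vectors

scores = K @ q / np.sqrt(8)
w = np.exp(scores - scores.max())
w /= w.sum()                       # softmax: w_i >= 0 and sum(w) == 1

out = w @ V                        # attention output for this query

# Nonnegative weights summing to one => `out` is a convex combination of the
# rows of V, i.e. it lies inside their convex hull.
assert np.all(w >= 0) and np.isclose(w.sum(), 1.0)
print("attention weights:", np.round(w, 3))
```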
🕳️🐇Into the Rabbit Hull – Part II

Continuing our interpretation of DINOv2, the second part of our study concerns the *geometry of concepts* and the synthesis of our findings toward a new representational *phenomenology*:

the Minkowski Representation Hypothesis
October 15, 2025 at 5:17 PM
🕳️🐇 Into the Rabbit Hull – Part I (Part II tomorrow)

*An interpretability deep dive into DINOv2*, one of vision's most important foundation models.

And today is Part I: buckle up, we're exploring some of its most charming features. :)
October 14, 2025 at 9:00 PM
Reposted by Thomas Fel
Superposition has reshaped interpretability research. In our @unireps.bsky.social paper led by @andre-longon.bsky.social we show it also matters for measuring alignment! Two systems can represent the same features yet appear misaligned if those features are mixed differently across neurons.
Superposition disentanglement of neural representations reveals hidden alignment
The superposition hypothesis states that a single neuron within a population may participate in the representation of multiple features in order for the population to represent more features than the ...
arxiv.org
October 8, 2025 at 8:54 PM
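A toy version of the point above, as a numpy sketch (a Procrustes rotation stands in for the paper's superposition disentanglement, which it is not): two systems carry the same two features mixed differently across neurons, look unrelated neuron-to-neuron, and align almost perfectly once the mixing is undone.

```python
import numpy as np

rng = np.random.default_rng(0)
Z = rng.standard_normal((1000, 2))          # two shared latent features

R = np.array([[0.0, -1.0], [1.0, 0.0]])     # 90-degree mixing rotation
X_a = Z                                     # system A: one feature per neuron
X_b = Z @ R                                 # system B: same features, mixed differently

# Neuron-to-neuron correlation makes the systems look unrelated...
naive = np.mean([abs(np.corrcoef(X_a[:, i], X_b[:, i])[0, 1]) for i in range(2)])

# ...but an orthogonal Procrustes fit (undoing the mixing) reveals the alignment.
U, _, Vt = np.linalg.svd(X_a.T @ X_b)
Q = U @ Vt
aligned = np.mean([abs(np.corrcoef((X_a @ Q)[:, i], X_b[:, i])[0, 1]) for i in range(2)])

print(f"neuron-wise |r| = {naive:.2f}, after un-mixing |r| = {aligned:.2f}")
```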
Reposted by Thomas Fel
For XAI, it's often thought that explanations help a (boundedly rational) user "unlock" information in features for some decision. But no one says this explicitly; they say vaguer things like "supporting trust". We lay out some implicit assumptions that become clearer when you take a formal view here: arxiv.org/abs/2506.22740
Explanations are a means to an end
Modern methods for explainable machine learning are designed to describe how models map inputs to outputs--without deep consideration of how these explanations will be used in practice. This paper arg...
arxiv.org
October 8, 2025 at 11:12 PM
Reposted by Thomas Fel
🚨Updated: "How far can we go with ImageNet for Text-to-Image generation?"

TL;DR: train a text2image model from scratch on ImageNet only and beat SDXL.

Paper, code, data available! Reproducible science FTW!
🧵👇

📜 arxiv.org/abs/2502.21318
💻 github.com/lucasdegeorg...
💽 huggingface.co/arijitghosh/...
October 8, 2025 at 8:43 PM
Reposted by Thomas Fel
Check out @mryskina.bsky.social's talk and poster at COLM on Tuesday—we present a method to identify 'semantically consistent' brain regions (responding to concepts across modalities) and show that more semantically consistent brain regions are better predicted by LLMs.
Interested in language models, brains, and concepts? Check out our COLM 2025 🔦 Spotlight paper!

(And if you’re at COLM, come hear about it on Tuesday – sessions Spotlight 2 & Poster 2)!
October 4, 2025 at 12:43 PM
Reposted by Thomas Fel
1/🚨 New preprint

How do #LLMs’ inner features change as they train? Using #crosscoders + a new causal metric, we map when features appear, strengthen, or fade across checkpoints—opening a new lens on training dynamics beyond loss curves & benchmarks.

#interpretability
September 25, 2025 at 2:02 PM
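Background for the crosscoder mention above, as I understand the setup: a crosscoder is roughly a sparse dictionary model with one shared latent code and a separate decoder per checkpoint, so the per-checkpoint decoder norm of a latent tracks when that feature appears, strengthens, or fades. A schematic PyTorch sketch under that reading (names, shapes, and losses are illustrative, not the paper's):

```python
import torch
import torch.nn as nn

class ToyCrosscoder(nn.Module):
    """Shared sparse latent code, one encoder/decoder pair per checkpoint."""
    def __init__(self, d_model=64, d_latent=256, n_ckpts=3):
        super().__init__()
        self.encoders = nn.ModuleList(nn.Linear(d_model, d_latent) for _ in range(n_ckpts))
        self.decoders = nn.ModuleList(nn.Linear(d_latent, d_model, bias=False) for _ in range(n_ckpts))

    def forward(self, acts):  # acts: list of (batch, d_model), one per checkpoint
        z = torch.relu(sum(enc(a) for enc, a in zip(self.encoders, acts)))
        return z, [dec(z) for dec in self.decoders]

model = ToyCrosscoder()
acts = [torch.randn(32, 64) for _ in range(3)]
z, recons = model(acts)

# Reconstruction + L1 sparsity objective.
loss = sum(((r - a) ** 2).mean() for r, a in zip(recons, acts)) + 1e-3 * z.abs().mean()

# Per-checkpoint decoder norm of each latent: a rough "is this feature present
# at this training stage?" signal, shape (n_ckpts, d_latent).
feature_strength = torch.stack([dec.weight.norm(dim=0) for dec in model.decoders])
```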
Reposted by Thomas Fel
Employing mechanistic interpretability to study how models learn, not just where they end up
2 papers find:
There are phase transitions where features emerge and stay throughout learning
🤖📈🧠
alphaxiv.org/pdf/2509.17196
@amuuueller.bsky.social @abosselut.bsky.social
alphaxiv.org/abs/2509.05291
September 26, 2025 at 3:27 PM
Reposted by Thomas Fel
Mechanistic interpretability often relies on *interventions* to study how DNNs work. Are these interventions enough to guarantee the features we find are not spurious? No!⚠️ In our new paper, we show many mech int methods implicitly rely on the linear representation hypothesis🧵
July 14, 2025 at 12:15 PM
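To make that implicit reliance concrete: a typical intervention adds (or ablates) a single direction in activation space and reads off the change in behavior, which only isolates "the feature" if the feature really is a linear direction. A hypothetical steering hook in PyTorch (toy model and made-up direction, not an example from the paper):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))

# Candidate "feature direction" in the hidden layer; the intervention below
# only tests that feature if the linear representation hypothesis holds.
direction = torch.randn(32)
direction = direction / direction.norm()
alpha = 3.0

def steer(module, inputs, output):
    # Add alpha units of the candidate direction to every hidden activation.
    return output + alpha * direction

x = torch.randn(8, 16)
baseline = model(x)
handle = model[1].register_forward_hook(steer)   # hook on the ReLU output
steered = model(x)
handle.remove()

print("mean output shift:", (steered - baseline).mean(dim=0))
```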
Reposted by Thomas Fel
I was part of an interesting panel discussion yesterday at an ARC event. Maybe everybody knows this already, but I was quite surprised by how "general" intelligence was conceptualized in relation to human intelligence and the ARC benchmarks.
September 28, 2025 at 10:06 AM
Phenomenology → principle → method.

From observed phenomena in representations (conditional orthogonality) we derive a natural instantiation.

And it turns out to be an old friend: Matching Pursuit!

📄 arxiv.org/abs/2506.03093

See you in San Diego,
@neuripsconf.bsky.social
🎉

#interpretability
From Flat to Hierarchical: Extracting Sparse Representations with Matching Pursuit
Motivated by the hypothesis that neural network representations encode abstract, interpretable features as linearly accessible, approximately orthogonal directions, sparse autoencoders (SAEs) have bec...
arxiv.org
September 28, 2025 at 2:01 PM
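For context on the method named above: Matching Pursuit greedily picks the dictionary atom most correlated with the current residual, records its coefficient, subtracts its contribution, and repeats. A minimal numpy version (the generic textbook algorithm, not the paper's exact formulation):

```python
import numpy as np

def matching_pursuit(x, D, n_steps=5):
    """Greedy sparse coding of x over a dictionary D with unit-norm columns."""
    residual = x.copy()
    coeffs = np.zeros(D.shape[1])
    for _ in range(n_steps):
        scores = D.T @ residual             # correlation of each atom with the residual
        k = np.argmax(np.abs(scores))       # best-matching atom
        coeffs[k] += scores[k]              # accumulate its coefficient
        residual -= scores[k] * D[:, k]     # remove its contribution
    return coeffs, residual

rng = np.random.default_rng(0)
D = rng.standard_normal((64, 256))
D /= np.linalg.norm(D, axis=0)              # unit-norm atoms
x = 2.0 * D[:, 3] - 1.5 * D[:, 42]          # a 2-sparse ground-truth signal

coeffs, residual = matching_pursuit(x, D)
print("top atoms:", np.argsort(-np.abs(coeffs))[:3],
      "| residual norm:", round(float(np.linalg.norm(residual)), 3))
```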
Reposted by Thomas Fel
🚨Our preprint is online!🚨

www.biorxiv.org/content/10.1...

How do #dopamine neurons perform the key calculations in reinforcement #learning?

Read on to find out more! 🧵
September 19, 2025 at 1:05 PM
Reposted by Thomas Fel
Are there conceptual directions in VLMs that transcend modality? Check out our COLM oral spotlight 🔦 paper! We use SAEs to analyze the multimodality of linear concepts in VLMs

with @chloesu07.bsky.social, @thomasfel.bsky.social, @shamkakade.bsky.social and Stephanie Gil
arxiv.org/abs/2504.11695
September 17, 2025 at 7:12 PM
Check out our COLM 2025 (oral) 🎤

SAEs reveal that VLM embedding spaces aren’t just "image vs. text" cones.
They contain stable conceptual directions, some forming surprising bridges across modalities.

arxiv.org/abs/2504.11695
Demo 👉 vlm-concept-visualization.com
September 17, 2025 at 7:42 PM
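For readers new to the tool mentioned in the two posts above: the SAE here is the standard recipe, an overcomplete dictionary trained to reconstruct embeddings through a sparse nonnegative code, after which individual latents can be inspected for image-only, text-only, or cross-modal activation. A schematic PyTorch sketch with made-up dimensions (the general recipe, not the paper's training setup):

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_embed=512, d_dict=4096):
        super().__init__()
        self.encoder = nn.Linear(d_embed, d_dict)
        self.decoder = nn.Linear(d_dict, d_embed, bias=False)

    def forward(self, e):
        z = torch.relu(self.encoder(e))     # sparse, nonnegative code
        return self.decoder(z), z

sae = SparseAutoencoder()
image_emb = torch.randn(256, 512)           # stand-ins for VLM image embeddings
text_emb = torch.randn(256, 512)            # stand-ins for VLM text embeddings
emb = torch.cat([image_emb, text_emb])

recon, z = sae(emb)
loss = ((recon - emb) ** 2).mean() + 1e-3 * z.abs().mean()

# Latents active on both halves of the batch are candidate cross-modal concepts.
img_rate = (z[:256] > 0).float().mean(0)
txt_rate = (z[256:] > 0).float().mean(0)
cross_modal = torch.nonzero((img_rate > 0.05) & (txt_rate > 0.05)).squeeze(-1)
```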
Reposted by Thomas Fel
Excited to announce the first workshop on CogInterp: Interpreting Cognition in Deep Learning Models @ NeurIPS 2025! 📣

How can we interpret the algorithms and representations underlying complex behavior in deep learning models?

🌐 coginterp.github.io/neurips2025/

1/4
First Workshop on Interpreting Cognition in Deep Learning Models (NeurIPS 2025)
coginterp.github.io
July 16, 2025 at 1:08 PM
Reposted by Thomas Fel
How do language models generalize from information they learn in-context vs. via finetuning? In arxiv.org/abs/2505.00661 we show that in-context learning can generalize more flexibly, illustrating key differences in the inductive biases of these modes of learning — and ways to improve finetuning. 1/
arxiv.org
May 2, 2025 at 5:02 PM
Reposted by Thomas Fel
Our work finding universal concepts in vision models is accepted at #ICML2025!!!

My first major conference paper with my wonderful collaborators and friends @matthewkowal.bsky.social @thomasfel.bsky.social
@Julian_Forsyth
@csprofkgd.bsky.social

Working with y'all is the best 🥹

Preprint ⬇️!!
🌌🛰️🔭Wanna know which features are universal vs unique in your models and how to find them? Excited to share our preprint: "Universal Sparse Autoencoders: Interpretable Cross-Model Concept Alignment"!

arxiv.org/abs/2502.03714

(1/9)
May 1, 2025 at 10:57 PM
Reposted by Thomas Fel
Accepted at #ICML2025! Check out the preprint.

HUGE shoutout to Harry (1st PhD paper, in 1st year), Julian (1st ever, done as an undergrad), Thomas and Matt!

@hthasarathan.bsky.social @thomasfel.bsky.social @matthewkowal.bsky.social
🌌🛰️🔭Wanna know which features are universal vs unique in your models and how to find them? Excited to share our preprint: "Universal Sparse Autoencoders: Interpretable Cross-Model Concept Alignment"!

arxiv.org/abs/2502.03714

(1/9)
May 1, 2025 at 3:03 PM
Reposted by Thomas Fel
<proud advisor>
Hot off the arXiv! 🦬 "Appa: Bending Weather Dynamics with Latent Diffusion Models for Global Data Assimilation" 🌍 Appa is our novel 1.5B-parameter probabilistic weather model that unifies reanalysis, filtering, and forecasting in a single framework. A thread 🧵
April 29, 2025 at 4:48 AM
Reposted by Thomas Fel
Have you ever noticed that, in computer memory, model weights are stored as discrete values anyway? So why not do probabilistic inference directly over the discrete (quantized) parameters? @trappmartin.bsky.social is presenting our work at #AABI2025 today. [1/3]
April 29, 2025 at 6:58 AM