Gabriele Sarti
Gabriele Sarti
@gsarti.com
PhD Student at @gronlp.bsky.social 🐮, core dev @inseq.org. Interpretability ∩ HCI ∩ #NLProc.

gsarti.com
Pinned
I've decided to start a book thread for 2025 to share cool books and stay focused on my reading goals. Here we go! 📚
Reposted by Gabriele Sarti
Humans and LLMs think fast and slow. Do SAEs recover slow concepts in LLMs? Not really.

Our Temporal Feature Analyzer discovers contextual features in LLMs that detect event boundaries, parse complex grammar, and represent ICL patterns.
November 13, 2025 at 10:32 PM
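For context, a sparse autoencoder (SAE) in this line of work learns a dictionary of features from per-token activations. A minimal sketch of the encode/decode step, purely illustrative and not the Temporal Feature Analyzer itself (dimensions and hyperparameters are made up):

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Toy per-token SAE (illustrative only, not the paper's model)."""
    def __init__(self, d_model=768, d_dict=8192):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_dict)
        self.decoder = nn.Linear(d_dict, d_model)

    def forward(self, acts):
        feats = torch.relu(self.encoder(acts))  # sparse feature activations per token
        recon = self.decoder(feats)             # reconstruction of the input activations
        return feats, recon

sae = SparseAutoencoder()
acts = torch.randn(10, 768)  # 10 token positions (toy data)
feats, recon = sae(acts)
loss = (recon - acts).pow(2).mean() + 1e-3 * feats.abs().mean()  # reconstruction + L1 sparsity
```

The post's point is that this standard per-token setup tends to surface "fast", local features, while "slow", contextual concepts spanning many tokens can slip through.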
New promising model for interpretability research just dropped!
Through this release, we aim to support the emerging ecosystem for pretraining research (NanoGPT, NanoChat), explainability (you can literally look at Monad under a microscope), and the tooling orchestration around frontier models.
November 10, 2025 at 9:09 PM
Check out our awesome live-skeeted panel!
Our panel moderated by @danaarad.bsky.social
"Evaluating Interpretability Methods: Challenges and Future Directions" just started! 🎉 Come to learn more about the MIB benchmark and hear the takes of @michaelwhanna.bsky.social, Michal Golovanevsky, Nicolò Brunello and Mingyang Wang!
November 9, 2025 at 7:18 AM
Follow @blackboxnlp.bsky.social for a live skeeting of the event!
BlackboxNLP is up and running! Here are the topics covered by this year's edition at a glance. Excited to see so many interesting topics, and the growing interest in reasoning!
November 9, 2025 at 2:20 AM
Wrapping up my oral presentations today with our TACL paper "QE4PE: Quality Estimation for Human Post-editing" at the Interpretability morning session #EMNLP2025 (Room A104, 11:45 China time)!

Paper: arxiv.org/abs/2503.03044
Slides/video/poster: underline.io/lecture/1315...
November 7, 2025 at 2:50 AM
Presenting today our work "Unsupervised Word-level Quality Estimation Through the Lens of Annotator (Dis)agreement" at the Machine Translation morning session (Room A301, 11:45 China time). See you there! 🤗

Paper: aclanthology.org/2025.emnlp-m...
Slides/video/poster: underline.io/events/502/s...
Unsupervised Word-level Quality Estimation for Machine Translation Through the Lens of Annotators (Dis)agreement
Gabriele Sarti, Vilém Zouhar, Malvina Nissim, Arianna Bisazza. Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing. 2025.
aclanthology.org
November 6, 2025 at 1:19 AM
Reposted by Gabriele Sarti
How can a language model find the veggies in a menu?

New pre-print where we investigate the internal mechanisms of LLMs when filtering on a list of options.

Spoiler: it turns out LLMs use strategies surprisingly similar to functional programming (think "filter" from Python)! 🧵
November 4, 2025 at 5:48 PM
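To unpack the analogy: Python's built-in `filter` applies the same predicate to every element of a list and keeps the matches, which is the behaviour the pre-print attributes to the model's internal mechanism. A toy example (the menu and labels are made up):

```python
# Hypothetical menu; the pre-print studies how LLMs do this filtering internally.
menu = ["steak", "broccoli", "salmon", "spinach", "lentil soup"]
veggie = {"broccoli", "spinach", "lentil soup"}  # toy labels, for illustration only

is_veggie = lambda dish: dish in veggie
print(list(filter(is_veggie, menu)))  # ['broccoli', 'spinach', 'lentil soup']
```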
Reposted by Gabriele Sarti
Language models can correctly answer questions about their previous intentions.
www.anthropic.com/research/int...
Emergent introspective awareness in large language models
Research from Anthropic on the ability of large language models to introspect
www.anthropic.com
October 29, 2025 at 6:21 PM
Reposted by Gabriele Sarti
Can AI simulate human behavior? 🧠
The promise is revolutionary for science & policy. But there’s a huge "IF": Do these simulations actually reflect reality?
To find out, we introduce SimBench: The first large-scale benchmark for group-level social simulation. (1/9)
October 28, 2025 at 4:54 PM
Our group @gronlp.bsky.social is coming in strong for #EMNLP2025! See you soon in Suzhou! 👋 🇨🇳
With only a week left until #EMNLP2025, we are happy to announce all the works we 🐮 will present 🥳 - come say "hi" to our posters and presentations during the Main conference and the co-located events (*SEM and workshops). See you in Suzhou ✈️
October 28, 2025 at 7:41 AM
Reposted by Gabriele Sarti
You can easily save up to 65% of compute while improving performance on reasoning tasks 🤯 👀

Meet EAGer: We show that monitoring token-level uncertainty lets LLMs allocate compute dynamically - spending MORE on hard problems, LESS on easy ones.
🧵👇
October 16, 2025 at 12:07 PM
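The underlying idea, as I read the thread, is that per-token uncertainty can gate how much extra compute to spend. A minimal sketch of such a policy, with made-up thresholds and budgets that are not EAGer's actual settings:

```python
import torch
import torch.nn.functional as F

def token_entropy(logits):
    """Shannon entropy of the next-token distribution (in nats)."""
    probs = F.softmax(logits, dim=-1)
    return -(probs * torch.log(probs + 1e-12)).sum(dim=-1)

def allocate_samples(logits, base=1, extra=4, threshold=2.0):
    """Spend more generations when the model is uncertain, fewer when it is confident.
    Illustrative policy only; the threshold and budgets are arbitrary numbers."""
    return base + extra if token_entropy(logits).item() > threshold else base

logits = torch.randn(50257)      # toy next-token logits
print(allocate_samples(logits))  # e.g. 5 on an uncertain step, 1 on a confident one
```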
Reposted by Gabriele Sarti
𝐃𝐨 𝐲𝐨𝐮 𝐫𝐞𝐚𝐥𝐥𝐲 𝐰𝐚𝐧𝐭 𝐭𝐨 𝐬𝐞𝐞 𝐰𝐡𝐚𝐭 𝐦𝐮𝐥𝐭𝐢𝐥𝐢𝐧𝐠𝐮𝐚𝐥 𝐞𝐟𝐟𝐨𝐫𝐭 𝐥𝐨𝐨𝐤𝐬 𝐥𝐢𝐤𝐞? 🇨🇳🇮🇩🇸🇪

Here’s the proof! 𝐁𝐚𝐛𝐲𝐁𝐚𝐛𝐞𝐥𝐋𝐌 is the first multilingual benchmark of developmentally plausible training data, now available to the NLP community for 45 languages 🎉

arxiv.org/abs/2510.10159
October 14, 2025 at 5:01 PM
"Assuming linearly encoded concepts"
In honour of spooky month, share a 4 word horror story that only someone in your profession would understand.

rm -rf ~/
"The chancellor approved it"
October 12, 2025 at 4:26 PM
Very cool demonstration of how the @ndif-team.bsky.social Workbench allows for quick iteration on different prompt setups!
How embarrassing for me and confusing to the LLM!

OK, here it is fixed. Nice thing about workbench is that it just takes a second to edit the prompt, and you can see how the LLM responds, now deciding very early it should be ':'
October 11, 2025 at 8:12 PM
Making model internals accessible to domain experts in low-code interfaces will unlock the next step in making interpretability useful across a variety of domains. Very excited about the NDIF Workbench! 💡
Ever wished you could explore what's happening inside a 405B parameter model without writing any code? Workbench, our AI interpretability interface, is now live for public beta at workbench.ndif.us!
October 10, 2025 at 5:53 PM
I was amazed by how avant-garde this was, but only 30 min into Greg Egan's Permutation City I've already stumbled on digital twins, longevity-crazed billionaires, and widespread B2C rentable compute instances, all from 1994! 🤯 Really prescient!
TIL Ken Liu predicted an eerily familiar setting featuring OpenAI and sama-like characters + US-China race dynamics in his short story "The Perfect Match" from 2012.
October 4, 2025 at 9:19 AM
Reposted by Gabriele Sarti
Mechanistic interpretability often relies on *interventions* to study how DNNs work. Are these interventions enough to guarantee the features we find are not spurious? No!⚠️ In our new paper, we show many mech int methods implicitly rely on the linear representation hypothesis🧵
July 14, 2025 at 12:15 PM
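For readers outside mech interp: a typical intervention of the kind the thread refers to edits hidden activations along a learned direction. A hedged sketch with made-up dimensions (not the paper's setup) that makes the implicit linearity assumption visible:

```python
import torch

def steer(hidden, direction, alpha=5.0):
    """Add a unit-norm concept direction to hidden activations.
    Expecting a scaled vector addition to change behaviour is exactly
    the linear-representation assumption the paper examines."""
    direction = direction / direction.norm()
    return hidden + alpha * direction

hidden = torch.randn(1, 12, 768)  # toy hidden states (batch=1, seq_len=12, d_model=768)
concept = torch.randn(768)        # hypothetical learned concept direction
patched = steer(hidden, concept)  # would be written back into the forward pass
```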
What could go wrong when asking Claude to make an Imagine demo within Claude Imagine and using it to play Tic Tac Toe? When notified about the error, the model promptly adds "Sorry about that. Continue playing..." to the interface 😂
October 2, 2025 at 4:15 PM
Reposted by Gabriele Sarti
Really neat, clear explainer on the new "central flows" framework for theoretically modeling learning dynamics
Understanding Optimization in Deep Learning with Central Flows
centralflows.github.io
October 1, 2025 at 12:20 PM
Reposted by Gabriele Sarti
What's the right unit of analysis for understanding LLM internals? We explore this in our mech interp survey (a major update of our 2024 manuscript).

We’ve added more recent work and more immediately actionable directions for future work. Now published in Computational Linguistics!
October 1, 2025 at 2:03 PM
Reposted by Gabriele Sarti
🔍 Are you curious about uncovering the underlying mechanisms and identifying the roles of model components (neurons, …) and abstractions (SAEs, …)?

We provide the first survey of concept description generation and evaluation methods.

Joint effort w/ @lkopf.bsky.social

📄 arxiv.org/abs/2510.01048
October 2, 2025 at 9:13 AM
Now with sleek flyers to test your skills in Italian crossword solving! 🤗 Join our #EVALITA2026 task!
September 23, 2025 at 7:17 AM
It is again the time of year when I beg @aclmeeting.bsky.social execs to rethink the current streaming platform system. For my #EMNLP2025 submissions, I am *required* to upload 2 video recordings + 2 posters + 2 slide decks. Why force both posters and talks for all? Nonsense.
September 15, 2025 at 3:20 PM
Language puzzles from "La Settimana Enigmistica" keep you up at night? Fear not! 🧩 Our new shared task on automatic crossword solving is now live at #EVALITA2026. Be sure to check it out!
🚨 Exciting news from #EVALITA2026 (@ailc-nlp.bsky.social)!
I'm co-organizing Cruciverb-IT, the first shared task on crossword solving 🧩✍️ together with Ciaccio C., @gsarti.com, Dell'Orletta F. and @malvinanissim.bsky.social!
If you love cracking crosswords (or cracking models that do), join us! 🎉
September 15, 2025 at 10:27 AM
Reposted by Gabriele Sarti
When reading AI reasoning text (aka CoT), we (humans) form a narrative about the underlying computation process, which we take as a transparent explanation of model behavior. But what if our narratives are wrong? We measure that and find it usually is.

Now on arXiv: arxiv.org/abs/2508.16599
Humans Perceive Wrong Narratives from AI Reasoning Texts
A new generation of AI models generates step-by-step reasoning text before producing an answer. This text appears to offer a human-readable window into their computation process, and is increasingly r...
arxiv.org
August 27, 2025 at 9:30 PM