Can
@canrager.bsky.social
Humans and LLMs think fast and slow. Do SAEs recover slow concepts in LLMs? Not really.

Our Temporal Feature Analyzer discovers contextual features in LLMs that detect event boundaries, parse complex grammar, and represent in-context learning (ICL) patterns.
November 13, 2025 at 10:32 PM
Reposted by Can
What's the right unit of analysis for understanding LLM internals? We explore in our mech interp survey (a major update from our 2024 ms).

We’ve added more recent work and more immediately actionable directions for future work. Now published in Computational Linguistics!
October 1, 2025 at 2:03 PM
Reposted by Can
🚨 Registration is live! 🚨

The New England Mechanistic Interpretability (NEMI) Workshop is happening Aug 22nd 2025 at Northeastern University!

A chance for the mech interp community to nerd out on how models really work 🧠🤖

🌐 Info: nemiconf.github.io/summer25/
📝 Register: forms.gle/v4kJCweE3UUH...
June 30, 2025 at 10:55 PM
Can we uncover the list of topics a language model is censored on?

Refused topics vary strongly among models. Claude-3.5 vs DeepSeek-R1 refusal patterns:
June 13, 2025 at 3:59 PM
Announcing ARBOR, an open research community for collectively understanding how reasoning models like OpenAI o3 and DeepSeek-R1 work. We invite all researchers and enthusiasts to join this initiative by @wattenberg.bsky.social's and @davidbau.bsky.social's labs.

arborproject.github.io
February 20, 2025 at 7:55 PM
Addressing key concerns about AI competition.

darioamodei.com/on-deepseek-...
Dario Amodei — On DeepSeek and Export Controls
January 29, 2025 at 6:54 PM
The #38c3 Chaos Computer Conference was a blast! 🚀 Find the accompanying code for my intro workshop on activation steering in the thread.
January 10, 2025 at 5:10 PM
Sparse Autoencoders (SAEs) are popular, with 10+ new approaches proposed in the last year. But how do we know whether we are making progress? So far, the field has relied on imperfect proxy metrics.

We are releasing SAE Bench, a suite of 8 SAE evaluations!

Project co-led with Adam Karvonen.
December 11, 2024 at 6:07 AM
Reposted by Can
More big news! Applications are open for the NDIF Summer Engineering Fellowship—an opportunity to work on cutting-edge AI research infrastructure this summer in Boston! 🚀
December 10, 2024 at 9:59 PM
Safe travels to #NeurIPS2024 in Vancouver, BC! Join our poster sessions on *Measuring Progress in Dictionary Learning with Board Game Models* and *Evaluating Sparse Autoencoders on Concept Erasure Tasks*. Reach out to brainstorm future interpretability benchmarks.
December 9, 2024 at 10:15 PM