Raj Movva
@rajmovva.bsky.social
NLP, ML & society, healthcare.
PhD student at Berkeley, previously CS at MIT.
https://rajivmovva.com/
What a crossover!
August 19, 2025 at 12:40 AM
This is amazing
August 16, 2025 at 6:19 PM
They're in their move fast and break things era 🙃
August 6, 2025 at 3:57 AM
This take emerged organically from just how well our SAE-based method for hypothesis generation (HypotheSAEs) performed, which surprised all of us!

See the paper arxiv.org/abs/2506.23845

Thanks @kennypeng.bsky.social, Jon, @emmapierson.bsky.social, @nkgarg.bsky.social for another nice collaboration.
Use Sparse Autoencoders to Discover Unknown Concepts, Not to Act on Known Concepts
While sparse autoencoders (SAEs) have generated significant excitement, a series of negative results have added to skepticism about their usefulness. Here, we establish a conceptual distinction that r...
arxiv.org
August 5, 2025 at 4:33 PM
This capability of discovering unknown concepts opens many opportunities for applied machine learning. We can design better white-box predictors, better audit high-stakes models for bias, and generate hypotheses for computational social science (CSS) research. More broadly, SAEs can help bridge the "prediction-explanation" gap.
August 5, 2025 at 4:33 PM
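A minimal sketch of the discovery workflow the post above describes, assuming PyTorch and synthetic stand-in embeddings (this is not the actual HypotheSAEs pipeline; hyperparameters are illustrative): train a sparse autoencoder on text embeddings, then surface candidate concepts by inspecting the top-activating texts for each latent.

```python
# Sketch only: a toy sparse autoencoder on synthetic "embeddings".
# In practice the inputs would be real text embeddings, and d_latent,
# l1_weight, and the step count would all be tuned.
import torch
import torch.nn as nn

d_embed, d_latent, n_texts = 256, 512, 4096
embeddings = torch.randn(n_texts, d_embed)  # stand-in for real text embeddings

class SparseAutoencoder(nn.Module):
    def __init__(self, d_in, d_hidden):
        super().__init__()
        self.encoder = nn.Linear(d_in, d_hidden)
        self.decoder = nn.Linear(d_hidden, d_in)

    def forward(self, x):
        z = torch.relu(self.encoder(x))  # non-negative, encouraged to be sparse
        return self.decoder(z), z

sae = SparseAutoencoder(d_embed, d_latent)
opt = torch.optim.Adam(sae.parameters(), lr=1e-3)
l1_weight = 1e-3  # sparsity penalty strength (illustrative value)

for _ in range(200):
    recon, z = sae(embeddings)
    loss = ((recon - embeddings) ** 2).mean() + l1_weight * z.abs().mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

# Discovery step: for each latent, the texts that activate it most strongly
# suggest a candidate concept worth naming and testing.
with torch.no_grad():
    _, z = sae(embeddings)
top_texts_per_latent = z.topk(5, dim=0).indices  # shape (5, d_latent)
```

In the paper's setup, an LLM then labels what a latent's top-activating examples have in common, turning each latent into a natural-language hypothesis.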
These tasks stand in contrast to probing, where we're trying to predict the presence of a *known* concept, and steering, where we're trying to include a *known* concept in an LLM output. SAEs lose to simple baselines on these tasks. (Two good papers on this: "AxBench" and Kantamneni, Engels et al. 2025.)
August 5, 2025 at 4:33 PM
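For contrast, a minimal sketch of the kind of simple baseline these comparisons use for probing: a plain logistic-regression probe fit directly on the representations, no SAE involved. Embeddings and labels here are synthetic stand-ins.

```python
# Sketch only: a logistic-regression probe for a *known* concept, the kind of
# simple baseline that SAE-based probes are compared against.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 256))            # stand-in for text embeddings
y = (X[:, :8].sum(axis=1) > 0).astype(int)  # stand-in for a known concept label

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print(f"probe accuracy: {probe.score(X_te, y_te):.2f}")
```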
How do we reconcile our view with recent negative results? Our key distinction is that SAEs are useful when you don't know what you're looking for: how does my text classifier predict which headlines will go viral? How does my LLM perform addition? These are "unknown unknowns".
August 5, 2025 at 4:33 PM
Nice work! Cool to see that item difficulty predicts human-LLM disagreement. We also studied similar questions with the DICES dataset: aclanthology.org/2024.emnlp-m...
Annotation alignment: Comparing LLM and human annotations of conversational safety
Rajiv Movva, Pang Wei Koh, Emma Pierson. Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing. 2024.
aclanthology.org
July 12, 2025 at 3:26 PM
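A toy illustration (my construction, not either paper's code) of the pattern mentioned in the reply above: items where human annotators disagree more also tend to show larger human-LLM gaps. All quantities are simulated, with per-item noise standing in for difficulty.

```python
# Sketch only: simulate per-item annotation with varying difficulty and check
# that harder items (more human disagreement) show larger human-LLM gaps.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
n_items, n_humans = 500, 10
true_score = rng.uniform(0, 1, n_items)
difficulty_noise = rng.uniform(0.05, 0.5, n_items)  # per-item difficulty

human = true_score[:, None] + rng.normal(0.0, 1.0, (n_items, n_humans)) * difficulty_noise[:, None]
llm = true_score + rng.normal(0.0, 0.2, n_items)

difficulty = human.std(axis=1)          # inter-annotator disagreement
gap = np.abs(llm - human.mean(axis=1))  # human-LLM disagreement per item
r, _ = pearsonr(difficulty, gap)
print(f"difficulty vs. human-LLM gap: r = {r:.2f}")
```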
Huge congrats, Marianne!!
June 5, 2025 at 5:32 PM
I find that I've actually gone out of my way to stop using bullet points in reviews now because Any Review With Bullet Points is a Bot 🥲
May 27, 2025 at 10:03 PM
So awesome, congrats Lucy!!! 🧀
May 5, 2025 at 9:24 PM