Raj Movva
@rajmovva.bsky.social
NLP, ML & society, healthcare.
PhD student at Berkeley, previously CS at MIT.
https://rajivmovva.com/
What a crossover!
August 19, 2025 at 12:40 AM
This is amazing
August 16, 2025 at 6:19 PM
They're in their move fast and break things era 🙃
August 6, 2025 at 3:57 AM
This take emerged organically from just how well our SAE-based method for hypothesis generation (HypotheSAEs) performed, which surprised all of us!

See the paper arxiv.org/abs/2506.23845

Thanks @kennypeng.bsky.social, Jon, @emmapierson.bsky.social, @nkgarg.bsky.social for another nice collaboration.
Use Sparse Autoencoders to Discover Unknown Concepts, Not to Act on Known Concepts
While sparse autoencoders (SAEs) have generated significant excitement, a series of negative results have added to skepticism about their usefulness. Here, we establish a conceptual distinction that r...
arxiv.org
August 5, 2025 at 4:33 PM
This capability of discovering unknown concepts opens many opportunities for applied machine learning. We can design better white-box predictors, better audit high-stakes models for bias, and generate hypotheses for computational social science (CSS) research. More broadly, SAEs can help bridge the "prediction-explanation" gap.
August 5, 2025 at 4:33 PM
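A minimal sketch of the discovery workflow the post above describes, assuming PyTorch and synthetic stand-in embeddings (this is not the actual HypotheSAEs pipeline; hyperparameters are illustrative): train a sparse autoencoder on text embeddings, then surface candidate concepts by inspecting the top-activating texts for each latent.

```python
# Sketch only: a toy sparse autoencoder on synthetic "embeddings".
# In practice the inputs would be real text embeddings, and d_latent,
# l1_weight, and the step count would all be tuned.
import torch
import torch.nn as nn

d_embed, d_latent, n_texts = 256, 512, 4096
embeddings = torch.randn(n_texts, d_embed)  # stand-in for real text embeddings

class SparseAutoencoder(nn.Module):
    def __init__(self, d_in, d_hidden):
        super().__init__()
        self.encoder = nn.Linear(d_in, d_hidden)
        self.decoder = nn.Linear(d_hidden, d_in)

    def forward(self, x):
        z = torch.relu(self.encoder(x))  # non-negative, encouraged to be sparse
        return self.decoder(z), z

sae = SparseAutoencoder(d_embed, d_latent)
opt = torch.optim.Adam(sae.parameters(), lr=1e-3)
l1_weight = 1e-3  # sparsity penalty strength (illustrative value)

for _ in range(200):
    recon, z = sae(embeddings)
    loss = ((recon - embeddings) ** 2).mean() + l1_weight * z.abs().mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

# Discovery step: for each latent, the texts that activate it most strongly
# suggest a candidate concept worth naming and testing.
with torch.no_grad():
    _, z = sae(embeddings)
top_texts_per_latent = z.topk(5, dim=0).indices  # shape (5, d_latent)
```

In the paper's setup, an LLM then labels what a latent's top-activating examples have in common, turning each latent into a natural-language hypothesis.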
These tasks stand in contrast to probing, where we're trying to predict the presence of a *known* concept, and steering, where we're trying to include a *known* concept in an LLM output. SAEs lose to simple baselines on these tasks. (Two good papers on this: "AxBench" and Kantamneni, Engels et al. 2025.)
August 5, 2025 at 4:33 PM
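For contrast, a minimal sketch of the kind of simple baseline these comparisons use for probing: a plain logistic-regression probe fit directly on the representations, no SAE involved. Embeddings and labels here are synthetic stand-ins.

```python
# Sketch only: a logistic-regression probe for a *known* concept, the kind of
# simple baseline that SAE-based probes are compared against.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 256))            # stand-in for text embeddings
y = (X[:, :8].sum(axis=1) > 0).astype(int)  # stand-in for a known concept label

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print(f"probe accuracy: {probe.score(X_te, y_te):.2f}")
```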
How do we reconcile our view with recent negative results? Our key distinction is that SAEs are useful when you don't know what you're looking for: how does my text classifier predict which headlines will go viral? How does my LLM perform addition? These are "unknown unknowns".
August 5, 2025 at 4:33 PM
Nice work! Cool to see that item difficulty predicts human-LLM disagreement. We also studied similar questions with the DICES dataset: aclanthology.org/2024.emnlp-m...
Annotation alignment: Comparing LLM and human annotations of conversational safety
Rajiv Movva, Pang Wei Koh, Emma Pierson. Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing. 2024.
aclanthology.org
July 12, 2025 at 3:26 PM
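A toy illustration (my construction, not either paper's code) of the pattern mentioned in the reply above: items where human annotators disagree more also tend to show larger human-LLM gaps. All quantities are simulated, with per-item noise standing in for difficulty.

```python
# Sketch only: simulate per-item annotation with varying difficulty and check
# that harder items (more human disagreement) show larger human-LLM gaps.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
n_items, n_humans = 500, 10
true_score = rng.uniform(0, 1, n_items)
difficulty_noise = rng.uniform(0.05, 0.5, n_items)  # per-item difficulty

human = true_score[:, None] + rng.normal(0.0, 1.0, (n_items, n_humans)) * difficulty_noise[:, None]
llm = true_score + rng.normal(0.0, 0.2, n_items)

difficulty = human.std(axis=1)          # inter-annotator disagreement
gap = np.abs(llm - human.mean(axis=1))  # human-LLM disagreement per item
r, _ = pearsonr(difficulty, gap)
print(f"difficulty vs. human-LLM gap: r = {r:.2f}")
```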
Huge congrats, Marianne!!
June 5, 2025 at 5:32 PM
I find that I've actually gone out of my way to stop using bullet points in reviews now because Any Review With Bullet Points is a Bot 🥲
May 27, 2025 at 10:03 PM
So awesome, congrats Lucy!!! 🧀
May 5, 2025 at 9:24 PM