Arnab Sen Sharma
arnabsensharma.bsky.social
Arnab Sen Sharma
@arnabsensharma.bsky.social
PhD Student at Northeastern, working to make LLMs interpretable
Pinned
How can a language model find the veggies in a menu?

New pre-print where we investigate the internal mechanisms of LLMs when filtering on a list of options.

Spoiler: turns out LLMs use strategies surprisingly similar to functional programming (think "filter" from python)! 🧵
Reposted by Arnab Sen Sharma
Humans and LLMs think fast and slow. Do SAEs recover slow concepts in LLMs? Not really.

Our Temporal Feature Analyzer discovers contextual features in LLMs, that detect event boundaries, parse complex grammar, and represent ICL patterns.
November 13, 2025 at 10:32 PM
How can a language model find the veggies in a menu?

New pre-print where we investigate the internal mechanisms of LLMs when filtering on a list of options.

Spoiler: turns out LLMs use strategies surprisingly similar to functional programming (think "filter" from python)! 🧵
November 4, 2025 at 5:48 PM
Reposted by Arnab Sen Sharma
How do language models track mental states of each character in a story, often referred to as Theory of Mind?

We reverse-engineered how LLaMA-3-70B-Instruct handles a belief-tracking task and found something surprising: it uses mechanisms strikingly similar to pointer variables in C programming!
June 24, 2025 at 5:13 PM
Reposted by Arnab Sen Sharma
More big news! Applications are open for the NDIF Summer Engineering Fellowship—an opportunity to work on cutting-edge AI research infrastructure this summer in Boston! 🚀
December 10, 2024 at 9:59 PM