Natalie Shapira
@natalieshapira.bsky.social
Tell me about challenges, the unbelievable, the human mind and artificial intelligence, thoughts, social life, family life, science and philosophy.
Pinned

My PhD dissertation -

Challenging Natural Language Processing through the Lens of Psychology

It can be found here:

www.researchgate.net/profile/Nata...
Reposted by Natalie Shapira
Humans and LLMs think fast and slow. Do SAEs recover slow concepts in LLMs? Not really.

Our Temporal Feature Analyzer discovers contextual features in LLMs that detect event boundaries, parse complex grammar, and represent ICL patterns.
November 13, 2025 at 10:32 PM
That was my slide for today's plotathon
November 12, 2025 at 4:07 PM
A concept I really like in the Bau Lab
@davidbau.bsky.social
is: Plotathon 🔥

Every ~2 weeks, the entire lab drops whatever they're working on and shares SOMETHING

Tomorrow we meet with Aaron's group
@amuuueller.bsky.social.
Looking forward to it! 🩵
November 11, 2025 at 11:12 PM
Reposted by Natalie Shapira
🤔 But do these heads play a *causal* role in the operation?

To test them, we transport their query states from one context to another. We find that this triggers the same filtering operation, even if the new context has a new list of items and a new format!
November 4, 2025 at 5:48 PM
Reposted by Natalie Shapira
How can a language model find the veggies in a menu?

New preprint in which we investigate the internal mechanisms LLMs use when filtering a list of options.

Spoiler: it turns out LLMs use strategies surprisingly similar to functional programming (think "filter" from Python)! 🧵
November 4, 2025 at 5:48 PM
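
(Side note for readers unfamiliar with the analogy: below is roughly what Python's built-in filter does. A minimal sketch only; the menu items and the is_veggie predicate are my own illustration, not the paper's data or mechanism.)

menu = ["steak", "broccoli", "salmon", "spinach", "carrot cake"]
veggies = {"broccoli", "spinach"}

def is_veggie(item):
    # Toy predicate: is this menu item a vegetable?
    return item in veggies

# filter keeps only the items that satisfy the predicate,
# which is the operation the post compares the LLM's internal strategy to.
print(list(filter(is_veggie, menu)))  # ['broccoli', 'spinach']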
Sometimes researchers draw my attention to a paper. When many of them do so, there's a double magic hidden in it: the paper probably truly interests (1) me and (2) the community.

I first felt this harmony (me-community) when my proposal for IBM's next Grand Challenge was chosen ->
November 3, 2025 at 6:27 PM
Reposted by Natalie Shapira
Ever wished you could explore what's happening inside a 405B parameter model without writing any code? Workbench, our AI interpretability interface, is now live for public beta at workbench.ndif.us!
October 10, 2025 at 5:35 PM
Reposted by Natalie Shapira
Now officially out:

"Re-evaluating Theory of Mind evaluation in large language models"

royalsocietypublishing.org/doi/10.1098/...

(by Hu, Sosa, & me)
August 18, 2025 at 1:01 PM
I was an engineer-researcher. I heard my colleagues' frustrations with management (pay, conditions, responsibility, etc.).

I'm the daughter of managers. I've heard their side too: dealing with difficult employees (boundaries, lack of gratitude, etc.).

This is a Theory of Mind problem.
July 3, 2025 at 6:15 AM
Reposted by Natalie Shapira
🚨 Registration is live! 🚨

The New England Mechanistic Interpretability (NEMI) Workshop is happening Aug 22nd 2025 at Northeastern University!

A chance for the mech interp community to nerd out on how models really work 🧠🤖

🌐 Info: nemiconf.github.io/summer25/
📝 Register: forms.gle/v4kJCweE3UUH...
June 30, 2025 at 10:55 PM
Reposted by Natalie Shapira
How do language models track the mental states of each character in a story, a capability often referred to as Theory of Mind?

We reverse-engineered how LLaMA-3-70B-Instruct handles a belief-tracking task and found something surprising: it uses mechanisms strikingly similar to pointer variables in C programming!
June 24, 2025 at 5:13 PM
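
(Side note on the pointer analogy: here is a toy Python sketch of the reference-vs-copy idea it evokes, using the classic false-belief scenario. The Sally/marble names are my own illustration, not the paper's actual lookback mechanism.)

world = {"marble": "basket"}   # where the marble actually is
sally_belief = world           # while Sally watches, her belief is a live reference ("pointer") to the world state
sally_belief = dict(world)     # when Sally leaves the room, snapshot her last view as a copy
world["marble"] = "box"        # Anne moves the marble while Sally is away

print(world["marble"])         # 'box'    : reality
print(sally_belief["marble"])  # 'basket' : Sally's (false) belief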
I am really proud to share our work, led by Nikhil Prakash, in collaboration with other mechanistic interpretability and Theory of Mind (ToM) researchers:
arxiv.org/abs/2505.14685
You can find a tweet here with nice animations:
x.com/nikhil07prak...
Language Models use Lookbacks to Track Beliefs
How do language models (LMs) represent characters' beliefs, especially when those beliefs may differ from reality? This question lies at the heart of understanding the Theory of Mind (ToM) capabilitie...
arxiv.org
June 24, 2025 at 4:29 PM
I have a research proposal built mostly on my intuition.

I want rigorous feedback, brutal honesty, and every reason why it might fail. How can I do that without damaging my reputation in the research community?

I'd love to hear from anyone who's done this before. DM or reply?
June 24, 2025 at 8:04 AM
Reposted by Natalie Shapira
Can we uncover the list of topics a language model is censored on?

Refused topics vary strongly among models. Claude-3.5 vs DeepSeek-R1 refusal patterns:
June 13, 2025 at 3:59 PM
I heard there were fights at shelters over allowing animals or not.

So here they came out with a clear message (yes to animals 😍)
June 16, 2025 at 10:13 AM
A wave of emails: is it a gentle way of checking whether I'm still alive, or have I just let everything pile up...

(I'll get back to everyone; my mind is really distracted and it's hard to focus. I'll do it slowly but surely.)
June 16, 2025 at 8:52 AM
Physical strength was an advantage in the labor market; then came machines.
Diamonds used to be a status symbol; since then their value has been cut by 40%.
Today intelligence is considered an asset... but what happens when that, too, is reversed?

Worth thinking about what comes after the cheese moves 🧀
June 12, 2025 at 8:05 AM
In "The Alchemist" it says you need to know how to appreciate what you find.

It may not be easy to recognize that we have found what we were looking for, that in front of us is exactly what we wished and hoped for.

I can't remember the exact quote, and the chatbots are hallucinating.

Does anyone recognize it?
June 11, 2025 at 8:28 AM
I received a threat that they would kick me out of the group.

I don't want to stay in a group that is ashamed of its rules and restricts freedom of speech, yet I haven't left. If my values don't align with the group's values, then that's your problem, not mine, and I don't care if you kick me out.
June 9, 2025 at 5:06 AM
Today I learnt that a forum for women in technological research doesn't allow discussions by community members who are considering a move to non-technological roles (e.g., to product management in generative AI).

I have no words to explain why this bothers me so much.
June 8, 2025 at 8:24 PM
Even Isaac Asimov understood that the Three Laws of Robotics cannot truly be programmed.
June 7, 2025 at 6:48 PM
I'm trying to figure out whether I'm teaching AI the theory of mind or whether AI is teaching me.
June 4, 2025 at 8:04 AM
Researchers, we need your help. The current US budget plans could be disastrous for the research world.

David wrote a blog post with links to people who could make an impact. Take a look, see whether you have connections to any of them, and explain the implications to them.
FRIENDS: American science is being decimated by Congress NOW.

Your help is needed to fix this. The current DC plan PERMANENTLY slashes NSF, NIH, all science training. Money isn't redirected—it's gone.

Please read+share what's happening

thevisible.net/posts/004-s...
June 4, 2025 at 5:01 AM
ask your model to surprise you and share the results
May 26, 2025 at 12:59 PM
When I wrote the acknowledgements section of my PhD, there was a teacher who had a profound influence on me whom I did not mention. I forgot to say thank you because what he taught me went straight into my subconscious.

You can find it here:
hitechwoman.blogspot.com/2025/05/harm...
May 22, 2025 at 5:45 AM