@neelnanda.bsky.social
We are also in need of a lot of reviewers! All volunteers appreciated ❤️ Express interest: forms.gle/4euQVPdFEty...

See more info on our call for papers: mechinterpworkshop.com/cfp

And learn more about the workshop: mechinterpworkshop.com
Mech Interp Workshop Reviewer Expression of Interest (NeurIPS 2025)
Thanks for being interested in helping review for our workshop! Reviewers help make this whole thing possible. Reviews must be performed between Aug 25 and Sept 8
July 13, 2025 at 1:00 PM
All approaches are encouraged, provided they make a convincing case that they further the field. Open-source work, new methods, negative results, combining white and black box methods, from probing to circuit analysis, from real-world tasks to rigorous case studies - only quality matters
July 13, 2025 at 1:00 PM
I'm curious how much other people encounter this kind of thing
June 1, 2025 at 6:48 PM
The mindset of Socratic Persuasion is shockingly versatile. I use it on a near-daily basis in my personal and professional life: conflict resolution, helping prioritize, correcting misconceptions, gently giving negative feedback. My post has 8 case studies, to give you a sense:
May 26, 2025 at 6:37 PM
Isn't this all obvious?

Maybe! Asking questions rather than lecturing people is hardly a novel insight

But people often assume the Qs must be neutral and open-ended. It can be very useful to be opinionated! But you need error correction mechanisms for when you're wrong.
May 26, 2025 at 6:37 PM
Crucially, the goal is to give the right advice, not to *be* right. Asking Qs is far better received when you are genuinely listening to the answers, and open to changing your mind. No matter how much I know about a topic, they know more about their life, and I'm often wrong.
May 26, 2025 at 6:37 PM
When I'm right, Socratic persuasion is more effective than lecturing: they feel more like they generated the argument themselves, and are less defensive.

Done right, I think it's more collaborative - combining my perspective and their context to find the best advice. Better for both parties!
May 26, 2025 at 6:37 PM
I can integrate the new info and pivot if needed, without embarrassment, and together we converge on the right advice. It's far more robust - since the other person is actively answering questions, disagreements surface fast

The post: www.neelnanda.io/blog/51-soc...

More thoughts in 🧵
May 26, 2025 at 6:37 PM
Note: This is optimised for “jump through the arbitrary hoops of conference publishing, to maximise your chance of getting in with minimal sacrifice to your scientific integrity”. I hope to have another post out soon on how to write *good* papers.
May 11, 2025 at 10:47 PM
The checklist is pretty long - this is deliberate, as there are lots of moving parts! I tried to cover literally everything I could think of that should be done when submitting. I tried to italicise everything that should be fast or skippable, and obviously use your own judgement.
May 11, 2025 at 10:47 PM
Check it out here - post 4 on how to choose your research problems coming out soon!
www.alignmentforum.org/posts/Ldrss...
My Research Process: Understanding and Cultivating Research Taste — AI Alignment Forum
This is post 3 of a sequence on my framework for doing and thinking about research. Start here. Thanks to my co-author Gemini 2.5 Pro …
May 2, 2025 at 1:00 PM
I think of (intuitive) taste like a neural network - decisions are data points, outcomes are labels. You'll learn organically, but can speed up!

Supervision: Papers/Ask a mentor
Sample efficiency: Reflect on *why* you were wrong
Episode length: Taste for short tasks comes faster
May 2, 2025 at 1:00 PM
Great work by @NunoSempere and co. Check out yesterday's here:
blog.sentinel-team.org/p/global-ri...
April 29, 2025 at 1:00 PM
As a striking example of how effective this is, Matryoshka SAEs fairly reliably get better on most metrics as you make them wider, as neural networks should. Normal sparse autoencoders do not.
April 2, 2025 at 1:01 PM
This slightly worsens reconstruction (it's basically regularisation), but substantially improves performance on some downstream tasks and on measures of sparsity issues!
April 2, 2025 at 1:01 PM
With a small tweak to the loss, we can simultaneously train SAEs of several different sizes that all work together to reconstruct the input activations. Small ones learn high-level features, while wide ones learn low-level features!
April 2, 2025 at 1:01 PM
In this specific case, the work focused on how sparsity is an imperfect proxy to optimise if we actually want interpretability. In particular, wide SAEs break apart high-level concepts into narrower ones via absorption, composition, and splitting.
April 2, 2025 at 1:01 PM