@neelnanda.bsky.social
We are also in need of a lot of reviewers! All volunteers appreciated ❤️ Express interest: forms.gle/4euQVPdFEty...

See more info on our call for papers: mechinterpworkshop.com/cfp

And learn more about the workshop: mechinterpworkshop.com
Mech Interp Workshop Reviewer Expression of Interest (NeurIPS 2025)
Thanks for being interested in helping review for our workshop! Reviewers help make this whole thing possible. Reviews must be performed between Aug 25 and Sept 8
July 13, 2025 at 1:00 PM
All approaches are encouraged, provided they make a convincing case that they further the field. Open-source work, new methods, negative results, combining white and black box methods, from probing to circuit analysis, from real-world tasks to rigorous case studies - only quality matters
July 13, 2025 at 1:00 PM
I'm curious how much other people encounter this kind of thing
June 1, 2025 at 6:48 PM
The mindset of Socratic Persuasion is shockingly versatile. I use it on a near-daily basis in my personal and professional life: conflict resolution, helping prioritize, correcting misconceptions, gently giving negative feedback. My post has 8 case studies, to give you a sense:
May 26, 2025 at 6:37 PM
Isn't this all obvious?

Maybe! Asking questions rather than lecturing people is hardly a novel insight

But people often assume the Qs must be neutral and open-ended. It can be very useful to be opinionated! But you need error correction mechanisms for when you're wrong.
May 26, 2025 at 6:37 PM
Crucially, the goal is to give the right advice, not to *be* right. Asking Qs is far better received when you are genuinely listening to the answers, and open to changing your mind. No matter how much I know about a topic, they know more about their life, and I'm often wrong.
May 26, 2025 at 6:37 PM
When I'm right, Socratic persuasion is more effective than lecturing: they feel more like they generated the argument themselves, and are less defensive.

Done right, I think it's more collaborative - combining my perspective and their context to find the best advice. Better for both parties!
May 26, 2025 at 6:37 PM
I can integrate the new info and pivot if needed, without embarrassment, and together we converge on the right advice. It's far more robust - since the other person is actively answering questions, disagreements surface fast

The post: www.neelnanda.io/blog/51-soc...

More thoughts in 🧵
May 26, 2025 at 6:37 PM
Note: This is optimised for “jump through the arbitrary hoops of conference publishing, to maximise your chance of getting in with minimal sacrifice to your scientific integrity”. I hope to have another post out soon on how to write *good* papers.
May 11, 2025 at 10:47 PM
The checklist is pretty long - this is deliberate, as there are lots of moving parts! I tried to cover literally everything I could think of that should be done when submitting. I tried to italicise everything that should be fast or skippable, and obviously use your own judgement.
May 11, 2025 at 10:47 PM
Check it out here - post 4 on how to choose your research problems coming out soon!
www.alignmentforum.org/posts/Ldrss...
My Research Process: Understanding and Cultivating Research Taste — AI Alignment Forum
This is post 3 of a sequence on my framework for doing and thinking about research. Start here. Thanks to my co-author Gemini 2.5 Pro …
May 2, 2025 at 1:00 PM
I think of (intuitive) taste like a neural network - decisions are data points, outcomes are labels. You'll learn organically, but can speed up!

Supervision: Papers/Ask a mentor
Sample efficiency: Reflect on *why* you were wrong
Episode length: Taste for short tasks comes faster
May 2, 2025 at 1:00 PM
Great work by @NunoSempere and co. Check out yesterday's here:
blog.sentinel-team.org/p/global-ri...
April 29, 2025 at 1:00 PM
As a striking example of how effective this is, Matryoshka SAEs fairly reliably get better on most metrics as you make them wider, as neural networks should. Normal sparse autoencoders do not.
April 2, 2025 at 1:01 PM
This slightly worsens reconstruction (it's basically regularisation), but substantially improves performance on some downstream tasks and on measures of sparsity issues!
April 2, 2025 at 1:01 PM
With a small tweak to the loss, we can simultaneously train SAEs of several different sizes that all work together to reconstruct the input activations. Small ones learn high-level features, while wide ones learn low-level features!
April 2, 2025 at 1:01 PM
In this specific case, the work focused on how sparsity is an imperfect proxy to optimise if we actually want interpretability. In particular, wide SAEs break apart high-level concepts into narrower ones via absorption, composition, and splitting.
April 2, 2025 at 1:01 PM