BlackboxNLP
@blackboxnlp.bsky.social
The largest workshop on analysing and interpreting neural networks for NLP.

BlackboxNLP will be held at EMNLP 2025 in Suzhou, China

blackboxnlp.github.io
Nicolò & Mingyang: Can we understand which circuits emerge in small models and reasoning-tuned systems, and how do they compare with default systems? Are there methods that generalize better across all tasks?
November 9, 2025 at 7:23 AM
Q: What's next for interpretability benchmarks? Michal: People sitting together and planning how to extend tests to multimodal, diverse contexts. @michaelwhanna.bsky.social: For circuit finding, integrating sparse features circuits could help us better understand our models.
November 9, 2025 at 7:21 AM
Nicolò & Mingyang: Exploring notebooks and public libraries can be very helpful for gaining early intuitions about what's promising.
November 9, 2025 at 7:16 AM
@michaelwhanna.bsky.social: Don't try to read everything. Find Qs you really care about, and go a level deeper to answer meaningful questions.
November 9, 2025 at 7:15 AM
Q: How would one go about approaching interpretability research these days? Michal: "When things don't work out of the box, it's a sign to double down and find out why. Negative results are important!"
November 9, 2025 at 7:15 AM
@danaarad.bsky.social: As deep learning research converges on similar architectures across modalities, it will be interesting to determine which interpretability methods remain useful across various models and tasks.
November 9, 2025 at 7:15 AM
@michaelwhanna.bsky.social, Nicolò & Mingyang: Counterfactuals in minimal settings can be helpful, but they do not capture the whole story. Extending current methods to long contexts and finding practical applications in safety-related areas are exciting challenges ahead.
November 9, 2025 at 7:07 AM
Michal: Mechanistic interpretability has heavily focused on toy tasks and text-only models. The next step is scaling to more complex tasks that involve real-world reasoning.
November 9, 2025 at 7:07 AM
Reposted by BlackboxNLP
I'll be presenting this work at @blackboxnlp.bsky.social in Suzhou — happy to chat there or here if you are interested!
October 22, 2025 at 8:16 AM
Reposted by BlackboxNLP
Nov 9, @blackboxnlp.bsky.social , 11:00-12:00 @ Hall C – Interpreting Language Models Through Concept Descriptions: A Survey (Feldhus & Kopf) @lkopf.bsky.social

🗞️ aclanthology.org/2025.blackbo...

bsky.app/profile/nfel...
November 6, 2025 at 7:00 AM
Results + technical report deadline: August 8, 2025
Full task details: blackboxnlp.github.io/2025/task/
BlackboxNLP 2025
The Eighth Workshop on Analyzing and Interpreting Neural Networks for NLP
blackboxnlp.github.io
July 30, 2025 at 5:57 AM