BlackboxNLP
@blackboxnlp.bsky.social
The largest workshop on analysing and interpreting neural networks for NLP.
BlackboxNLP will be held at EMNLP 2025 in Suzhou, China
blackboxnlp.github.io
Our panel, moderated by @danaarad.bsky.social, "Evaluating Interpretability Methods: Challenges and Future Directions" just started! 🎉 Come to learn more about the MIB benchmark and hear the takes of @michaelwhanna.bsky.social, Michal Golovanevsky, Nicolò Brunello and Mingyang Wang!
November 9, 2025 at 6:55 AM
Next up: Kentaro Ozeki presenting "Normative Reasoning in Large Language Models: A Comparative Benchmark from Logical and Modal Perspectives" aclanthology.org/2025.blackbo...
November 9, 2025 at 6:32 AM
After a productive poster session, BlackboxNLP returns with the second keynote "Memorization: Myth or Mystery?" by @vernadankers.bsky.social!
November 9, 2025 at 5:48 AM
Nadav Shani is giving the first oral presentation of the day: Language Dominance in Multilingual Large Language Models. Find the paper here: aclanthology.org/2025.blackbo...
November 9, 2025 at 2:19 AM
Next up: Circuit-Tracer: A New Library for Finding Feature Circuits presented by @michaelwhanna.bsky.social! Paper: aclanthology.org/2025.blackbo...
November 9, 2025 at 2:18 AM
Quanshi Zhang is giving the first keynote of the day: Can Neural Network Interpretability Be the Key to Breaking Through Scaling Law Limitations in Deep Learning?
November 9, 2025 at 1:38 AM
BlackboxNLP is up and running! Here are the topics covered by this year's edition at a glance. Excited to see so many interesting topics, and the growing interest in reasoning!
November 9, 2025 at 1:38 AM
📢 Call for Papers! 📢
#BlackboxNLP 2025 invites the submission of archival and non-archival papers on interpreting and explaining NLP models.
📅 Deadlines: Aug 15 (direct submissions), Sept 5 (ARR commitment)
🔗 More details: blackboxnlp.github.io/2025/call/
August 12, 2025 at 7:10 PM
Just 5 days left to submit your method to the MIB Shared Task at #BlackboxNLP!
Have last-minute questions or need help finalizing your submission?
Join the Discord server: discord.gg/n5uwjQcxPR
August 3, 2025 at 6:40 AM
With the new extended deadline, there's still plenty of time to submit your method to the MIB Shared Task!
We welcome submissions of existing methods, experimental POCs, or any approach addressing circuit discovery or causal variable localization 💡
July 30, 2025 at 5:57 AM
Results deadline extended by one week!
Following requests from participants, we’re extending the MIB Shared Task submission deadline by one week.
🗓️ New deadline: August 8, 2025
Submit your method via the MIB leaderboard!
July 29, 2025 at 9:35 AM
📝 Technical report guidelines are out!
If you're submitting to the MIB Shared Task at #BlackboxNLP, feel free to take a look to help you prepare your report: blackboxnlp.github.io/2025/task/
July 28, 2025 at 12:34 PM
Just 10 days to go until the results submission deadline for the MIB Shared Task at #BlackboxNLP!
If you're working on:
🧠 Circuit discovery
🔍 Feature attribution
🧪 Causal variable localization
now’s the time to polish and submit!
Join us on Discord: discord.gg/n5uwjQcxPR
July 23, 2025 at 7:42 AM
⏳ Three weeks left! Submit your work to the MIB Shared Task at #BlackboxNLP, co-located with @emnlpmeeting.bsky.social
Whether you're working on circuit discovery or causal variable localization, this is your chance to benchmark your method in a rigorous setup!
July 13, 2025 at 5:56 AM
Working on feature attribution, circuit discovery, feature alignment, or sparse coding?
Consider submitting your work to the MIB Shared Task, part of this year’s #BlackboxNLP
We welcome submissions of both existing methods and new or experimental POCs!
July 8, 2025 at 9:35 AM
New to mechanistic interpretability?
The MIB shared task is a great opportunity to experiment:
✅ Clean setup
✅ Open baseline code
✅ Standard evaluation
Join the discord server for ideas and discussions: discord.gg/n5uwjQcxPR
July 7, 2025 at 8:42 AM
🚨 Excited to announce two invited speakers at #BlackboxNLP 2025!
Join us to hear from two leading voices in interpretability:
🎙️ Quanshi Zhang (Shanghai Jiao Tong University)
🎙️ Verna Dankers (McGill University)
@vernadankers.bsky.social
July 4, 2025 at 8:14 AM
A typical pipeline:
• Build contrastive input pairs differing only in the target variable.
• (If supervised) train the featurizer on these pairs.
• To evaluate: transform the activation into feature space, intervene there, transform back out, and check whether behavior shifts as expected.
July 1, 2025 at 4:49 PM
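The pipeline above can be sketched on a toy example. This is only an illustration under stated assumptions, not the MIB code: the two-variable "model", the angle `THETA`, and every function name below are hypothetical, and the featurizer is a hand-specified rotation standing in for one that would normally be learned.

```python
import math

THETA = 0.7  # hypothetical angle that entangles the two variables in the activation


def rotate(v, theta):
    """Rotate a 2D vector by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * v[0] - s * v[1], s * v[0] + c * v[1])


def hidden(x, y):
    """Toy lower half of the model: the activation mixes target variable x with y."""
    return rotate((x, y), THETA)


def readout(h):
    """Toy upper half of the model: computes 2*x + y from the activation."""
    x, y = rotate(h, -THETA)
    return 2 * x + y


def featurize(h):
    """Assumed featurizer: maps the activation to a space where dim 0 is x."""
    return rotate(h, -THETA)


def unfeaturize(f):
    """Inverse of the featurizer: maps features back to activation space."""
    return rotate(f, THETA)


def interchange(base_h, source_h, dims=(0,)):
    """Copy the chosen feature dims from the source run into the base run."""
    base_f = list(featurize(base_h))
    source_f = featurize(source_h)
    for d in dims:
        base_f[d] = source_f[d]
    return unfeaturize(tuple(base_f))


# Contrastive pair: the inputs differ only in the target variable x.
base, source = (1.0, 5.0), (3.0, 5.0)

patched = interchange(hidden(*base), hidden(*source))
patched_output = readout(patched)
expected_output = readout(hidden(source[0], base[1]))  # behavior if x really changed

print(math.isclose(patched_output, expected_output, abs_tol=1e-9))
```

If the featurizer isolates the target variable, the patched output matches the counterfactual one; a featurizer that leaves x spread across several dimensions would fail this check.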
One month to go! ⏰
Working on featurization methods - ways to transform LM activations to better isolate causal variables?
Submit your work to the Causal Variable Localization Track of the MIB Shared Task!
July 1, 2025 at 4:49 PM
Working on the MIB shared task?
Join the discord server: discord.gg/n5uwjQcxPR
🔍 Check out submission ideas
🔍 Brainstorm possible directions
🔍 Ask questions and get help with setup issues
Full task description: blackboxnlp.github.io/2025/task/
June 30, 2025 at 8:32 AM
The Circuit Localization Track benchmarks methods for discovering causal circuits, subgraphs of a model responsible for specific behavior.
These methods typically:
• Score model components or edges
• Ablate all but the top-ranked ones
• Evaluate the performance of the resulting subgraph
June 24, 2025 at 2:24 PM
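The three steps above (score, ablate, evaluate) can be sketched on a toy model. Everything here is an assumption made for illustration: the five-component additive "model", the zero-ablation scoring, and the simple output ratio used as a faithfulness measure are not the benchmark's own components or metrics.

```python
# Hypothetical toy model: five "components", each contributing weight * x.
WEIGHTS = [4.0, 0.1, -0.05, 3.0, 0.2]


def run_model(x, keep=None):
    """Sum component outputs; components outside `keep` are zero-ablated."""
    return sum(w * x for i, w in enumerate(WEIGHTS) if keep is None or i in keep)


def score_components(x):
    """Score each component by the absolute output change when it alone is ablated."""
    full = run_model(x)
    everyone = set(range(len(WEIGHTS)))
    return [abs(full - run_model(x, keep=everyone - {i})) for i in range(len(WEIGHTS))]


def top_k_circuit(scores, k):
    """Keep the indices of the k highest-scoring components."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(ranked[:k])


x = 1.0
scores = score_components(x)
circuit = top_k_circuit(scores, k=2)

# Assumed faithfulness measure: fraction of the full output the circuit recovers.
faithfulness = run_model(x, keep=circuit) / run_model(x)
print(sorted(circuit), round(faithfulness, 3))
```

Varying `k` trades circuit size against recovered behavior, which is the tension a circuit discovery method has to manage.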
Working on circuit discovery in LMs?
Consider submitting your work to the MIB Shared Task, part of #BlackboxNLP at @emnlpmeeting.bsky.social 2025!
The goal: benchmark existing MI methods and identify promising directions to precisely and concisely recover causal pathways in LMs >>
June 24, 2025 at 2:24 PM
The task builds on the new Mechanistic Interpretability Benchmark (MIB) by Mueller* & Geiger* et al. (2025), with two tracks:
* Circuit Localization – identify subgraphs that carry out specific computations
* Causal Variable Localization – align internal representations with known causal factors
June 23, 2025 at 2:46 PM
Have you heard about this year's shared task? 📢
Mechanistic Interpretability (MI) is quickly advancing, but comparing methods remains a challenge. This year at #BlackboxNLP, we're introducing a shared task to rigorously evaluate MI methods in language models 🧵
June 23, 2025 at 2:46 PM
BlackboxNLP, the leading workshop on interpretability and analysis of language models, will be co-located with EMNLP 2025 in Suzhou this November! 📆
This edition will feature a new shared task on circuits/causal variable localization in LMs, details here: blackboxnlp.github.io/2025/task
May 15, 2025 at 8:21 AM