Faithful explainability, controllability & safety of LLMs.
🔎 On the academic job market 🔎
https://mttk.github.io/
Ever wonder whether verbalized CoTs correspond to the internal reasoning process of the model?
We propose a novel parametric faithfulness approach, which erases information contained in CoT steps from the model parameters to assess CoT faithfulness.
arxiv.org/abs/2502.14829
Teaching People LLM's Errors and Getting it Right
https://arxiv.org/abs/2512.21422
Join the MilaNLP team and contribute to our upcoming research projects.
🔗 More details: milanlproc.github.io/open_positio...
⏰ Deadline: Jan 31, 2026
💡 Abstract deadline: Thursday, March 26, 2026
📄 Full paper submission deadline: Tuesday, March 31, 2026
Call for papers (website coming soon):
docs.google.com/document/d/1...
Here is a blog post summarizing the talk:
davidbau.com/archives/202...
Topics include, but aren’t limited to:
🔎Linguistic Interpretability
🌍Multilingual Evaluation
📖Computational Typology
Please share!
#NLProc #NLP
Private Governance & Oversight Mechanisms for AI). Very much looking forward to the discussions!
If you are at #EurIPS and want to chat about LLMs' training data, reach out!
Have you ever wondered what political content is in LLMs' training data? What political opinions are expressed? What is the proportion of left- vs. right-leaning documents in the pre- and post-training data? Do they correlate with the political biases reflected in models?
I’m recruiting a postdoc for my lab at NYU! Topics include LM reasoning, creativity, limitations of scaling, AI for science, & more! Apply by Feb 1.
(Different from NYU Faculty Fellows, which are also great but less connected to my lab.)
Link in 🧵
Exhibit A: openreview.net/forum?id=8qk...
Exhibit B: openreview.net/forum?id=GlX...
Exhibit C: openreview.net/forum?id=kDh...
ReSkies much appreciated
KSoC: utah.peopleadmin.com/postings/190... (AI broadly)
Education + AI:
- utah.peopleadmin.com/postings/189...
- utah.peopleadmin.com/postings/190...
Computer Vision:
- utah.peopleadmin.com/postings/183...
"Measuring Chain of Thought Faithfulness by Unlearning Reasoning Steps"
by Martin Tutek, Fateme Hashemi Chaleshtori, Ana Marasovic, and Yonatan Belinkov
aclanthology.org/2025.emnlp-m...
6/n
Huge thanks to my amazing collaborators @fatemehc.bsky.social, @anamarasovic.bsky.social, and @boknilev.bsky.social. This would not have been possible without them!
Paper: aclanthology.org/2025.emnlp-m...
Slides/video/poster: underline.io/events/502/s...
I'll present our parametric CoT faithfulness work (arxiv.org/abs/2502.14829) on Wednesday at the second Interpretability session, 16:30-18:00 local time, in A104-105.
If you're in Suzhou, reach out to talk all things reasoning :)
- Explainable NLU, main supervisor: myself, start in Spring 2026 tinyurl.com/3uset3dm
- Trustworthy NLP, main supervisor: @apepa.bsky.social, start in Spring 2026 tinyurl.com/yxj8yk4m
- Open-topic: express interest via ELLIS, start in Autumn 2026 tinyurl.com/2hcxexyx
1️⃣ Purbid's 𝐩𝐨𝐬𝐭𝐞𝐫 at 𝐒𝐨𝐋𝐚𝐑 (𝟏𝟏:𝟏𝟓𝐚𝐦-𝟏:𝟎𝟎𝐩𝐦) on catching redundant preference pairs & how pruning them hurts accuracy; www.anamarasovic.com/publications...
2️⃣ My 𝐭𝐚𝐥𝐤 at 𝐗𝐋𝐋𝐌-𝐑𝐞𝐚𝐬𝐨𝐧-𝐏𝐥𝐚𝐧 (𝟏𝟐𝐩𝐦) on measuring CoT faithfulness by looking at internals, not just behaviorally
1/3
🚀 New paper out: ManagerBench: Evaluating the Safety-Pragmatism Trade-off in Autonomous LLMs🚀🧵
Check out the thread for the (many) other interesting works from his group 🎉
@mtutek.bsky.social , who couldn't travel:
bsky.app/profile/mtut...
Later that day we'll have a poster on predicting success of model editing by Yanay Soker, who also couldn't travel
Context Parametrization with Compositional Adapters
https://arxiv.org/abs/2509.22158