Faithful explainability, controllability & safety of LLMs.
🔎 On the academic job market 🔎
https://mttk.github.io/
Ever wonder whether verbalized CoTs correspond to the internal reasoning process of the model?
We propose a novel parametric approach that assesses CoT faithfulness by erasing the information contained in individual CoT steps from the model's parameters.
arxiv.org/abs/2502.14829
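The core "erase, then re-ask" idea can be sketched with a hypothetical toy: a logistic regression stands in for the LLM, and zeroing a weight stands in for unlearning the parametric trace of a CoT step (this is an invented illustration, not the paper's actual LLM unlearning procedure).

```python
import numpy as np

# Hypothetical toy sketch of the "erase, then re-ask" idea behind
# parametric faithfulness testing (NOT the paper's actual LLM
# unlearning procedure). The "model" is a logistic regression;
# ablating the weight for feature 0 stands in for erasing the
# information carried by one reasoning step from the parameters.

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
# The label is driven almost entirely by feature 0.
y = (X[:, 0] + 0.1 * X[:, 1] > 0).astype(float)

# Fit by plain gradient descent on the logistic loss.
w = np.zeros(2)
for _ in range(300):
    p = sigmoid(X @ w)
    w -= 0.2 * X.T @ (p - y) / len(y)

x = np.array([2.0, 2.0])
p_full = float(sigmoid(x @ w))        # answer probability before erasure

w_erased = w.copy()
w_erased[0] = 0.0                     # "erase" the information in feature 0
p_erased = float(sigmoid(x @ w_erased))

# If the answer probability drops sharply after erasure, the prediction
# genuinely relied on the erased information -- the toy analogue of a
# faithful CoT step.
faithfulness_gap = p_full - p_erased
print(round(p_full, 3), round(p_erased, 3), round(faithfulness_gap, 3))
```

In this toy, a large gap means the answer depended on the erased parameters; a near-zero gap would mean the "step" was never actually used, which is the unfaithful case the method is designed to detect.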
ReSkies much appreciated
KSoC: utah.peopleadmin.com/postings/190... (AI broadly)
Education + AI:
- utah.peopleadmin.com/postings/189...
- utah.peopleadmin.com/postings/190...
Computer Vision:
- utah.peopleadmin.com/postings/183...
"Measuring Chain of Thought Faithfulness by Unlearning Reasoning Steps"
by Martin Tutek, Fateme Hashemi Chaleshtori, Ana Marasovic, and Yonatan Belinkov
aclanthology.org/2025.emnlp-m...
6/n
Huge thanks to my amazing collaborators @fatemehc.bsky.social @anamarasovic.bsky.social @boknilev.bsky.social; this would not have been possible without them!
Paper: aclanthology.org/2025.emnlp-m...
Slides/video/poster: underline.io/events/502/s...
I'll present our parametric CoT faithfulness work (arxiv.org/abs/2502.14829) on Wednesday in the second Interpretability session, 16:30-18:00 local time, room A104-105
If you're in Suzhou, reach out to talk all things reasoning :)
- Explainable NLU, main supervisor: myself, start in Spring 2026: tinyurl.com/3uset3dm
- Trustworthy NLP, main supervisor: @apepa.bsky.social, start in Spring 2026: tinyurl.com/yxj8yk4m
- Open-topic: express interest via ELLIS, start in Autumn 2026: tinyurl.com/2hcxexyx
1️⃣ Purbid's 𝐩𝐨𝐬𝐭𝐞𝐫 at 𝐒𝐨𝐋𝐚𝐑 (𝟏𝟏:𝟏𝟓𝐚𝐦-𝟏:𝟎𝟎𝐩𝐦) on catching redundant preference pairs & how pruning them hurts accuracy; www.anamarasovic.com/publications...
2️⃣ My 𝐭𝐚𝐥𝐤 at 𝐗𝐋𝐋𝐌-𝐑𝐞𝐚𝐬𝐨𝐧-𝐏𝐥𝐚𝐧 (𝟏𝟐𝐩𝐦) on measuring CoT faithfulness by looking at internals, not just behaviorally
1/3
🚀 New paper out: ManagerBench: Evaluating the Safety-Pragmatism Trade-off in Autonomous LLMs🚀🧵
Check out the thread for the (many) other interesting works from his group 🎉
@mtutek.bsky.social, who couldn't travel:
bsky.app/profile/mtut...
Later that day, we'll have a poster on predicting the success of model editing by Yanay Soker, who also couldn't travel.
Context Parametrization with Compositional Adapters
https://arxiv.org/abs/2509.22158
ManagerBench: Evaluating the Safety-Pragmatism Trade-off in Autonomous LLMs
https://arxiv.org/abs/2510.00857
* PhD applications direct or via ELLIS @ellis.eu (ellis.eu/news/ellis-p...)
* Post-doc applications direct or via Azrieli (azrielifoundation.org/fellows/inte...) or Zuckerman (zuckermanstem.org/ourprograms/...)
We’ve added more recent work and more immediately actionable directions for future work. Now published in Computational Linguistics!
📆 Application deadline: 30 October 2025
👀 Reasons to apply: www.copenlu.com/post/why-ucph/
🔗 Apply here: candidate.hr-manager.net/ApplicationI...
#NLProc #XAI #TrustworthyAI
🎉 Good news: I am hiring! 🎉
The position is part of the “Contested Climate Futures” project. 🌱🌍 You will focus on developing next-generation AI methods 🤖 to analyze climate-related concepts in content, including texts, images, and videos.
Trained on 15T tokens in 1,000+ languages, it’s built for transparency, responsibility & the public good.
Read more: actu.epfl.ch/news/apertus...
Join us at #EACL2026 (Rabat, Morocco 🇲🇦, Mar 24-29 2026)
👉 Open to all areas of CL/NLP + related fields.
Details: 2026.eacl.org/calls/papers/
• ARR submission deadline: Oct 6, 2025
• EACL commitment deadline: Dec 14, 2025
- Open-topic PhD positions: express your interest through ELLIS by 31 October 2025, start in Autumn 2026: ellis.eu/news/ellis-p...
#NLProc #XAI
www.theverge.com/anthropic/76...
From AAAI, which has ~29,000 submissions: "There are 75,000+ unique submitting authors."
NeurIPS had 25,000 submissions.
Is the number close to 300k? 500k?