Martin Tutek
mtutek.bsky.social
Martin Tutek
@mtutek.bsky.social
Postdoc @ TakeLab, UniZG | previously: Technion; TU Darmstadt | PhD @ TakeLab, UniZG

Faithful explainability, controllability & safety of LLMs.

🔎 On the academic job market 🔎

https://mttk.github.io/
Pinned
🚨🚨 New preprint 🚨🚨

Ever wonder whether verbalized CoTs correspond to the internal reasoning process of the model?

We propose a novel parametric faithfulness approach, which erases information contained in CoT steps from the model parameters to assess CoT faithfulness.

arxiv.org/abs/2502.14829
Measuring Faithfulness of Chains of Thought by Unlearning Reasoning Steps
When prompted to think step-by-step, language models (LMs) produce a chain of thought (CoT), a sequence of reasoning steps that the model supposedly used to produce its prediction. However, despite mu...
arxiv.org
*Urgently* looking for emergency reviewers for the ARR October Interpretability track 🙏🙏

ReSkies much appreciated
November 11, 2025 at 10:29 AM
Reposted by Martin Tutek
Full house at BlackboxNLP at #EMNLP2025!! Getting ready for my 1.45PM keynote 😎 Join us in A102 to learn about "Memorization: myth or mystery?"
November 9, 2025 at 3:05 AM
Reposted by Martin Tutek
𝙒𝙚'𝙧𝙚 𝙝𝙞𝙧𝙞𝙣𝙜 𝙣𝙚𝙬 𝙛𝙖𝙘𝙪𝙡𝙩𝙮 𝙢𝙚𝙢𝙗𝙚𝙧𝙨!

KSoC: utah.peopleadmin.com/postings/190... (AI broadly)

Education + AI:
- utah.peopleadmin.com/postings/189...
- utah.peopleadmin.com/postings/190...

Computer Vision:
- utah.peopleadmin.com/postings/183...
November 7, 2025 at 11:35 PM
Reposted by Martin Tutek
Outstanding paper (5/7):

"Measuring Chain of Thought Faithfulness by Unlearning Reasoning Steps"
by Martin Tutek, Fateme Hashemi Chaleshtori, Ana Marasovic, and Yonatan Belinkov
aclanthology.org/2025.emnlp-m...

6/n
Measuring Chain of Thought Faithfulness by Unlearning Reasoning Steps
Martin Tutek, Fateme Hashemi Chaleshtori, Ana Marasovic, Yonatan Belinkov. Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing. 2025.
aclanthology.org
November 7, 2025 at 10:32 PM
Very honored to be one out of seven outstanding papers at this years' EMNLP :)

Huge thanks to my amazing collaborators @fatemehc.bsky.social @anamarasovic.bsky.social @boknilev.bsky.social , this would not have been possible without them!
November 7, 2025 at 8:58 AM
Reposted by Martin Tutek
Presenting today our work "Unsupervised Word-level Quality Estimation Through the Lens of Annotator (Dis)agreement" at the Machine Translation morning session (Room A301, 11:45 China time). See you there! 🤗

Paper: aclanthology.org/2025.emnlp-m...
Slides/video/poster: underline.io/events/502/s...
Unsupervised Word-level Quality Estimation for Machine Translation Through the Lens of Annotators (Dis)agreement
Gabriele Sarti, Vilém Zouhar, Malvina Nissim, Arianna Bisazza. Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing. 2025.
aclanthology.org
November 6, 2025 at 1:19 AM
Reposted by Martin Tutek
Here’s a custom feed for #EMNLP2025. Click the pin to save it to your home screen!
November 2, 2025 at 3:15 PM
Flying out to @emnlpmeeting soon🇨🇳
I'll present our parametric CoT faithfulness work (arxiv.org/abs/2502.14829) on Wednesday at the second Interpretability session, 16:30-18:00 local time A104-105

If you're in Suzhou, reach out to talk all things reasoning :)
Measuring Chain of Thought Faithfulness by Unlearning Reasoning Steps
When prompted to think step-by-step, language models (LMs) produce a chain of thought (CoT), a sequence of reasoning steps that the model supposedly used to produce its prediction. Despite much work o...
arxiv.org
October 31, 2025 at 1:30 PM
Reposted by Martin Tutek
⏰ One week left to apply for the two PhD Fellowships in Trustworthy NLP and Explainable NLU! The two positions have a starting date in spring 2026. Check the original post for more details👇
Available #NLProc PhD positions:
- Explainable NLU, main supervisor: myself, start in Spring 2026 tinyurl.com/3uset3dm
- Trustworthy NLP, main supervisor: @apepa.bsky.social, start in Spring 2026 tinyurl.com/yxj8yk4m
- Open-topic: express interest via ELLIS, start in Autumn 2026 tinyurl.com/2hcxexyx
LinkedIn
This link will take you to a page that’s not on LinkedIn
lnkd.in
October 24, 2025 at 8:30 AM
Reposted by Martin Tutek
📣Tomorrow at #COLM2025:

1️⃣ Purbid's 𝐩𝐨𝐬𝐭𝐞𝐫 at 𝐒𝐨𝐋𝐚𝐑 (𝟏𝟏:𝟏𝟓𝐚𝐦-𝟏:𝟎𝟎𝐩𝐦) on catching redundant preference pairs & how pruning them hurts accuracy; www.anamarasovic.com/publications...

2️⃣ My 𝐭𝐚𝐥𝐤 at 𝐗𝐋𝐋𝐌-𝐑𝐞𝐚𝐬𝐨𝐧-𝐏𝐥𝐚𝐧 (𝟏𝟐𝐩𝐦) on measuring CoT faithfulness by looking at internals, not just behaviorally

1/3
October 9, 2025 at 4:54 PM
If you're at COLM, check out various works by Ana and her group!
📣Tomorrow at #COLM2025:

1️⃣ Purbid's 𝐩𝐨𝐬𝐭𝐞𝐫 at 𝐒𝐨𝐋𝐚𝐑 (𝟏𝟏:𝟏𝟓𝐚𝐦-𝟏:𝟎𝟎𝐩𝐦) on catching redundant preference pairs & how pruning them hurts accuracy; www.anamarasovic.com/publications...

2️⃣ My 𝐭𝐚𝐥𝐤 at 𝐗𝐋𝐋𝐌-𝐑𝐞𝐚𝐬𝐨𝐧-𝐏𝐥𝐚𝐧 (𝟏𝟐𝐩𝐦) on measuring CoT faithfulness by looking at internals, not just behaviorally

1/3
October 9, 2025 at 4:58 PM
🤔What happens when LLM agents choose between achieving their goals and avoiding harm to humans in realistic management scenarios? Are LLMs pragmatic or prefer to avoid human harm?

🚀 New paper out: ManagerBench: Evaluating the Safety-Pragmatism Trade-off in Autonomous LLMs🚀🧵
October 8, 2025 at 3:14 PM
I won't be at COLM, so come see Yonatan talk about our work on estimating CoT faithfulness using machine unlearning!

Check out the thread for the (many) other interesting works from his group 🎉
In #Interplay25 workshop, Friday ~11:30, I'll present on measuring *parametric* CoT faithfulness on behalf of
@mtutek.bsky.social , who couldn't travel:
bsky.app/profile/mtut...

Later that day we'll have a poster on predicting success of model editing by Yanay Soker, who also couldn't travel
October 7, 2025 at 1:47 PM
Reposted by Martin Tutek
Here’s a #COLM2025 feed!

Pin it 📌 to follow along with the conference this week!
October 6, 2025 at 8:26 PM
Reposted by Martin Tutek
Josip Juki\'c, Martin Tutek, Jan \v{S}najder
Context Parametrization with Compositional Adapters
https://arxiv.org/abs/2509.22158
September 29, 2025 at 7:47 AM
Reposted by Martin Tutek
Adi Simhi, Jonathan Herzig, Martin Tutek, Itay Itzhak, Idan Szpektor, Yonatan Belinkov
ManagerBench: Evaluating the Safety-Pragmatism Trade-off in Autonomous LLMs
https://arxiv.org/abs/2510.00857
October 2, 2025 at 6:59 AM
Reposted by Martin Tutek
Opportunities to join my group in fall 2026:
* PhD applications direct or via ELLIS @ellis.eu (ellis.eu/news/ellis-p...)
* Post-doc applications direct or via Azrieli (azrielifoundation.org/fellows/inte...) or Zuckerman (zuckermanstem.org/ourprograms/...)
October 1, 2025 at 1:44 PM
Reposted by Martin Tutek
What's the right unit of analysis for understanding LLM internals? We explore in our mech interp survey (a major update from our 2024 ms).

We’ve added more recent work and more immediately actionable directions for future work. Now published in Computational Linguistics!
October 1, 2025 at 2:03 PM
Reposted by Martin Tutek
🎓 Fully funded PhD in Trustworthy NLP at the UCPH & @aicentre.dk with @iaugenstein.bsky.social and me, @copenlu.bsky.social
📆 Application deadline: 30 October 2025
👀 Reasons to apply: www.copenlu.com/post/why-ucph/
🔗 Apply here: candidate.hr-manager.net/ApplicationI...
#NLProc #XAI #TrustworhyAI
September 29, 2025 at 12:01 PM
Reposted by Martin Tutek
🚨 Are you looking for a PhD in #NLProc dealing with #LLMs?
🎉 Good news: I am hiring! 🎉
The position is part of the “Contested Climate Futures" project. 🌱🌍 You will focus on developing next-generation AI methods🤖 to analyze climate-related concepts in content—including texts, images, and videos.
September 24, 2025 at 7:34 AM
Reposted by Martin Tutek
The next generation of open LLMs should be inclusive, compliant, and multilingual by design. That’s why we @icepfl.bsky.social @ethz.ch @cscsch.bsky.social ) built Apertus.
EPFL, ETH Zurich & CSCS just released Apertus, Switzerland’s first fully open-source large language model.
Trained on 15T tokens in 1,000+ languages, it’s built for transparency, responsibility & the public good.

Read more: actu.epfl.ch/news/apertus...
September 3, 2025 at 9:26 AM
Reposted by Martin Tutek
🚨 EACL 2026 website is live and Call for Papers is out! 🚨

Join us at #EACL2026 (Rabat, Morocco 🇲🇦, Mar 24-29 2026)

👉 Open to all areas of CL/NLP + related fields.

Details: 2026.eacl.org/calls/papers/

• ARR submission deadline: Oct 6, 2025
• EACL commitment deadline: Dec 14, 2025
September 2, 2025 at 8:45 AM
Reposted by Martin Tutek
- Fully funded PhD fellowship on Explainable NLU: apply by 31 October 2025, start in Spring 2026: candidate.hr-manager.net/ApplicationI...

- Open-topic PhD positions: express your interest through ELLIS by 31 October 2025, start in Autumn 2026: ellis.eu/news/ellis-p...

#NLProc #XAI
PhD fellowship in Explainable Natural Language Understanding Department of Computer Science Faculty of SCIENCE University of Copenhagen
The Natural Language Processing Section at the Department of Computer Science, Faculty of Science at the University of Copenhagen invites applicants for a PhD f
candidate.hr-manager.net
September 1, 2025 at 2:20 PM
Reposted by Martin Tutek
All your embarrassing secrets are training data (unless you are paying attention)
NEW: Anthropic will start training its AI models on user data, including new chat transcripts & coding sessions, unless users choose to opt out by 9/28 (it's a pop-up window that will give you the choice). It’s also extending its data retention to 5 years.
www.theverge.com/anthropic/76...
Anthropic will start training its AI models on chat transcripts
You can choose to opt out.
www.theverge.com
August 28, 2025 at 4:42 PM
How many people would you estimate are currently actively publishing in ML research?

From AAAI, which has ~29000 submissions: "There are 75,000+ unique submitting authors."
NeurIPS had 25000 submissions.

Is the number close to 300k? 500k?
August 27, 2025 at 7:32 PM