Itay Itzhak @ COLM 🍁
@itay-itzhak.bsky.social
NLProc, deep learning, and machine learning. Ph.D. student @ Technion and The Hebrew University.
https://itay1itzhak.github.io/
Reposted by Itay Itzhak @ COLM 🍁
Introducing Global PIQA, a new multilingual benchmark for 100+ languages. This benchmark is the outcome of this year’s MRL shared task, in collaboration with 300+ researchers from 65 countries. This dataset evaluates physical commonsense reasoning in culturally relevant contexts.
October 29, 2025 at 3:50 PM
Had a blast at CoLM! It really was as good as everyone says; congrats to the organizers 🎉
This week I’ll be in New York giving talks at NYU, Yale, and Cornell Tech.
If you’re around and want to chat about LLM behavior, safety, interpretability, or just say hi - DM me!
October 13, 2025 at 4:19 PM
Thrilled to be part of this work led by @adisimhi.bsky.social!

ManagerBench reveals a critical problem:
✅ LLMs can recognize harm
❌ But often choose it anyway to meet goals
🤖 Or overcorrect and become ineffective
We need better balance!

A must-read for safety folks!
🤔 What happens when LLM agents choose between achieving their goals and avoiding harm to humans in realistic management scenarios? Are LLMs pragmatic, or do they prefer to avoid human harm?

🚀 New paper out: ManagerBench: Evaluating the Safety-Pragmatism Trade-off in Autonomous LLMs🚀🧵
October 8, 2025 at 3:22 PM
Reposted by Itay Itzhak @ COLM 🍁
Traveling to #COLM2025 this week, and here's some work from our group and collaborators:
Cognitive biases, hidden knowledge, CoT faithfulness, model editing, and LM4Science
See the thread for details and reach out if you'd like to discuss more!
October 7, 2025 at 1:41 PM
At #ACL2025 and not sure what to do next? GEM 💎² is the place to be for awesome talks on the future of LLM evaluation. Come hear @GabiStanovsky, @EliyaHabba, @LChoshen and others rethink what it means to actually evaluate LLMs beyond accuracy and vibes. Thursday @ Hall C!
July 30, 2025 at 7:04 PM
In Vienna for #ACL2025, and already had my first (vegan) Austrian sausage!

Now hungry for discussing:
– LLM behavior
– Interpretability
– Biases & Hallucinations
– Why eval is so hard (but so fun)
Come say hi if that’s your vibe too!
July 27, 2025 at 6:11 AM
🚨New paper alert🚨

🧠
Instruction-tuned LLMs show amplified cognitive biases — but are these new behaviors, or pretraining ghosts resurfacing?

Excited to share our new paper, accepted to CoLM 2025🎉!
See thread below 👇
#BiasInAI #LLMs #MachineLearning #NLProc
July 15, 2025 at 1:38 PM
Reposted by Itay Itzhak @ COLM 🍁
Excited to share our paper: "Chain-of-Thought Is Not Explainability"! We unpack a critical misconception in AI: models explaining their steps (CoT) aren't necessarily revealing their true reasoning. Spoiler: the transparency can be an illusion. (1/9) 🧵
July 1, 2025 at 3:41 PM
Reposted by Itay Itzhak @ COLM 🍁
Are you recovering from your @colmweb.org abstract submission? GEM has a non-archival track that allows you to submit a two-page abstract in parallel!

Our workshop deadline is soon, please consider submitting your evaluation paper!

You can find our call for papers at gem-benchmark.com/workshop
March 24, 2025 at 3:36 PM
New paper alert!

Curious how small prompt tweaks impact LLM accuracy but don’t want to run endless inferences? We got you. Meet DOVE - a dataset built to uncover these sensitivities.

Use DOVE for your analysis or contribute samples - we're growing and welcome you aboard!
Care about LLM evaluation? 🤖 🤔

We bring you 🕊️ DOVE, a massive (250M!) collection of LLM outputs
on different prompts, domains, tokens, models...

Join our community effort to expand it with YOUR model predictions & become a co-author!
March 17, 2025 at 4:33 PM
Reposted by Itay Itzhak @ COLM 🍁
1/13 LLM circuits tell us where the computation happens inside the model—but the computation varies by token position, a key detail often ignored!
We propose a method to automatically find position-aware circuits, improving faithfulness while keeping circuits compact. 🧵👇
March 6, 2025 at 10:15 PM
Reposted by Itay Itzhak @ COLM 🍁
🚨🚨 New preprint 🚨🚨

Ever wonder whether verbalized CoTs correspond to the internal reasoning process of the model?

We propose a novel parametric faithfulness approach, which erases information contained in CoT steps from the model parameters to assess CoT faithfulness.

arxiv.org/abs/2502.14829
Measuring Faithfulness of Chains of Thought by Unlearning Reasoning Steps
February 21, 2025 at 12:43 PM
We usually blame hallucinations on uncertainty or missing knowledge. But what if I told you that LLMs hallucinate even when they *know* the correct answer - and they do it with *high certainty* 🤯?
Check out our new paper that challenges assumptions on AI trustworthiness! 🧵👇
🚨New arXiv preprint!🚨
LLMs can hallucinate - but did you know they can do so with high certainty even when they know the correct answer? 🤯
We find those hallucinations in our latest work with @itay-itzhak.bsky.social, @fbarez.bsky.social, @gabistanovsky.bsky.social and Yonatan Belinkov
February 19, 2025 at 3:55 PM
Reposted by Itay Itzhak @ COLM 🍁
GEM is so back! Our workshop for Generation, Evaluation, and Metrics is coming to an ACL near you.

Evaluation in the world of GenAI is more important than ever, so please consider submitting your amazing work.

CfP can be found at gem-benchmark.com/workshop
February 12, 2025 at 2:25 PM