Itay Itzhak @ COLM 🍁
itay-itzhak.bsky.social
NLProc, deep learning, and machine learning. Ph.D. student @ Technion and The Hebrew University.
https://itay1itzhak.github.io/
Had a blast at CoLM! It really was as good as everyone says, congrats to the organizers 🎉
This week I’ll be in New York giving talks at NYU, Yale, and Cornell Tech.
If you’re around and want to chat about LLM behavior, safety, interpretability, or just say hi - DM me!
October 13, 2025 at 4:19 PM
In Vienna for #ACL2025, and already had my first (vegan) Austrian sausage!

Now hungry for discussing:
– LLM behavior
– Interpretability
– Biases & Hallucinations
– Why eval is so hard (but so fun)
Come say hi if that’s your vibe too!
July 27, 2025 at 6:11 AM
🔄 Step 2: Cross-tuning.
We swap instruction datasets between models with different pretraining.
Result: Biases follow the pretrained model!

PCA clearly shows models group by pretraining base, not by instruction.
The bias “signature” stays intact, no matter the finetuning!
July 15, 2025 at 1:38 PM
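The cross-tuning result above can be sketched with a toy PCA, assuming each model is represented by a vector of bias scores over a handful of benchmarks. All numbers, group means, and the two-base setup here are made up for illustration, not the paper's actual data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical bias-score vectors (one per model) over 8 bias benchmarks.
# Models sharing a pretraining base get a shared "signature"; the swapped
# instruction dataset only adds a small perturbation on top of it.
base_a = rng.normal(0.6, 0.05, size=8)   # pretraining base A signature
base_b = rng.normal(0.2, 0.05, size=8)   # pretraining base B signature

models = np.stack([
    base_a + rng.normal(0, 0.02, 8),  # base A + instruction set 1
    base_a + rng.normal(0, 0.02, 8),  # base A + instruction set 2 (swapped)
    base_b + rng.normal(0, 0.02, 8),  # base B + instruction set 1 (swapped)
    base_b + rng.normal(0, 0.02, 8),  # base B + instruction set 2
])

# PCA via SVD on mean-centered scores; project onto the first component.
centered = models - models.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
pc1 = centered @ vt[0]

# Models with the same pretraining base land on the same side of PC1,
# regardless of which instruction dataset they were tuned on.
print(np.sign(pc1))
```

With a real bias-score matrix in place of the toy one, the same projection is what makes "models group by pretraining base, not by instruction" visible.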
🎲 Step 1: Training randomness.
We finetune the same model 3× with different seeds.
Result: Bias scores vary somewhat across seeds, but behavior patterns stay stable relative to the variance seen on MMLU.
✅ Aggregating across seeds reveals consistent trends.
July 15, 2025 at 1:38 PM
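A minimal sketch of the seed-aggregation step, with made-up bias scores for three hypothetical seeds (the bias types and values are placeholders, not the paper's measurements):

```python
import numpy as np

# Hypothetical bias scores (e.g., three bias types) for the same model
# finetuned with 3 different random seeds. Rows = seeds, columns = biases.
scores = np.array([
    [0.71, 0.55, 0.62],   # seed 0
    [0.68, 0.58, 0.60],   # seed 1
    [0.74, 0.52, 0.65],   # seed 2
])

mean = scores.mean(axis=0)   # aggregated bias score per bias type
std = scores.std(axis=0)     # seed-induced variation per bias type

# The ranking of biases by mean score is the kind of "consistent trend"
# that survives seed noise even when individual scores wobble.
ranking = np.argsort(-mean)
print(mean.round(3), std.round(3), ranking)
```

Aggregating like this separates a stable bias pattern (the ranking) from run-to-run noise (the per-bias standard deviation).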
🧪 We introduce a two-step causal framework to disentangle the effects of:
- Pretraining
- Instruction tuning
- Training randomness

🍁 Bottom line: pretraining is the origin of bias. Finetuning? Just the messenger.
#CausalInference #TrustworthyAI #NLP
July 15, 2025 at 1:38 PM
🚨New paper alert🚨

🧠
Instruction-tuned LLMs show amplified cognitive biases — but are these new behaviors, or pretraining ghosts resurfacing?

Excited to share our new paper, accepted to CoLM 2025🎉!
See thread below 👇
#BiasInAI #LLMs #MachineLearning #NLProc
July 15, 2025 at 1:38 PM