Itay Itzhak @ COLM 🍁
@itay-itzhak.bsky.social
NLProc, deep learning, and machine learning. Ph.D. student @ Technion and The Hebrew University.
https://itay1itzhak.github.io/
Huge thanks to my co-authors
@boknilev @GabiStanovsky!
Preprint: arxiv.org/abs/2507.07186
Webpage: itay1itzhak.github.io/planted-in-...
We’d love your thoughts, critiques, and ideas 📬
Let’s talk about building more interpretable and trustworthy LLMs!
#NLProc #Bias #CognitiveAI
Planted in Pretraining, Swayed by Finetuning: A Case Study on the...
Large language models (LLMs) exhibit cognitive biases -- systematic tendencies of irrational decision-making, similar to those seen in humans. Prior work has found that these biases vary across...
arxiv.org
July 15, 2025 at 1:38 PM
🧠 Takeaway:
Cognitive biases are not introduced during instruction tuning.
They’re planted in pretraining and only surfaced by finetuning.
If we want fairer models, we need to look deeper into the pretraining pipeline.
July 15, 2025 at 1:38 PM
🔄 Step 2: Cross-tuning.
We swap instruction datasets between models with different pretraining.
Result: Biases follow the pretrained model!

PCA clearly shows models cluster by pretraining base, not by instruction dataset.
The bias “signature” stays intact, no matter the finetuning!
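Roughly what the clustering check looks like in code. A toy sketch: the bias vectors, base labels, and numbers below are made up for illustration, not our actual results.

```python
import numpy as np
from sklearn.decomposition import PCA

# Each row is one finetuned model's vector of bias scores
# (toy numbers; real vectors would come from bias benchmarks).
X = np.array([
    [0.62, 0.41, 0.55, 0.30],  # base A + instruction set 1
    [0.60, 0.43, 0.54, 0.32],  # base A + instruction set 2
    [0.21, 0.75, 0.33, 0.61],  # base B + instruction set 1
    [0.23, 0.73, 0.35, 0.60],  # base B + instruction set 2
])
bases = ["A", "A", "B", "B"]

# Project to 2D: if biases originate in pretraining, points cluster
# by base model rather than by instruction dataset.
coords = PCA(n_components=2).fit_transform(X)
for base, (x, y) in zip(bases, coords):
    print(f"base {base}: PC1={x:+.2f}, PC2={y:+.2f}")
```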
July 15, 2025 at 1:38 PM
🎲 Step 1: Training randomness.
We finetune the same model 3× with different seeds.
Result: Bias scores vary somewhat across seeds, but the bias behavior patterns stay stable relative to the seed-to-seed variance in MMLU accuracy.
✅ Aggregating across seeds reveals consistent trends.
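In code, the aggregation step is simple. A minimal sketch with made-up scores (the real runs use the full evaluation suite):

```python
import numpy as np

# Bias scores per finetuning seed: rows = seeds, cols = bias benchmarks.
# Toy values for illustration only.
bias_scores = np.array([
    [0.62, 0.41, 0.55],  # seed 0
    [0.59, 0.44, 0.53],  # seed 1
    [0.64, 0.40, 0.57],  # seed 2
])
mmlu_acc = np.array([0.71, 0.65, 0.69])  # same three seeds

# The mean across seeds exposes the stable bias pattern;
# the std quantifies how much seed randomness moves each score.
print("mean bias per benchmark:", bias_scores.mean(axis=0))
print("bias std across seeds:  ", bias_scores.std(axis=0))
print("MMLU std across seeds:  ", mmlu_acc.std())
```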
July 15, 2025 at 1:38 PM
🧪 We introduce a two-step causal framework to disentangle the effects of:
- Pretraining
- Instruction tuning
- Training randomness

🍁 Bottom line: pretraining is the origin of bias. Finetuning? Just the messenger.
#CausalInference #TrustworthyAI #NLP
July 15, 2025 at 1:38 PM
Super interesting! Have you tested how LAP handles more diverse paraphrasing? For example, do you think it would also work for code functions with similar roles?
March 5, 2025 at 3:52 PM
Why not try the straightforward approach: label high-quality texts and train an LM to classify them? Of course this should be done separately for different types of texts - a great scientific paper ≠ a great novel.
(Similar to how Llama 3 pretraining used quality scores from Llama 2 and RoBERTa.)
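A toy sketch of what I mean, with TF-IDF + logistic regression standing in for a finetuned LM classifier (texts and labels are made up):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny labeled corpus: 1 = high quality, 0 = low quality (illustrative).
texts = [
    "We present a rigorous analysis of convergence rates.",
    "click here 4 FREE stuff!!! best deals now",
    "The experiment controls for confounders via random assignment.",
    "lol idk just some random words words words",
]
labels = [1, 0, 1, 0]

# In practice you'd finetune an LM per text type (paper vs. novel);
# TF-IDF + logistic regression is the minimal stand-in here.
clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(texts, labels)
print(clf.predict_proba(["A novel method for efficient pretraining."]))
```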
December 10, 2024 at 9:32 AM