Previously: Student Researcher @ Google DeepMind, @École polytechnique
https://nicolaszucchet.github.io
Excited to share my Google DeepMind internship results, which reveal the fascinating dynamics behind factual knowledge acquisition in LLMs!
We find that models acquire factual knowledge in three phases:
1. Models initially learn generic statistics
2. Performance plateaus while attention-based circuits form
3. Knowledge emerges as models learn individual-specific facts
The plateau is when the model learns how to recall facts, and it only remembers specific facts afterward!
This suggests exciting new data scheduling strategies for training - we show that a simple warmup works well!
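For intuition, here is a minimal sketch of what such a data-scheduling warmup could look like: train on a small, fixed subset of the examples for an initial number of steps before switching to the full distribution. The subset fraction, step counts, and function names below are illustrative assumptions, not the exact recipe from the paper.

```python
# Illustrative sketch of a data-scheduling "warmup" (assumptions, not the paper's recipe):
# batches are drawn from a small subset of examples for the first `warmup_steps`,
# then from the full dataset for the rest of training.

import random
from typing import List, Sequence


def make_warmup_sampler(
    dataset: Sequence[str],
    warmup_fraction: float = 0.1,  # hypothetical: fraction of examples seen during warmup
    warmup_steps: int = 1_000,     # hypothetical: number of warmup batches
    batch_size: int = 32,
    seed: int = 0,
):
    """Yield batches following a simple warmup schedule over the data distribution."""
    rng = random.Random(seed)
    subset_size = max(1, int(len(dataset) * warmup_fraction))
    warmup_subset = rng.sample(list(dataset), subset_size)

    def sample_batch(step: int) -> List[str]:
        # Restricted pool during warmup, full dataset afterwards.
        pool = warmup_subset if step < warmup_steps else list(dataset)
        return rng.choices(pool, k=batch_size)

    return sample_batch


if __name__ == "__main__":
    # Toy usage: plug the sampler into an ordinary training loop.
    toy_dataset = [f"fact about individual {i}" for i in range(10_000)]
    sample_batch = make_warmup_sampler(toy_dataset)
    for step in range(2_000):
        batch = sample_batch(step)
        # train_step(model, batch)  # placeholder: the actual model update goes here
```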
On top of that, fine-tuning struggles to add new knowledge: existing memories are quickly corrupted when learning new ones.