Ziling Cheng
@ziling-cheng.bsky.social
MSc student @mila-quebec.bsky.social @mcgill-nlp.bsky.social
Research Fellow @ RBC Borealis
Model analysis, interpretability, reasoning and hallucination
Studying model behaviours to make them better :))
Looking for Fall '26 PhD positions
🙏 Huge thanks to my collaborators @mengcao.bsky.social, Marc-Antoine Rondeau, and my advisor Jackie Cheung for their invaluable guidance and support throughout this work, and to friends at @mila-quebec.bsky.social and @mcgill-nlp.bsky.social 💙 7/n
June 6, 2025 at 6:12 PM
🧠 TL;DR: These irrelevant context hallucinations show that LLMs go beyond mere parroting 🦜 — they do generalize, based on contextual cues and abstract classes. But not reliably. They're more like chameleons 🦎 — blending with the context, even when they shouldn’t. 6/n
June 6, 2025 at 6:11 PM
🔍 What’s going on inside?
With mechanistic interpretability, we found:
- LLMs first compute abstract classes (like “language”) before narrowing to specific answers
- Competing circuits inside the model: one based on context, one based on query. Whichever is stronger wins. 5/n
June 6, 2025 at 6:11 PM
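For anyone who wants to poke at the first finding themselves, here is a minimal logit-lens-style sketch (illustrative, not the paper's exact method): project each layer's hidden state through the unembedding and watch when the class-consistent answers (language names) start to gain probability. The model (gpt2), prompt, and candidate tokens are placeholder assumptions.

```python
# Minimal logit-lens-style sketch (illustrative; not the paper's exact setup).
# Assumptions: a GPT-2-style causal LM; prompt and candidate tokens are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

# Irrelevant context (Honda) + a query whose answer class is "language".
prompt = "Honda reported record sales this year. The official language of Brazil is"
inputs = tok(prompt, return_tensors="pt")

with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# Candidate answers: the correct one vs. the context-induced one.
# We take the first subword token of each as a rough proxy.
cands = {w: tok(w, add_special_tokens=False)["input_ids"][0]
         for w in (" Portuguese", " Japanese")}

for layer, h in enumerate(out.hidden_states):
    # Project the last position's state through the final LayerNorm + unembedding.
    logits = model.lm_head(model.transformer.ln_f(h[:, -1, :]))
    probs = torch.softmax(logits, dim=-1)[0]
    print(layer, {w: round(probs[i].item(), 4) for w, i in cands.items()})
```

Whether the crossover between the two candidates appears, and at which layer, will depend on the model; a small model like gpt2 may not reproduce the effect at all.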
Sometimes this yields the right answer for the wrong reasons (“Portuguese” from “Brazil”); other times, it produces confident errors (“Japanese” from “Honda”). 4/n
June 6, 2025 at 6:11 PM
Turns out, we can. They follow a systematic failure mode we call class-based (mis)generalization: the model abstracts the class from the query (e.g., languages) and generalizes based on features from the irrelevant context (e.g., Honda → Japan). 3/n
June 6, 2025 at 6:11 PM
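A rough sketch of how one could sanity-check the class-based prediction: map the irrelevant context entity to the language of its associated country, then see whether the model's completion drifts that way. The entity-to-language table, prompts, and gpt2 are assumptions for illustration, not the paper's resources.

```python
# Rough sketch of checking class-based (mis)generalization (illustrative only).
# Assumptions: gpt2 as the model, a tiny hand-made entity -> language table, toy prompts.
from transformers import pipeline

generate = pipeline("text-generation", model="gpt2")

# Class-based prediction: the language associated with the irrelevant entity.
entity_to_language = {"Honda": "Japanese", "IKEA": "Swedish", "Samsung": "Korean"}

query = "The official language of Brazil is"
for entity, predicted in entity_to_language.items():
    prompt = f"{entity} reported record sales this quarter. {query}"
    completion = generate(prompt, max_new_tokens=3, do_sample=False)[0]["generated_text"]
    answer = completion[len(prompt):].strip()
    shifted = predicted.lower() in answer.lower()
    print(f"{entity}: predicted {predicted!r}, model said {answer!r}, shifted={shifted}")
```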
These examples show answers — even to the same query — can shift under different irrelevant contexts. Can we predict these shifts? 2/n
June 6, 2025 at 6:10 PM