Ziling Cheng
@ziling-cheng.bsky.social
MSc student @mila-quebec.bsky.social @mcgill-nlp.bsky.social
Research Fellow @ RBC Borealis
Model analysis, interpretability, reasoning and hallucination
Studying model behaviours to make them better :))
Looking for Fall '26 PhD positions
🙏 Huge thanks to my collaborators @mengcao.bsky.social, Marc-Antoine Rondeau, and my advisor Jackie Cheung for their invaluable guidance and support throughout this work, and to friends at @mila-quebec.bsky.social and @mcgill-nlp.bsky.social 💙 7/n
June 6, 2025 at 6:12 PM
🧠 TL;DR: These irrelevant context hallucinations show that LLMs go beyond mere parroting 🦜 — they do generalize, based on contextual cues and abstract classes. But not reliably. They're more like chameleons 🦎 — blending with the context, even when they shouldn’t. 6/n
June 6, 2025 at 6:11 PM
🔍 What’s going on inside?
With mechanistic interpretability, we found:
- LLMs first compute abstract classes (like “language”) before narrowing to specific answers
- Competing circuits inside the model: one based on context, one based on query. Whichever is stronger wins. 5/n
June 6, 2025 at 6:11 PM
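For anyone who wants to poke at the first finding themselves, here is a minimal logit-lens-style sketch (illustrative, not the paper's exact method): project each layer's hidden state through the unembedding and watch when the class-consistent answers (language names) start to gain probability. The model (gpt2), prompt, and candidate tokens are placeholder assumptions.

```python
# Minimal logit-lens-style sketch (illustrative; not the paper's exact setup).
# Assumptions: a GPT-2-style causal LM; prompt and candidate tokens are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

# Irrelevant context (Honda) + a query whose answer class is "language".
prompt = "Honda reported record sales this year. The official language of Brazil is"
inputs = tok(prompt, return_tensors="pt")

with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# Candidate answers: the correct one vs. the context-induced one.
# We take the first subword token of each as a rough proxy.
cands = {w: tok(w, add_special_tokens=False)["input_ids"][0]
         for w in (" Portuguese", " Japanese")}

for layer, h in enumerate(out.hidden_states):
    # Project the last position's state through the final LayerNorm + unembedding.
    logits = model.lm_head(model.transformer.ln_f(h[:, -1, :]))
    probs = torch.softmax(logits, dim=-1)[0]
    print(layer, {w: round(probs[i].item(), 4) for w, i in cands.items()})
```

Whether the crossover between the two candidates appears, and at which layer, will depend on the model; a small model like gpt2 may not reproduce the effect at all.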
Sometimes this yields the right answer for the wrong reasons (“Portuguese” from “Brazil”); other times, it produces confident errors (“Japanese” from “Honda”). 4/n
June 6, 2025 at 6:11 PM
Turns out, we can. They follow a systematic failure mode we call class-based (mis)generalization: the model abstracts the class from the query (e.g., languages) and generalizes based on features from the irrelevant context (e.g., Honda → Japan). 3/n
June 6, 2025 at 6:11 PM
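A rough sketch of how one could sanity-check the class-based prediction: map the irrelevant context entity to the language of its associated country, then see whether the model's completion drifts that way. The entity-to-language table, prompts, and gpt2 are assumptions for illustration, not the paper's resources.

```python
# Rough sketch of checking class-based (mis)generalization (illustrative only).
# Assumptions: gpt2 as the model, a tiny hand-made entity -> language table, toy prompts.
from transformers import pipeline

generate = pipeline("text-generation", model="gpt2")

# Class-based prediction: the language associated with the irrelevant entity.
entity_to_language = {"Honda": "Japanese", "IKEA": "Swedish", "Samsung": "Korean"}

query = "The official language of Brazil is"
for entity, predicted in entity_to_language.items():
    prompt = f"{entity} reported record sales this quarter. {query}"
    completion = generate(prompt, max_new_tokens=3, do_sample=False)[0]["generated_text"]
    answer = completion[len(prompt):].strip()
    shifted = predicted.lower() in answer.lower()
    print(f"{entity}: predicted {predicted!r}, model said {answer!r}, shifted={shifted}")
```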
These examples show answers — even to the same query — can shift under different irrelevant contexts. Can we predict these shifts? 2/n
June 6, 2025 at 6:10 PM