Research Fellow @ RBC Borealis
Model analysis, interpretability, reasoning and hallucination
Studying model behaviours to make them better :))
Looking for Fall '26 PhD
With mechanistic interpretability, we found:
- LLMs first compute abstract classes (like “language”) before narrowing to specific answers
- Competing circuits inside the model: one based on context, one based on query. Whichever is stronger wins. 5/n
With mechanistic interpretability, we found:
- LLMs first compute abstract classes (like “language”) before narrowing to specific answers
- Competing circuits inside the model: one based on context, one based on query. Whichever is stronger wins. 5/n