Senior researcher at Microsoft Research, PhD from UC Berkeley, https://csinva.io/
Drop me a message if you want to chat about interpretability/language neuroscience!
We tackle this issue in language neuroscience by using LLMs to generate *and validate* explanations with targeted follow-up experiments
We show that "induction heads" found in LLMs can be reverse-engineered to yield accurate & interpretable next-word prediction models