Keenan Samway
keenansamway.bsky.social
RA @ MPI-IS | Previously Machine Learning @UCL and STIA @Georgetown | Interested in NLP, interpretability, model editing, unlearning, reasoning, safety.
Reposted by Keenan Samway
Do you know what rating you’ll give after reading the intro? Are your confidence scores 4 or higher? Do you not respond in rebuttal phases? Are you worried how it will look if your rating is the only 8 among 3’s? This thread is for you.
November 27, 2024 at 5:25 PM
Reposted by Keenan Samway
It is promising to use natural language latent reasoning to interpret LLMs. But we need confidence that the latent reasoning is interpretable and faithful, such that human oversight of the reasoning trace counts as meaningful supervision.

Here are 2 reasons this may be hard. 🧵
November 24, 2024 at 7:00 PM
Reposted by Keenan Samway
How do LLMs learn to reason from data? Are they ~retrieving the answers from parametric knowledge🦜? In our new preprint, we look at the pretraining data and find evidence against this:

Procedural knowledge in pretraining drives LLM reasoning ⚙️🔢

🧵⬇️
November 20, 2024 at 4:35 PM
Reposted by Keenan Samway
Even as an interpretable ML researcher, I wasn't sure what to make of Mechanistic Interpretability, which seemed to come out of nowhere not too long ago.

But then I found the paper "Mechanistic?" by
@nsaphra.bsky.social and @sarah-nlp.bsky.social, which clarified things.
November 20, 2024 at 8:00 AM