Keenan Samway
keenansamway.bsky.social
RA @ MPI-IS | Previously Machine Learning @UCL and STIA @Georgetown | Interested in NLP, interpretability, model editing, unlearning, reasoning, safety.
Reposted by Keenan Samway
Do you know what rating you’ll give after reading the intro? Are your confidence scores 4 or higher? Do you not respond in rebuttal phases? Are you worried how it will look if your rating is the only 8 among 3’s? This thread is for you.
November 27, 2024 at 5:25 PM
Reposted by Keenan Samway
It is promising to use natural language latent reasoning to interpret LLMs. But we need confidence that the latent reasoning is interpretable and faithful, such that human oversight of the reasoning trace counts as meaningful supervision.

Here are 2 reasons this may be hard. 🧵
November 24, 2024 at 7:00 PM
Reposted by Keenan Samway
How do LLMs learn to reason from data? Are they ~retrieving the answers from parametric knowledge🦜? In our new preprint, we look at the pretraining data and find evidence against this:

Procedural knowledge in pretraining drives LLM reasoning ⚙️🔢

🧵⬇️
November 20, 2024 at 4:35 PM
Reposted by Keenan Samway
Even as an interpretable ML researcher, I wasn't sure what to make of Mechanistic Interpretability, which seemed to come out of nowhere not too long ago.

But then I found the paper "Mechanistic?" by
@nsaphra.bsky.social and @sarah-nlp.bsky.social, which clarified things.
November 20, 2024 at 8:00 AM