nsaphra.net
arxiv.org/abs/2602.150...
With Dhruva Karkada, Daniel Korchinski, Andres Nava, & Matthieu Wyart.
Our goal is to develop theory for modern machine learning systems that can help us understand complex network behaviors, including those critical for AI safety and alignment.
1
So the bot simulated a tantrum.
Look from left to right below: TwoNN is perfect, empirical Fisher is too sensitive, weight norm is not sensitive enough.
But they cannot spend 100 pages making you think Wickham is the charming love interest while inserting deniable clues that will—only in retrospect!—reveal you should have known he’s a cad.
They’re not trained to mislead.
Lecture videos, psets, and readings are all provided.
Had a lot of fun teaching this with @sarameghanbeery.bsky.social and @jeremybernste.in!
@nsaphra.bsky.social! We aim to predict potential AI model failures before impact, i.e., before deployment, using interpretability.
Are visual tokens going into an LLM interpretable? 🤔
Existing methods (e.g. logit lens) and assumptions would lead you to think “not much”...
We propose LatentLens and show that most visual tokens are interpretable across *all* layers 💡
Details 🧵
www.nature.com/articles/s41...
We develop a geometric theory of how neural populations support generalization across many tasks.
@zuckermanbrain.bsky.social
@flatironinstitute.org
@kempnerinstitute.bsky.social
1/14
Why do identical neural network models develop separate internal approaches to solve the same problem?
@annhuang42.bsky.social explores the factors driving variability in task-trained networks in our latest @kempnerinstitute.bsky.social Deeper Learning blog.
www.pnas.org/doi/epdf/10....
In the last few weeks, I've mocked up class demos of a live turing test, generated cross-references for an encyclopedia, and prototyped new tablet tasks for developmental psych.
It's wild.
bit.ly/4qeXAg1 #AI #ML #LLMs
Congrats to our lead @akariasai.bsky.social & team of students and Ai2 researchers/engineers!
www.nature.com/articles/s41...
Along these paths, a larger network behaves like a smaller one, retaining the same simplicity during a saddle-to-saddle transition.