Maxime Peyrard
@peyrardmax.bsky.social
Junior Professor CNRS (previously EPFL, TU Darmstadt) -- AI Interpretability, causal machine learning, and NLP. Currently visiting @NYU

https://peyrardm.github.io
What can be done?

👉 Stricter validity criteria?
👉 Maybe interpretability is inherently underdetermined, and we can only get control and predictability but not "understanding"?

This is a fascinating topic, and we keep investigating. If you're interested, come and chat at ICLR!
April 21, 2025 at 1:52 PM
We find a lot of identifiability issues:
- Multiple explanatory algorithms exist
- Even for one algorithm, there are many localizations in the network

Identifiability problems persist across scenarios: different levels of over-parametrization, different stages of training, multi-task settings, and model sizes.
April 21, 2025 at 1:52 PM
In our work, we stress-test the identifiability of MI research programs using small MLPs and simple Boolean logic tasks.
Why? This setting lets us enumerate all possible explanations and check how many pass various MI testing criteria.
April 21, 2025 at 1:52 PM
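A minimal sketch of the enumeration idea (illustrative only, not the paper's code; the toy task, the candidate list, and the behavioral criterion are assumptions for the example): on a 2-bit Boolean task, several distinct "explanatory algorithms" are all compatible with the model's input-output behavior, which is exactly the underdetermination described above. Stronger causal criteria are then applied in the same enumerate-and-test spirit.

```python
# Illustrative sketch: enumerate candidate algorithmic explanations for a toy
# 2-bit Boolean task and count how many are behaviorally compatible with the
# model. (Toy stand-ins; not the paper's actual setup or code.)
from itertools import product

inputs = list(product([0, 1], repeat=2))  # all 2-bit Boolean inputs

def model(x1, x2):
    # Stand-in for a trained MLP's binarized output; here it realizes XOR exactly.
    return x1 ^ x2

# Candidate "explanatory algorithms": different procedures computing the same function.
candidates = {
    "xor_gate":       lambda a, b: a ^ b,
    "or_and_not_and": lambda a, b: (a | b) & (1 - (a & b)),
    "sum_mod_2":      lambda a, b: (a + b) % 2,
    "not_equal":      lambda a, b: int(a != b),
}

# Weakest MI criterion: input-output equivalence with the model.
compatible = [
    name for name, f in candidates.items()
    if all(f(a, b) == model(a, b) for a, b in inputs)
]
print("Behaviorally compatible explanations:", compatible)
# All four pass -> behavior alone does not identify a unique explanation;
# applying stronger causal criteria in the same way, many candidates still pass.
```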
This brings us to identifiability. In statistics, a quantity is identifiable if only one value of it is compatible with the observed data. Identifiability matters because it is a prerequisite for statistical and causal inference.

Interpretability is also an exercise in causal inference!
April 21, 2025 at 1:52 PM
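For reference, the textbook form of that definition, written out in symbols (added here for clarity): a model family is identifiable when the parameter-to-distribution map is one-to-one.

```latex
% Identifiability of a statistical model \{P_\theta : \theta \in \Theta\}:
% the map \theta \mapsto P_\theta is injective, i.e. no two distinct
% parameter values induce the same data distribution.
\[
\forall\, \theta_1, \theta_2 \in \Theta:\quad
\theta_1 \neq \theta_2 \;\Longrightarrow\; P_{\theta_1} \neq P_{\theta_2}.
\]
```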
Mechanistic Interpretability aims to produce statements like: "Model M solves task T by doing X."
To validate such an explanation, many causal manipulations (interventions) are performed. But what if (many) other, incompatible explanations also pass the causal tests?
April 21, 2025 at 1:52 PM
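To make "causal manipulations" concrete, here is a minimal sketch of one common test, an interchange intervention (activation patching) on a toy 2-layer MLP; the network, weights, and the hypothesized unit are invented for illustration and are not from this thread or the paper.

```python
# Illustrative sketch of an interchange intervention ("activation patching").
# Hypothesis under test: hidden unit h[0] carries the feature the explanation claims.
# We copy h[0] from a run on a source input into a run on a base input and check
# whether the output shifts as the hypothesis predicts. (Toy network, toy weights.)
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(2, 4)), rng.normal(size=4)  # layer 1 of a toy MLP
W2, b2 = rng.normal(size=(4, 1)), rng.normal(size=1)  # layer 2

def forward(x, patch=None):
    """Run the toy MLP; optionally overwrite one hidden unit given as (index, value)."""
    h = np.tanh(x @ W1 + b1)
    if patch is not None:
        idx, value = patch
        h = h.copy()
        h[idx] = value
    return (h @ W2 + b2).item()

base, source = np.array([0.0, 1.0]), np.array([1.0, 1.0])

# Hidden value on the source input at the hypothesized location.
h_source = np.tanh(source @ W1 + b1)[0]

y_base    = forward(base)
y_patched = forward(base, patch=(0, h_source))
print(f"base output: {y_base:.3f}  after patching unit 0: {y_patched:.3f}")
# A hypothesis "passes" if the patched output moves toward the source behavior.
# The worry raised here: many different (location, algorithm) hypotheses can pass
# such tests equally well, so passing them does not identify a unique explanation.
```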
Hey, thanks for making it! Can you also add me?
November 24, 2024 at 12:21 AM
Thanks for creating the pack, I am also working on this topic :)
November 23, 2024 at 4:59 PM