André Panisson
@panisson.bsky.social

Principal Researcher @ CENTAI.eu | Leading the Responsible AI Team. Building Responsible AI through Explainable AI, Fairness, and Transparency. Researching Graph Machine Learning, Data Science, and Complex Systems to understand collective human behavior.

For Science Magazine, I wrote about "The Metaphors of Artificial Intelligence".

The way you conceptualize AI systems affects how you interact with them, do science on them, and create policy and apply laws to them.

Hope you will check it out!

www.science.org/doi/full/10....
The metaphors of artificial intelligence
A few months after ChatGPT was released, the neural network pioneer Terrence Sejnowski wrote about coming to grips with the shock of what large language models (LLMs) could do: “Something is beginning...
www.science.org

Anthropic dropped some insights into how AI brains work with their circuit tracing method. Turns out LLMs are bad at math because they’re eyeballing it (“36+59? Eh, 40ish+60ish=95?”). It means we’re one step closer to understanding the inner workings of LLMs. Toy sketch of the idea below.
#LLMs #AI #Interpretability
Tracing the thoughts of a large language model
Anthropic's latest interpretability research: a new microscope to understand Claude's internal mechanisms
www.anthropic.com
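
A toy analogy of the mechanism the post above describes: one path eyeballs the magnitude, another gets the last digit exactly right, and the two are combined. The function names and the rounding heuristic are my own invention for illustration; this is not Anthropic's code and not the actual circuit.

```python
# Toy analogy of the parallel "paths" Anthropic describes for addition:
# a coarse magnitude estimate plus an exact last-digit computation,
# combined into the final answer. Hand-written sketch, not Anthropic's code.

def coarse_path(a: int, b: int) -> int:
    """Eyeball the magnitude: keep one operand, round the other to the tens."""
    return a + round(b, -1)                  # 36 + 59 -> 36 + 60 = 96

def last_digit_path(a: int, b: int) -> int:
    """Compute only the ones digit of the sum, exactly."""
    return (a % 10 + b % 10) % 10            # 6 + 9 -> 15 -> ends in 5

def combine(a: int, b: int) -> int:
    """Snap the coarse estimate to the nearby value with the right ones digit."""
    est, digit = coarse_path(a, b), last_digit_path(a, b)
    base = est - est % 10
    candidates = (base - 10 + digit, base + digit, base + 10 + digit)
    return min(candidates, key=lambda c: abs(c - est))

print(combine(36, 59))  # 95
```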

Reposted by André Panisson

Explainable and Interpretable Multimodal Large Language Models: A Comprehensive Survey

Presents a framework categorizing MLLM explainability across data, model, and training perspectives to enhance transparency and trustworthiness.

📝 arxiv.org/abs/2412.02104
Explainable and Interpretable Multimodal Large Language Models: A Comprehensive Survey
The rapid development of Artificial Intelligence (AI) has revolutionized numerous fields, with large language models (LLMs) and computer vision (CV) systems driving advancements in natural language un...
arxiv.org

Reposted by André Panisson

I am extremely honoured to receive the @ERC_Research
#ERCCoG award for #RUNES. For the next five years, I will be working on the mathematical, computational, and experimental (!!) sides to understand how higher-order interactions change how we think and coordinate.
ALT: a bald man with a beard is smiling in front of a group of people
media.tenor.com

The authors of the preprint, recently published on arXiv, include Neel Nanda of Google DeepMind, head of its mechanistic interpretability team.
arxiv.org/abs/2411.14257
Do I Know This Entity? Knowledge Awareness and Hallucinations in Language Models
Hallucinations in large language models are a widespread problem, yet the mechanisms behind whether models will hallucinate are poorly understood, limiting our ability to solve this problem. Using spa...
arxiv.org

Reposted by André Panisson

Really happy to share that our latest work on "Higher-order connectomics of human brain function" is now out in NatComms @naturecomms.bsky.social. One of the most fun projects I've worked on! Big thanks to the friends involved: F Battiston, M Lucas, @lordgrilo.bsky.social, E Amico. doi.org/nt34
Higher-order connectomics of human brain function reveals local topological signatures of task decoding, individual identification, and behavior - Nature Communications
Here, the authors perform a higher-order analysis of fMRI data, revealing that accounting for group interactions greatly enhances task decoding, brain fingerprinting, and brain-behavior associations c...
www.nature.com

Reposted by André Panisson

*Automatically Interpreting Millions of Features in LLMs*
by @norabelrose.bsky.social et al.

An open-source pipeline for finding interpretable features in LLMs with sparse autoencoders and automated explainability methods from @eleutherai.bsky.social. A rough sketch of the general recipe is below.

arxiv.org/abs/2410.13928
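
The general automated-interpretability recipe the pipeline automates: collect top-activating examples for each SAE latent, ask an explainer LLM what they have in common, then score how well the explanation predicts activations. The function names here are hypothetical placeholders, not the EleutherAI pipeline's actual API.

```python
# Shape of an automated-interpretability loop over SAE latents.
# All names are illustrative placeholders; see the paper/repo for the real pipeline.
from typing import Callable, Sequence

def auto_interpret(
    latent_ids: Sequence[int],
    top_examples: Callable[[int], list[str]],   # snippets where a latent activates most
    explain: Callable[[list[str]], str],        # explainer LLM: what do the snippets share?
    score: Callable[[int, str], float],         # how well does the explanation predict activations?
) -> dict[int, tuple[str, float]]:
    results: dict[int, tuple[str, float]] = {}
    for latent in latent_ids:
        examples = top_examples(latent)
        explanation = explain(examples)
        results[latent] = (explanation, score(latent, explanation))
    return results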

Check out our poster at #LoG2024, based on our #TMLR paper:
📍 “A True-to-the-Model Axiomatic Benchmark for Graph-based Explainers”
🗓️ Tuesday 4–6 PM CET
📌 Poster Session 2, GatherTown
Join us to discuss graph ML explainability and benchmarks
#ExplainableAI #GraphML
openreview.net/forum?id=HSQTv3R8Iz

Reposted by André Panisson

NeurIPS Conference is now Live on Bluesky!

-NeurIPS2024 Communication Chairs

Reposted by André Panisson

🌟🤖📝 **Boosting human competences with interpretable and explainable artificial intelligence**

How can AI *boost* human decision-making instead of replacing it? We talk about this in our new paper.

doi.org/10.1037/dec0...

#AI #XAI #InterpretableAI #IAI #boosting #competences
🧵👇

Reposted by André Panisson

Even as an interpretable ML researcher, I wasn't sure what to make of Mechanistic Interpretability, which seemed to come out of nowhere not too long ago.

But then I found the paper "Mechanistic?" by
@nsaphra.bsky.social and @sarah-nlp.bsky.social, which clarified things.

You might like the work of @aliciacurth.bsky.social, who has made fantastic contributions to understanding this effect.

👋 I do research on XAI for Graph ML and am starting to explore Mechanistic Interpretability. I'd love to be added!
18M + 1.
💙, Mar🐫
Bluesky @bsky.app · Nov 16
Another day, another million new people have joined Bluesky!

18M users? 🙂‍↔️ 18M friends 🙂‍↕️

Since LLMs are essentially artefacts of human knowledge, we can use them as a lens to study human biases and behaviour patterns. Exploring their learned representations could unlock new insights. Got ideas or want to collaborate on this? Let’s connect!

In "Do I Know This Entity?", Sparse autoencoders reveal how LLMs recognize entities they ‘know’—and how this self-knowledge impacts hallucinations. These insights could help steer models to refuse or hallucinate less. Fascinating work on interpretability of LLMs!
openreview.net/forum?id=WCR...
Do I Know This Entity? Knowledge Awareness and Hallucinations in...
Hallucinations in large language models are a widespread problem, yet the mechanisms behind whether models will hallucinate are poorly understood, limiting our ability to solve this problem. Using...
openreview.net
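
To make the idea concrete, here is a toy sketch of the kind of probe involved: a direction in the residual stream whose activation separates entities the model "knows" from ones it does not, which could then gate answering versus refusing. Shapes, names, and the threshold are illustrative assumptions, not the paper's code.

```python
# Toy "known entity" probe: project a residual-stream activation onto a
# direction (e.g. an SAE decoder row) and threshold it. Illustrative only.
import numpy as np

rng = np.random.default_rng(0)
d_model = 16

known_direction = rng.normal(size=d_model)      # stand-in for a learned SAE direction
known_direction /= np.linalg.norm(known_direction)

def knowledge_score(resid_activation: np.ndarray) -> float:
    """How strongly the activation points along the 'known entity' direction."""
    return float(resid_activation @ known_direction)

def answer_or_refuse(resid_activation: np.ndarray, threshold: float = 0.5) -> str:
    """Gate the behaviour on the probe, as a caricature of steering."""
    return "answer" if knowledge_score(resid_activation) > threshold else "refuse"

print(answer_or_refuse(rng.normal(size=d_model)))
```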

In "Scaling and Evaluating Sparse Autoencoders", they extract 16M concepts (latents) from GPT-4 (guess the authors?).
Using k-sparse (TopK) autoencoders simplifies tuning, and the results show clear improvements in explainability metrics. Code, models (not all!) and a visualizer are included. A minimal TopK sketch is below.
openreview.net/forum?id=tcs...
Scaling and evaluating sparse autoencoders
Sparse autoencoders provide a promising unsupervised approach for extracting interpretable features from a language model by reconstructing activations from a sparse bottleneck layer. Since...
openreview.net
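
A minimal sketch of a k-sparse (TopK) autoencoder in the spirit of the paper: only the k largest latent pre-activations are kept, so sparsity becomes a direct hyperparameter instead of an L1 penalty to tune. This is an illustrative sketch, not the released implementation, which handles further details (e.g. dead latents).

```python
# Minimal TopK sparse autoencoder: keep only the k largest latent
# pre-activations, zero the rest, reconstruct the input.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKSAE(nn.Module):
    def __init__(self, d_model: int, n_latents: int, k: int):
        super().__init__()
        self.k = k
        self.encoder = nn.Linear(d_model, n_latents)
        self.decoder = nn.Linear(n_latents, d_model)

    def forward(self, x: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
        pre = self.encoder(x)
        topk = torch.topk(pre, self.k, dim=-1)
        latents = torch.zeros_like(pre).scatter_(-1, topk.indices, topk.values)
        return self.decoder(latents), latents

sae = TopKSAE(d_model=768, n_latents=16_384, k=32)
x = torch.randn(8, 768)             # stand-in for residual-stream activations
recon, latents = sae(x)
loss = F.mse_loss(recon, x)         # reconstruction objective; no L1 term needed
```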

ICLR is a top AI conference, and while the 2025 papers aren’t officially out yet, the reviews are open. I’m diving into the highest-rated submissions in Interpretability and Explainable AI. Interestingly, the top ones focus on Mechanistic Interpretability, a promising topic that our team is starting to explore.

Bluesky feels like traveling back to the golden age of Twitter: when the follow button meant something, and your feed wasn’t a dystopian lineup of blue-tagged bots. It’s refreshing to be somewhere I don’t need an AI to explain why I’m seeing a post. Let’s hope we don’t ruin it this time!