Laura Kopf
lkopf.bsky.social
PhD student in Interpretable Machine Learning at @tuberlin.bsky.social & @bifold.berlin

https://web.ml.tu-berlin.de/author/laura-kopf/
Happy to share that our PRISM paper has been accepted at #NeurIPS2025 🎉

In this work, we introduce a multi-concept feature description framework that can identify and score polysemantic features.

📄 Paper: arxiv.org/abs/2506.15538

#NeurIPS #MechInterp #XAI
September 19, 2025 at 12:02 PM
Our results highlight that the PRISM framework not only provides multiple human-interpretable descriptions for neurons but also aligns with human interpretation of polysemanticity. (5/7)
June 19, 2025 at 3:18 PM
In exploring the concept space, we use PRISM to characterize more complex components, finding and interpreting patterns that specific attention heads or groups of neurons respond to. (4/7)
June 19, 2025 at 3:18 PM
We benchmark PRISM across layers and architectures, showing how polysemanticity and interpretability shift through the model. (3/7)
June 19, 2025 at 3:18 PM
PRISM samples sentences from the top percentile of the activation distribution, clusters them in embedding space, and uses an LLM to generate a label for each concept cluster. (2/7)
June 19, 2025 at 3:18 PM
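A minimal sketch of the three steps described in (2/7), not the paper's implementation. Everything here is an assumption for illustration: the function names, the `top_fraction` parameter, the simple deterministic k-means, and `label_fn`, which is a hypothetical stand-in for the LLM call that labels each cluster.

```python
import math

def top_activating(activations, top_fraction=0.01):
    """Indices of the samples with the highest activations (top fraction)."""
    n_top = max(1, math.ceil(len(activations) * top_fraction))
    order = sorted(range(len(activations)), key=lambda i: activations[i], reverse=True)
    return sorted(order[:n_top])

def mean_vector(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def kmeans_labels(points, k, n_iter=20):
    """Plain k-means with farthest-point initialisation (assumed clustering choice)."""
    centers = [list(points[0])]
    while len(centers) < k:
        farthest = max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(list(farthest))

    def assign():
        return [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]

    for _ in range(n_iter):
        labels = assign()
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster is empty
                centers[j] = mean_vector(members)
    return assign()

def describe_feature(sentences, embeddings, activations, label_fn, k=2, top_fraction=0.01):
    """Sample top-activating sentences, cluster their embeddings, label each cluster.

    label_fn is a hypothetical callable (e.g. a wrapper around an LLM) that maps
    a list of sentences to one textual description.
    """
    idx = top_activating(activations, top_fraction)
    points = [embeddings[i] for i in idx]
    k = min(k, len(points))
    labels = kmeans_labels(points, k)
    clusters = []
    for j in range(k):
        members = [sentences[i] for i, lab in zip(idx, labels) if lab == j]
        if members:
            clusters.append({"label": label_fn(members), "sentences": members})
    return clusters
```

Each returned entry pairs one description with the sentences that produced it, so a feature that yields several distinct clusters gets several descriptions.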
🔍 When do neurons encode multiple concepts?

We introduce PRISM, a framework for extracting multi-concept feature descriptions to better understand polysemanticity.

📄 Capturing Polysemanticity with PRISM: A Multi-Concept Feature Description Framework
arxiv.org/abs/2506.15538

🧵 (1/7)
June 19, 2025 at 3:18 PM
Still overwhelmed by the amazing response to our poster session at @neuripsconf.bsky.social with Anna Hedström and Marina Höhne! It was incredible to have such lively and inspiring discussions with brilliant people whose work I admire. ✨
December 13, 2024 at 2:48 AM
I’ll be presenting our work at @neuripsconf.bsky.social in Vancouver! 🎉
Join me this Thursday, December 12th, in East Exhibit Hall A-C, Poster #3107, from 11 a.m. PST to 2 p.m. PST. I'll be discussing our paper “CoSy: Evaluating Textual Explanations of Neurons.”
December 11, 2024 at 6:43 AM