Lightnews — Scholar-powered news

Reposted by Explainable AI

@mdhk.net

Had such a great time presenting our tutorial on Interpretability Techniques for Speech Models at #Interspeech2025! 🔍

For anyone looking for an introduction to the topic, we've now uploaded all materials to the website: interpretingdl.github.io/speech-inter...

August 19, 2025 at 9:23 PM

Reposted by Explainable AI

Antonin Poché @ ACL

@antoninpoche.bsky.social

🔥 I am super excited to be presenting a poster at #ACL2025 in Vienna next week! 🌏

This is my first big conference!

📅 Tuesday morning, 10:30–12:00, during Poster Session 2.

💬 If you're around, feel free to message me. I would be happy to connect, chat, or have a drink!

July 25, 2025 at 3:37 PM

Reposted by Explainable AI

Naomi Saphra

@nsaphra.bsky.social

ACL paper alert! What structure is lost when using linearizing interp methods like Shapley? We show the nonlinear interactions between features reflect structures described by the sciences of syntax, semantics, and phonology.

June 12, 2025 at 6:56 PM

Reposted by Explainable AI

Sweta Mahajan

@swetamahajan.bsky.social

🚨Deadline Extension Alert!

Our Non-proceedings track is open till August 15th for the eXCV workshop at ICCV.

Our nectar track accepts published papers, as is.

More info at: excv-workshop.github.io

@iccv.bsky.social #ICCV2025

July 18, 2025 at 9:31 AM

Reposted by Explainable AI

Dana Arad

@danaarad.bsky.social

10 days to go! Still time to run your method and submit!

BlackboxNLP @blackboxnlp.bsky.social · Jul 23

Just 10 days to go until the results submission deadline for the MIB Shared Task at #BlackboxNLP!

If you're working on:
🧠 Circuit discovery
🔍 Feature attribution
🧪 Causal variable localization
now’s the time to polish and submit!

Join us on Discord: discord.gg/n5uwjQcxPR

July 23, 2025 at 8:21 AM

Reposted by Explainable AI

Jennifer Hu

@jennhu.bsky.social

Excited to announce the first workshop on CogInterp: Interpreting Cognition in Deep Learning Models @ NeurIPS 2025! 📣

How can we interpret the algorithms and representations underlying complex behavior in deep learning models?

🌐 coginterp.github.io/neurips2025/

1/4

July 16, 2025 at 1:08 PM

Reposted by Explainable AI

Aaron Mueller

@amuuueller.bsky.social

If you're at #ICML2025, chat with me, @sarah-nlp.bsky.social, Atticus, and others at our poster 11am - 1:30pm at East #1205! We're establishing a 𝗠echanistic 𝗜nterpretability 𝗕enchmark.

We're planning to keep this a living benchmark; come by and share your ideas/hot takes!

July 17, 2025 at 5:45 PM

Reposted by Explainable AI

Katharina Prasse

@katharinaprasse.bsky.social

Poster is up and we are looking forward to the #ICML2025 poster session. Come join @patrickknab.bsky.social and me at Poster #W-214 presenting our work with @smarton.bsky.social, Christian Bartelt, and @margretkeuper.bsky.social #UniMa

July 17, 2025 at 6:00 PM

Reposted by Explainable AI

Sarah Wiegreffe

@sarah-nlp.bsky.social

I am at #ICML2025! 🇨🇦🏞️
Catch me:

1️⃣ Presenting this paper👇 tomorrow 11am-1:30pm at East #1205

2️⃣ At the Actionable Interpretability @actinterp.bsky.social workshop on Saturday in East Ballroom A (I’m an organizer!)

Aaron Mueller @amuuueller.bsky.social · Apr 23

Lots of progress in mech interp (MI) lately! But how can we measure when new mech interp methods yield real improvements over prior work?

We propose 😎 𝗠𝗜𝗕: a 𝗠echanistic 𝗜nterpretability 𝗕enchmark!

July 16, 2025 at 11:09 PM

Reposted by Explainable AI

Julian Minder

@jkminder.bsky.social

Causal Abstraction, the theory behind DAS, tests if a network realizes a given algorithm. We show (w/ @denissutter.bsky.social, T. Hofmann, @tpimentel.bsky.social ) that the theory collapses without the linear representation hypothesis—a problem we call the non-linear representation dilemma.

July 17, 2025 at 10:57 AM

Reposted by Explainable AI

Katharina Prasse

@katharinaprasse.bsky.social

Join us on Thursday 11-13 in poster hall West #214 to discuss image segments as concepts. #ICML2025 @patrickknab.bsky.social @smarton.bsky.social Christian Bartelt @margretkeuper.bsky.social @keuper-labs.bsky.social

Keuper Labs @keuper-labs.bsky.social · Jul 15

Attending #ICML2025? Watch out for our paper "DCBM: Data-Efficient Visual Concept Bottleneck Models" presented by @katharinaprasse.bsky.social -> github.com/KathPra/DCBM

July 15, 2025 at 10:01 PM

Reposted by Explainable AI

Harry Thasarathan

@hthasarathan.bsky.social

🌌🛰️🔭Want to explore universal visual features? Check out our interactive demo of concepts learned from our #ICML2025 paper "Universal Sparse Autoencoders: Interpretable Cross-Model Concept Alignment".

Come see our poster at 4pm on Tuesday in East Exhibition hall A-B, E-1208!

July 15, 2025 at 2:36 AM

Reposted by Explainable AI

Computer Vision and Machine Learning at MPI Informatics

@cvml.mpi-inf.mpg.de

Papers being presented from our group at #ICML2025!

Congratulations to all the authors! To know more, visit us in the poster sessions!

A 🧵with more details:

@icmlconf.bsky.social @mpi-inf.mpg.de

Papers accepted at ICML 2025 from the Computer Vision and Machine Learning Department at the Max Planck Institute for Informatics.

July 13, 2025 at 8:00 AM

Reposted by Explainable AI

Ulrike Luxburg

@ulrikeluxburg.bsky.social

Our #ICML position paper: #XAI is similar to applied statistics: it uses summary statistics in an attempt to answer real-world questions. But authors need to state how concretely (!) their XAI statistics contribute to answering which concrete (!) question!
arxiv.org/abs/2402.02870

Sebastian Bordt @sbordt.bsky.social · Jul 10

During the last couple of years, we have read a lot of papers on explainability and often felt that something was fundamentally missing🤔

This led us to write a position paper (accepted at #ICML2025) that attempts to identify the problem and to propose a solution.

arxiv.org/abs/2402.02870
👇🧵

July 11, 2025 at 7:35 AM

Reposted by Explainable AI

Naomi Saphra

@nsaphra.bsky.social

🚨 New preprint! 🚨

Everyone loves causal interp. It’s coherently defined! It makes testable predictions about mechanistic interventions! But what if we had a different objective: predicting model behavior not under mechanistic interventions, but on unseen input data?

July 10, 2025 at 2:31 PM

Reposted by Explainable AI

Sweta Mahajan

@swetamahajan.bsky.social

Introducing the speakers for the eXCV workshop at ICCV, Hawaii. Get ready for many stimulating and insightful talks and discussions.

Our Non-proceedings track is still open!

Paper submission deadline: July 18, 2025

More info at: excv-workshop.github.io

@iccv.bsky.social #ICCV2025

July 10, 2025 at 12:49 PM

Reposted by Explainable AI

Gunnar König

@gunnark.bsky.social

In many XAI applications, it is crucial to determine whether features contribute individually or only when combined. However, existing methods fail to reveal cooperations since they entangle individual contributions with those made via interactions and dependencies. We show how to disentangle them!

July 7, 2025 at 3:37 PM

Reposted by Explainable AI

Dana Arad

@danaarad.bsky.social

Tried steering with SAEs and found that not all features behave as expected?

Check out our new preprint - "SAEs Are Good for Steering - If You Select the Right Features" 🧵

May 27, 2025 at 4:06 PM

Reposted by Explainable AI

nikhil07prakash.bsky.social

@nikhil07prakash.bsky.social

How do language models track mental states of each character in a story, often referred to as Theory of Mind?

We reverse-engineered how LLaMA-3-70B-Instruct handles a belief-tracking task and found something surprising: it uses mechanisms strikingly similar to pointer variables in C programming!

June 24, 2025 at 5:13 PM

Reposted by Explainable AI

Dana Arad

@danaarad.bsky.social

VLMs perform better on questions about text than when answering the same questions about images - but why? and how can we fix it?

In a new project led by Yaniv (@YNikankin on the other app), we investigate this gap from a mechanistic perspective, and use our findings to close a third of it! 🧵

June 26, 2025 at 10:41 AM

Reposted by Explainable AI

BlackboxNLP

@blackboxnlp.bsky.social

Have you heard about this year's shared task? 📢

Mechanistic Interpretability (MI) is quickly advancing, but comparing methods remains a challenge. This year at #BlackboxNLP, we're introducing a shared task to rigorously evaluate MI methods in language models 🧵

June 23, 2025 at 2:46 PM

Reposted by Explainable AI

Oliver Eberle

@eberleoliver.bsky.social

Our position paper on algorithmic explanations is out—excited to share it! 🙌

Proud of this collaborative effort toward a scientifically grounded understanding of generative AI.

@tuberlin.bsky.social @bifold.berlin @msftresearch.bsky.social @UCSD & @UCLA

June 20, 2025 at 5:12 PM

Reposted by Explainable AI

Laura Kopf

@lkopf.bsky.social

🔍 When do neurons encode multiple concepts?

We introduce PRISM, a framework for extracting multi-concept feature descriptions to better understand polysemanticity.

📄 Capturing Polysemanticity with PRISM: A Multi-Concept Feature Description Framework
arxiv.org/abs/2506.15538

🧵 (1/7)

June 19, 2025 at 3:18 PM

Reposted by Explainable AI

Oliver Eberle

@eberleoliver.bsky.social

🚨 New preprint! Excited to share our work on extracting and evaluating the potentially many feature descriptions of language models

👉 arxiv.org/abs/2506.15538

Laura Kopf @lkopf.bsky.social · Jun 19

🔍 When do neurons encode multiple concepts?

We introduce PRISM, a framework for extracting multi-concept feature descriptions to better understand polysemanticity.

📄 Capturing Polysemanticity with PRISM: A Multi-Concept Feature Description Framework
arxiv.org/abs/2506.15538

🧵 (1/7)

June 19, 2025 at 4:44 PM

Reposted by Explainable AI

Visual Inference Lab

@visinf.bsky.social

Want to learn about how model design choices affect the attribution quality of vision models? Visit our #NeurIPS2024 poster on Friday afternoon (East Exhibition Hall A-C #2910)!

Paper: arxiv.org/abs/2407.11910
Code: github.com/visinf/idsds

December 13, 2024 at 10:10 AM
