Patrick Kahardipraja

@pkhdipraja.bsky.social

410 followers 460 following 14 posts

PhD student @ Fraunhofer HHI. Interpretability, incremental NLP, and NLU. https://pkhdipraja.github.io/

We will be presenting the paper at #ACL2025NLP 🇦🇹. Feel free to stop by the poster to say hello!

📅 29/07 (Tue) 10:30-12:00
📍 Hall 4/5

#NLProc #interpretability #XAI #mechinterp #MLSky

July 16, 2025 at 1:26 PM

We support multiple LLM providers and locally hosted LLMs. For more details, check out our paper: arxiv.org/abs/2502.16994. This project was led by @brunibrun.bsky.social, Aakriti Jain & @golimblevskaia.bsky.social, with help from Thomas Wiegand, Wojciech Samek, @slapuschkin.bsky.social & me.

FADE: Why Bad Descriptions Happen to Good Features

Recent advances in mechanistic interpretability have highlighted the potential of automating interpretability pipelines in analyzing the latent representations within LLMs. While they may enhance our ...

arxiv.org

July 16, 2025 at 1:26 PM

FADE quantifies the causes of misalignment between features and their descriptions and highlights challenges of current methods, including various failure modes, SAE features being harder to describe than MLP ones, and how the interpretability of feature descriptions varies across layers.

July 16, 2025 at 1:26 PM

Thanks for sharing! We are looking into the works you suggested and plan to discuss them in the next revision of this paper :)

May 28, 2025 at 7:28 PM

Many thanks to my amazing co-authors: @reduanachtibat.bsky.social, Thomas Wiegand, Wojciech Samek, and @slapuschkin.bsky.social!

#NLProc #interpretability #XAI #mechinterp #MLSky

May 26, 2025 at 4:01 PM

Building on these insights, we present a probe that tracks knowledge provenance during inference and shows where it is localized within the input prompt. Our attempt shows promising results, with >94% ROC AUC and >84% localization accuracy.

4/4

May 26, 2025 at 4:01 PM

We analyze how in-context heads can specialize to understand instructions (task heads) and retrieve relevant information (retrieval heads). Together with parametric heads, we investigate their causal roles by extracting function vectors or modifying their weights.

3/

May 26, 2025 at 4:01 PM

Using interpretability tools, we discover that heads important for RAG fall into two categories: parametric heads that encode relational knowledge and in-context heads that are responsible for processing information in the prompt.

2/