Lightnews — Scholar-powered news

Patrick Kahardipraja

@pkhdipraja.bsky.social

410 followers 460 following 14 posts

PhD student @ Fraunhofer HHI. Interpretability, incremental NLP, and NLU. https://pkhdipraja.github.io/


Autointerp provides descriptions of LLM features, but how those descriptions are evaluated varies from one setting to another. We propose FADE, a framework that enables standardized, automatic evaluation of the alignment between features and their autointerp descriptions across various metrics.

July 16, 2025 at 1:26 PM

Building on these insights, we present a probe to track knowledge provenance during inference and show where it is localized within the input prompt. Our attempt shows promising results, with >94% ROC AUC and >84% localization accuracy.

4/4

May 26, 2025 at 4:01 PM

Using interpretability tools, we discover that heads important for RAG fall into two categories: parametric heads, which encode relational knowledge, and in-context heads, which are responsible for processing information in the prompt.

2/

May 26, 2025 at 4:01 PM
