Denis Sutter
@denissutter.bsky.social
MSc student at @eth, interested in ML interpretability
7/9 To test generality, we replicate these findings on simpler architectures (MLPs), across multiple random seeds, and on two additional tasks. This indicates that the issue is not confined to LLMs but applies more broadly.
July 15, 2025 at 11:21 AM
6/9 We further show that small LLMs, which fail at the Indirect Object Identification task, can nevertheless be interpreted as containing an algorithm for it.
July 15, 2025 at 11:21 AM
5/9 Beyond the theoretical argument, we present a broad set of experiments supporting our claim. Most notably, we show that a randomly initialised LLM can be interpreted as implementing an algorithm for Indirect Object Identification.
July 15, 2025 at 11:21 AM
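To give a concrete flavour of this experiment, here is a hedged sketch in PyTorch. Everything in it is invented for illustration (the frozen toy network random_net, the sizes HIDDEN and SUBSPACE, the encoder/decoder enc/dec, the readout head, and the toy high-level variable); the real experiments use an LLM and the IOI task. The point is only the shape of the procedure: freeze a randomly initialised network and train a non-linear alignment map until interchange interventions on its activations reproduce the counterfactuals a high-level algorithm prescribes.

```python
# Hedged sketch (PyTorch). All names, sizes, and the toy high-level
# variable are invented; not the paper's actual setup.
import torch
import torch.nn as nn

torch.manual_seed(0)
HIDDEN, SUBSPACE = 64, 8

random_net = nn.Sequential(nn.Linear(32, HIDDEN), nn.GELU())  # frozen, random
for p in random_net.parameters():
    p.requires_grad_(False)

# Trainable non-linear alignment map plus a readout standing in for the
# model's output behaviour.
enc = nn.Sequential(nn.Linear(HIDDEN, HIDDEN), nn.GELU(), nn.Linear(HIDDEN, HIDDEN))
dec = nn.Sequential(nn.Linear(HIDDEN, HIDDEN), nn.GELU(), nn.Linear(HIDDEN, HIDDEN))
head = nn.Linear(HIDDEN, 2)
params = [*enc.parameters(), *dec.parameters(), *head.parameters()]
opt = torch.optim.Adam(params, lr=1e-3)

def interchange_logits(h_base, h_source):
    # Swap the aligned feature coordinates of the source run into the base run.
    z_b, z_s = enc(h_base), enc(h_source)
    patched = torch.cat([z_s[..., :SUBSPACE], z_b[..., SUBSPACE:]], dim=-1)
    return head(dec(patched))

for step in range(2000):
    x_base, x_src = torch.randn(128, 32), torch.randn(128, 32)
    # The high-level causal model dictates the counterfactual: after the
    # swap, the output should track the *source* input's variable.
    target = (x_src.sum(-1) > 0).long()  # toy stand-in for an IOI variable
    logits = interchange_logits(random_net(x_base), random_net(x_src))
    loss = nn.functional.cross_entropy(logits, target)
    opt.zero_grad()
    loss.backward()
    opt.step()
# High interchange accuracy here "finds" an algorithm in a network whose
# weights were never trained -- which is exactly the worry.
```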
3/9 We do not critique causal abstraction as a framework. Rather, we show that combining it with the current understanding that modern models store information in a distributed way introduces a fundamental problem.
July 15, 2025 at 11:21 AM
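A toy illustration of why this combination is a problem (not from the paper; the data and sizes are made up): once the map from activations to high-level variables is allowed to be arbitrarily expressive, an arbitrary variable can be decoded from frozen random features, so a good fit under such a map cannot by itself certify that the network computes that variable.

```python
# Toy illustration (not from the paper): fit an expressive readout to
# frozen random features and arbitrary labels.
import torch
import torch.nn as nn

torch.manual_seed(0)
feats = torch.randn(256, 32)           # frozen "activations" of a random net
labels = torch.randint(0, 2, (256,))   # arbitrary target variable

probe = nn.Sequential(nn.Linear(32, 128), nn.GELU(), nn.Linear(128, 2))
opt = torch.optim.Adam(probe.parameters(), lr=1e-2)
for _ in range(500):
    opt.zero_grad()
    loss = nn.functional.cross_entropy(probe(feats), labels)
    loss.backward()
    opt.step()

acc = (probe(feats).argmax(-1) == labels).float().mean().item()
print(f"accuracy on arbitrary labels: {acc:.2f}")  # approaches 1.00
```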
1/9 In our new interpretability paper, we analyse causal abstraction—the framework behind Distributed Alignment Search—and show it breaks when we remove linearity constraints on feature representations. We refer to this problem as the Non-Linear Representation Dilemma.
July 15, 2025 at 11:21 AM
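For readers who know Distributed Alignment Search, a minimal sketch of the distinction at stake, assuming PyTorch; HIDDEN, SUBSPACE, and the class names are hypothetical, not from the paper or the DAS codebase. DAS constrains the alignment map to an orthogonal rotation of the hidden state; the paper's argument concerns what happens when that linearity constraint is dropped and the map may be an arbitrary non-linear function (a small MLP stands in for it below).

```python
# Hedged sketch (PyTorch). HIDDEN, SUBSPACE, and the class names are
# hypothetical, chosen only to contrast the two alignment maps.
import torch
import torch.nn as nn
from torch.nn.utils.parametrizations import orthogonal

HIDDEN, SUBSPACE = 64, 8  # toy hidden size / size of the aligned variable

class LinearAlignment(nn.Module):
    """DAS-style alignment: an orthogonal rotation of the hidden state."""
    def __init__(self):
        super().__init__()
        self.rot = orthogonal(nn.Linear(HIDDEN, HIDDEN, bias=False))

    def encode(self, h):
        return self.rot(h)            # h @ W^T, with W orthogonal

    def decode(self, z):
        return z @ self.rot.weight    # exact inverse, since W^-1 = W^T

class NonLinearAlignment(nn.Module):
    """Unconstrained stand-in: small MLP encoder/decoder, trained jointly
    (dec only approximately inverts enc). Once the map is this expressive,
    many different high-level algorithms can be read into the same
    activations -- the Non-Linear Representation Dilemma."""
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(HIDDEN, HIDDEN), nn.GELU(),
                                 nn.Linear(HIDDEN, HIDDEN))
        self.dec = nn.Sequential(nn.Linear(HIDDEN, HIDDEN), nn.GELU(),
                                 nn.Linear(HIDDEN, HIDDEN))

    def encode(self, h):
        return self.enc(h)

    def decode(self, z):
        return self.dec(z)

def interchange(align, h_base, h_source):
    """Patch the aligned variable (first SUBSPACE feature coordinates)
    from the source run into the base run, then map back."""
    z_base, z_src = align.encode(h_base), align.encode(h_source)
    patched = torch.cat([z_src[..., :SUBSPACE], z_base[..., SUBSPACE:]], dim=-1)
    return align.decode(patched)
```

With the linear map, the patched subspace is a fixed linear feature of the hidden state; with the unconstrained map, almost any partition of the information can be induced, which is what makes vacuous "algorithms" findable.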