Denis Sutter
denissutter.bsky.social
MSc at @eth interested in ML interpretability
1/9 In our new interpretability paper, we analyse causal abstraction—the framework behind Distributed Alignment Search—and show it breaks when we remove linearity constraints on feature representations. We refer to this problem as the Non-Linear Representation Dilemma.
July 15, 2025 at 11:21 AM