Yotam Erel (@yotamerel.bsky.social)
CS PhD candidate @ Tel Aviv University
https://yoterel.github.io
{8/8}
This framework offers a new way to probe and reason about attention!

📄 Paper: “Attention (as Discrete-Time Markov) Chains”
🔗 yoterel.github.io/attention_ch...
👥 Yotam Erel*, @oduenkel.bsky.social*, Rishabh Dabral, Vlad Golyanik, Christian Theobalt, Amit Bermano
*denotes equal contribution
July 24, 2025 at 12:50 PM
{7/8}
This reinterpretation yields results:
✅ State-of-the-art zero-shot segmentation
✅ Cleaner, sharper attention visualizations
✅ Better unconditional image generation

All without extra training—just a different perspective.
{6/8}
We define TokenRank:
The steady-state distribution of the attention Markov chain.
Like PageRank—but for tokens.

It measures global token importance: not just which tokens are attended to directly, but which ones accumulate attention indirectly through others.
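
(Sketch under my own assumptions, not necessarily the paper's exact procedure: a TokenRank-style score as the stationary distribution of a row-stochastic attention matrix, found by power iteration. Function and variable names are mine.)

```python
# TokenRank-style stationary distribution via power iteration (illustrative sketch).
import numpy as np

def token_rank(attn: np.ndarray, n_iters: int = 100, tol: float = 1e-8) -> np.ndarray:
    """attn: row-stochastic attention matrix (each row sums to 1)."""
    n = attn.shape[0]
    pi = np.full(n, 1.0 / n)          # start from the uniform distribution
    for _ in range(n_iters):
        new_pi = pi @ attn            # one step of the chain
        if np.abs(new_pi - pi).sum() < tol:
            break
        pi = new_pi
    return pi / pi.sum()              # steady-state mass ~ global token importance

# Toy example: token 0 receives most of the attention, so it gets the highest rank.
A = np.array([[0.70, 0.10, 0.10, 0.10],
              [0.25, 0.25, 0.25, 0.25],
              [0.60, 0.20, 0.10, 0.10],
              [0.50, 0.30, 0.10, 0.10]])
print(token_rank(A))
```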
{5/8}
💡 Here’s the golden insight:
In practice, attention tends to linger among semantically similar tokens.

These are metastable states—regions where attention circulates before escaping.
Modeling this lets us filter noise and highlight meaningful structure.
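
(Toy construction of my own, not from the paper: a block-structured attention matrix in which probability mass lingers inside one cluster of tokens for many steps before spreading, i.e. a metastable region of the chain.)

```python
# Two loosely connected clusters: attention circulates within a cluster before escaping.
import numpy as np

within, escape = 0.32, 0.02
block = np.full((3, 3), within)            # strong within-cluster attention
leak = np.full((3, 3), escape)             # weak cross-cluster attention
attn = np.block([[block, leak], [leak, block]])
attn /= attn.sum(axis=1, keepdims=True)    # make rows stochastic

state = np.array([1.0, 0, 0, 0, 0, 0])     # start on a token in cluster 1
for step in (1, 5, 50):
    dist = state @ np.linalg.matrix_power(attn, step)
    print(step, dist.round(3))             # mass stays mostly in cluster 1 before spreading
```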
{4/8}
But wait 🚨! The transformer was never trained to account for indirect attention; it applies the attention map only once per forward pass. So what gives?
{3/8}
We interpret each attention matrix as a discrete-time Markov chain,
where:

Tokens = states

Attention weights = transition probabilities

This reframes attention as a dynamic process, not just a static lookup.
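
(Illustrative sketch, not the authors' code: a softmaxed attention matrix is already row-stochastic, so each row can be read directly as a Markov transition distribution. Variable names are mine.)

```python
# Reading an attention matrix as a discrete-time Markov chain.
import numpy as np

rng = np.random.default_rng(0)
n_tokens = 6

# Toy attention logits -> softmax over each row, so every row sums to 1.
logits = rng.normal(size=(n_tokens, n_tokens))
attn = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
assert np.allclose(attn.sum(axis=1), 1.0)   # rows are transition distributions

# One step of the chain: put all mass on token 0 and propagate it.
state = np.zeros(n_tokens)
state[0] = 1.0
state_after_one_step = state @ attn         # distribution over tokens after one transition
print(state_after_one_step)
```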
{2/8}
Most attention map analysis is local: we reduce dimensions for visualization using row or column selection, column sums, head averages, etc.
These only capture direct token-to-token interactions.

But what if we also considered indirect effects?
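
(Illustrative only, names mine: the usual local reductions on a head-stacked attention tensor, plus a two-hop product that begins to capture indirect interactions.)

```python
# Common "local" reductions of attention, and a two-hop view of indirect effects.
import numpy as np

rng = np.random.default_rng(0)
n_heads, n_tokens = 4, 8
logits = rng.normal(size=(n_heads, n_tokens, n_tokens))
attn = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)  # per-head, row-stochastic

head_avg   = attn.mean(axis=0)      # average over heads
row_select = head_avg[3]            # how token 3 attends to every other token
col_sum    = head_avg.sum(axis=0)   # how much attention each token receives (direct only)

# Direct interactions cover one hop; chaining the map exposes two-hop paths:
two_hop = head_avg @ head_avg       # token i -> token j via any intermediate token k
```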