Thomas Fel
@thomasfel.bsky.social
Explainability, Computer Vision, Neuro-AI.🪴 Kempner Fellow @Harvard.
Prev. PhD @Brown, @Google, @GoPro. Crêpe lover.
📍 Boston | 🔗 thomasfel.me
Thx a lot Naomi! 🙌🥹
October 16, 2025 at 9:50 PM
That concludes this two-part descent into the Rabbit Hull.
Huge thanks to all collaborators who made this work possible — and especially to @binxuwang.bsky.social , with whom this project was built, experiment after experiment.
🎮 kempnerinstitute.github.io/dinovision/
📄 arxiv.org/pdf/2510.08638
October 15, 2025 at 5:17 PM
If this holds, three implications:
(i) Concepts = points (or regions), not directions
(ii) Probing is bounded: toward archetypes, not vectors
(iii) Can't recover the generating hulls from their sum: we should look deeper than single-layer activations to recover the true latents (toy sketch below)
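A toy numpy sketch of (iii), with made-up intervals rather than anything from the paper: two different pairs of 1-D hulls share the same Minkowski sum, so the sum alone can't identify its generators.

```python
import numpy as np

rng = np.random.default_rng(0)

def minkowski_sample(hull_a, hull_b, n=100_000):
    """Sample from the Minkowski sum of two intervals (1-D convex hulls)."""
    return rng.uniform(*hull_a, size=n) + rng.uniform(*hull_b, size=n)

# Two different factorizations...
s1 = minkowski_sample((0.0, 1.0), (0.0, 2.0))   # [0, 1] + [0, 2]
s2 = minkowski_sample((0.0, 1.5), (0.0, 1.5))   # [0, 1.5] + [0, 1.5]

# ...share the same support [0, 3]: from the sum alone, the generating
# hulls are not identifiable.
print(s1.min().round(2), s1.max().round(2))
print(s2.min().round(2), s2.max().round(2))
```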
October 15, 2025 at 5:17 PM
Synthesizing these observations, we propose a refined view, motivated by Gärdenfors' theory and attention geometry.
Each activation is a sum of points drawn from multiple convex hulls simultaneously: a rabbit among animals, brown among colors, fluffy among textures.
The Minkowski Representation Hypothesis.
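A minimal numpy sketch of that picture (illustrative shapes and archetype names, not the paper's code): a token activation modeled as a sum of points, one drawn from each hull.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 768  # embedding dimension (illustrative)

# One convex hull per semantic domain, spanned by a few archetype vectors.
hulls = {
    "animal":  rng.normal(size=(5, d)),   # e.g. rabbit, dog, cat, ...
    "color":   rng.normal(size=(4, d)),   # e.g. brown, white, ...
    "texture": rng.normal(size=(3, d)),   # e.g. fluffy, smooth, ...
}

def random_simplex(k):
    """Random convex-combination weights (a point on the k-simplex)."""
    w = rng.exponential(size=k)
    return w / w.sum()

# Minkowski Representation Hypothesis (sketch): the activation sums one point
# chosen inside each hull -- the token is simultaneously "a rabbit" among
# animals, "brown" among colors, "fluffy" among textures.
activation = sum(random_simplex(len(A)) @ A for A in hulls.values())
print(activation.shape)  # (768,)
```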
October 15, 2025 at 5:17 PM
Taken together, the signs of partial density, local connectedness, and coherent dictionary atoms indicate that DINO’s representations are organized beyond linear sparsity alone.
October 15, 2025 at 5:17 PM
Can position explain this?
We found that positional information collapses from high-rank to a nearly 2-dimensional sheet. Early layers encode precise location; later ones retain abstract axes.
This compression frees dimensions for features, and *position doesn't explain the smoothness of the PCA maps*.
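One way to quantify this kind of collapse, sketched with a synthetic stand-in rather than real DINO features: average patch embeddings over images at each grid position and check the effective rank of what remains.

```python
import numpy as np

def effective_rank(M):
    """Entropy-based effective rank of a matrix (Roy & Vetterli, 2007)."""
    s = np.linalg.svd(M - M.mean(0), compute_uv=False)
    p = s / s.sum()
    return float(np.exp(-(p * np.log(p + 1e-12)).sum()))

# feats: [n_images, n_patches, d] patch embeddings from some layer
# (synthetic stand-in here; in practice, collect them from the model).
rng = np.random.default_rng(0)
feats = rng.normal(size=(512, 196, 64))

# Averaging over images leaves the component tied to grid position.
pos_component = feats.mean(axis=0)   # [n_patches, d]
print(effective_rank(pos_component)) # a value near 2 would indicate a ~2-D positional sheet
```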
October 15, 2025 at 5:17 PM
Patch embeddings form smooth, connected surfaces tracing objects and boundaries.
This may suggest an interpolative geometry: tokens as mixtures between landmarks, shaped by clustering and spreading forces in the training objectives.
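A sketch of what "tokens as mixtures between landmarks" could mean operationally; this uses a generic simplex-projection solver and made-up landmarks, not the paper's method.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto the probability simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    rho = np.nonzero(u > css / (np.arange(len(v)) + 1))[0][-1]
    return np.maximum(v - css[rho] / (rho + 1), 0.0)

def convex_code(x, L, steps=500):
    """Weights w on the simplex minimizing ||x - w @ L|| (projected gradient)."""
    k = L.shape[0]
    lr = 1.0 / np.linalg.norm(L @ L.T, 2)
    w = np.full(k, 1.0 / k)
    for _ in range(steps):
        grad = (w @ L - x) @ L.T
        w = project_simplex(w - lr * grad)
    return w

rng = np.random.default_rng(0)
L = rng.normal(size=(8, 64))                 # 8 landmark embeddings (illustrative)
w_true = project_simplex(rng.normal(size=8))
x = w_true @ L                               # a token that *is* a convex mixture

w = convex_code(x, L)
print(np.abs(w - w_true).max())              # ≈ 0: the toy mixture is recovered
```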
October 15, 2025 at 5:17 PM
We found antipodal feature pairs (dᵢ ≈ − dⱼ): vertical vs horizontal lines, white vs black shirts, left vs right…
Also, co-activation statistics only moderately shape geometry: concepts that fire together aren't necessarily nearby—nor orthogonal when they don't.
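A quick sketch of how such pairs can be flagged, assuming `D` holds the decoder atoms as rows (toy dictionary below): look for atom pairs with cosine similarity near -1.

```python
import numpy as np

def antipodal_pairs(D, thresh=-0.95):
    """Return index pairs (i, j) of dictionary atoms with cos(d_i, d_j) ≈ -1."""
    Dn = D / np.linalg.norm(D, axis=1, keepdims=True)
    C = Dn @ Dn.T
    i, j = np.where(np.triu(C < thresh, k=1))
    order = np.argsort(C[i, j])              # most antipodal first
    return [(int(a), int(b)) for a, b in zip(i[order], j[order])]

# Illustrative dictionary: random atoms plus one planted antipodal pair.
rng = np.random.default_rng(0)
D = rng.normal(size=(1000, 768))
D[1] = -D[0] + 0.01 * rng.normal(size=768)   # d_1 ≈ -d_0

print(antipodal_pairs(D)[:5])                # [(0, 1)] for this toy dictionary
```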
October 15, 2025 at 5:17 PM
Under the Linear Rep. Hypothesis, we'd expect the dictionary to be quasi-orthogonal.
Instead, training drives atoms from near-Grassmannian initialization to higher coherence.
Several concepts fire almost always: the embedding is partly dense (!), contradicting pure sparse coding.
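For concreteness, a sketch of the coherence comparison this implies (illustrative dictionary sizes, not the paper's exact measurement): mutual coherence of the atoms versus the Welch lower bound that Grassmannian-like frames approach.

```python
import numpy as np

def mutual_coherence(D):
    """Max |cos| between distinct dictionary atoms (rows of D)."""
    Dn = D / np.linalg.norm(D, axis=1, keepdims=True)
    G = np.abs(Dn @ Dn.T)
    np.fill_diagonal(G, 0.0)
    return float(G.max())

def welch_bound(n_atoms, dim):
    """Welch lower bound on the coherence of n_atoms unit vectors in R^dim."""
    return float(np.sqrt((n_atoms - dim) / (dim * (n_atoms - 1))))

# Illustrative sizes (the paper's dictionary is larger):
rng = np.random.default_rng(0)
D_init = rng.normal(size=(4096, 768))   # random initialization
print(welch_bound(4096, 768))           # ≈ 0.033
print(mutual_coherence(D_init))         # coherence at init
# The claim in the post: the *trained* dictionary's coherence ends up higher
# than at initialization, i.e. training moves atoms away from
# quasi-orthogonality rather than toward it.
```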
October 15, 2025 at 5:17 PM
Huge thanks to all collaborators who made this work possible, and especially to @binxuwang.bsky.social. This work grew from a year of collaboration!
Tomorrow, Part II: geometry of concepts and Minkowski Representation Hypothesis.
🕹️ kempnerinstitute.github.io/dinovision
📄 arxiv.org/pdf/2510.08638
October 14, 2025 at 9:00 PM
Curious tokens, the registers.
DINO seems to use them to encode global invariants: we find concepts (directions) that fire exclusively (!) on registers.
Examples of such concepts include a motion-blur detector and style detectors (game screenshots, drawings, paintings, warped images...)
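A sketch of the "fires exclusively on registers" check, with an assumed token layout and toy codes rather than real SAE outputs: measure each concept's activation mass on register tokens.

```python
import numpy as np

def register_exclusivity(Z, is_register):
    """Fraction of each concept's activation mass carried by register tokens.

    Z           : [n_tokens, n_concepts] non-negative SAE codes
    is_register : [n_tokens] boolean mask (True for register tokens)
    """
    total = Z.sum(axis=0) + 1e-12
    on_regs = Z[is_register].sum(axis=0)
    return on_regs / total

# Toy layout (illustrative): 4 registers followed by 196 patch tokens.
rng = np.random.default_rng(0)
Z = rng.exponential(size=(200, 32)) * (rng.random((200, 32)) < 0.05)
is_register = np.zeros(200, dtype=bool)
is_register[:4] = True
Z[:4, 7] += 10.0    # concept 7: planted to fire on registers...
Z[4:, 7] = 0.0      # ...and not at all on patches

excl = register_exclusivity(Z, is_register)
print(np.where(excl > 0.99)[0])   # -> [7]: a register-exclusive concept
```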
October 14, 2025 at 9:00 PM
Now for depth estimation. How does DINO know depth?
It turns out it has discovered several human-like monocular depth cues: texture gradients resembling blurring or bokeh, shadow detectors, and projective cues.
Most units mix cues, but a few remain remarkably pure.
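Roughly the kind of probe behind "how does DINO know depth", sketched with a plain closed-form ridge regressor and synthetic stand-ins (the paper's probing setup may differ):

```python
import numpy as np

def ridge_fit(X, y, lam=1e-2):
    """Closed-form ridge regression: w = (X^T X + lam I)^-1 X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# X: [n_patches_total, d] patch embeddings, y: per-patch depth targets
# (synthetic stand-ins here; in practice, pair DINO features with a depth map).
rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 64))
w_true = rng.normal(size=64)
y = X @ w_true + 0.1 * rng.normal(size=10_000)

w = ridge_fit(X, y)
pred = X @ w
r2 = 1 - ((y - pred) ** 2).sum() / ((y - y.mean()) ** 2).sum()
print(round(r2, 3))   # high R² here only because the toy targets are linear
```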
October 14, 2025 at 9:00 PM
Another surprise here: the most important concepts are not object-centric at all, but boundary detectors. Remarkably, these concepts coalesce into a low-dimensional subspace (see paper).
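A sketch of the "coalesce into a low-dimensional subspace" check, on a synthetic stand-in for the boundary-concept atoms: how much of their variance the top few principal components capture.

```python
import numpy as np

def explained_variance(atoms, k):
    """Fraction of variance of the atom set captured by its top-k PCs."""
    s = np.linalg.svd(atoms - atoms.mean(0), compute_uv=False)
    return float((s[:k] ** 2).sum() / (s ** 2).sum())

# boundary_atoms: [n_boundary_concepts, d] decoder atoms flagged as boundary
# detectors (synthetic stand-in: atoms confined to a 5-D subspace plus noise).
rng = np.random.default_rng(0)
basis = rng.normal(size=(5, 768))
boundary_atoms = rng.normal(size=(100, 5)) @ basis + 0.01 * rng.normal(size=(100, 768))

print(round(explained_variance(boundary_atoms, 5), 3))   # ≈ 1.0: a ~5-D subspace
```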
October 14, 2025 at 9:00 PM
This kind of concept breaks a key assumption in interpretability: that a concept is about the tokens where it fires. Here it is the opposite—the concept is defined by where it does not fire. An open question is how models form such concepts.
October 14, 2025 at 9:00 PM
Let's zoom in on classification.
For every class, we find two concepts: one fires on the object (e.g., "rabbit"), and another fires everywhere *except* the object -- but only when it's present!
We call them Elsewhere Concepts (credit: @davidbau.bsky.social).
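A sketch of what detecting an Elsewhere Concept could look like, with toy masks and a planted concept rather than the paper's procedure: high activation off-object when the object is present, quiet otherwise.

```python
import numpy as np

def elsewhere_score(act, object_mask, has_object):
    """Score a concept as an 'elsewhere' candidate.

    act         : [n_images, n_patches] concept activations
    object_mask : [n_images, n_patches] True where the object is
    has_object  : [n_images] True for images containing the object
    High score = fires off-object when the object is present,
    and stays quiet both on the object and in object-free images.
    """
    off_obj_present = act[has_object][~object_mask[has_object]].mean()
    on_obj          = act[has_object][object_mask[has_object]].mean()
    absent          = act[~has_object].mean()
    return off_obj_present - max(on_obj, absent)

# Toy data (illustrative): 50 "rabbit" images, 50 without rabbits.
rng = np.random.default_rng(0)
has_object = np.arange(100) < 50
object_mask = (rng.random((100, 196)) < 0.2) & has_object[:, None]
act = np.zeros((100, 196))
act[has_object] = np.where(object_mask[has_object], 0.0, 1.0)  # fires everywhere *except* the rabbit

print(round(elsewhere_score(act, object_mask, has_object), 2))  # ≈ 1.0
```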
October 14, 2025 at 9:00 PM
Assuming the Linear Rep. Hypothesis, SAEs arise naturally as instruments for concept extraction; they will be our companions in this descent.
Archetypal SAE uncovered 32k concepts.
Our first observation: different tasks recruit distinct regions of this conceptual space.
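For the mechanics, a minimal top-k SAE forward pass in numpy; this is a generic sketch with illustrative sizes, and the Archetypal SAE used here additionally constrains decoder atoms to convex combinations of data points.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_concepts, k = 768, 8192, 16   # embedding dim, dictionary size, sparsity (illustrative)

W_enc = rng.normal(size=(d, n_concepts)) / np.sqrt(d)
D     = rng.normal(size=(n_concepts, d)) / np.sqrt(d)   # decoder atoms ("concepts")
b     = np.zeros(n_concepts)

def sae_forward(x):
    """Top-k SAE: sparse non-negative code z, reconstruction x_hat = z @ D."""
    pre = np.maximum(x @ W_enc + b, 0.0)                      # ReLU pre-codes
    z = np.zeros_like(pre)
    idx = np.argpartition(pre, -k, axis=-1)[..., -k:]          # top-k per token
    np.put_along_axis(z, idx, np.take_along_axis(pre, idx, axis=-1), axis=-1)
    return z, z @ D

x = rng.normal(size=(4, d))        # a few token activations
z, x_hat = sae_forward(x)
print((z > 0).sum(axis=-1))        # at most k active concepts per token
```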
October 14, 2025 at 9:00 PM
Really neat, congrats!
October 12, 2025 at 12:59 AM