Yasaman Bahri
@yasamanbb.bsky.social
Research Scientist @ Google DeepMind. AI + physics. Prev Ph.D. @ UC Berkeley.

https://sites.google.com/view/yasamanbahri/home/
Figure showing translation symmetry in co-occurrence statistics & PCA of model representations match across theory, word2vec, and LLMs:
February 19, 2026 at 4:20 AM
and thereby mediate correlations and constrain the geometry of representations. The robustness of this representational geometry should therefore be understood as a collective effect (!).
February 19, 2026 at 4:20 AM
We had observed a similar robustness in our earlier work (arxiv.org/abs/2505.18651). In our new paper, this geometric recovery can be explained by extending our prior theory to one with a continuous latent variable. That is, many words in a vocabulary have a notion of e.g. 'time' or 'space' ...
February 19, 2026 at 4:20 AM
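A minimal numerical sketch of this picture (a toy construction of my own, not the paper's exact model): give each word in a cyclic collection a smooth tuning curve over a continuous latent 'time' variable, define co-occurrence as the overlap of tuning curves, and check that the resulting statistics depend only on the separation between words.

import numpy as np

# Toy construction (not the paper's model): N cyclic words, each tied to a bump
# over a continuous latent variable t (think "time of year").
N = 12
t = np.linspace(0, 2 * np.pi, 1200, endpoint=False)   # continuous latent variable
centers = 2 * np.pi * np.arange(N) / N                # preferred latent value per word
kappa = 2.0                                           # bump width parameter (assumed)

# Tuning curve of each word over the latent variable, normalized to a distribution.
curves = np.exp(kappa * np.cos(t[None, :] - centers[:, None]))
curves /= curves.sum(axis=1, keepdims=True)

# Co-occurrence as overlap of tuning curves: P[i, j] ~ integral over the latent t.
P = curves @ curves.T

# Translation symmetry: P[i, j] depends only on the circular separation (i - j) mod N.
for shift in range(N):
    diag = np.array([P[i, (i + shift) % N] for i in range(N)])
    assert np.allclose(diag, diag[0])
print("co-occurrence depends only on separation; first few values:", np.round(P[0, :4], 6))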
This means that important geometric information is hidden in the co-occurrences between these words and other words in the vocabulary that have some semantic overlap - for example, words with a notion of seasonality.
February 19, 2026 at 4:20 AM
Surprisingly, the geometric information for a collection of words - for example, the 12 calendar months of the year - does not arise solely from co-occurrences within that group. One can ablate their contribution entirely and find that representations of the 12 months can still be recovered.
February 19, 2026 at 4:20 AM
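A synthetic sketch of the ablation (my own toy setup with made-up 'season-like' context words, not the paper's experiment): build co-occurrence statistics between 12 cyclic 'month' words and a pool of context words whose usage varies smoothly over the same latent cycle, use no month-month co-occurrences at all, and check that PCA of the month rows still lays the months out in cyclic order.

import numpy as np

rng = np.random.default_rng(0)
N_months, N_context = 12, 200

# Latent phase for each month, and a random phase/width for each context word
# (standing in for season- or weather-like words in a real vocabulary).
month_phase = 2 * np.pi * np.arange(N_months) / N_months
ctx_phase = rng.uniform(0, 2 * np.pi, N_context)
ctx_width = rng.uniform(0.5, 2.0, N_context)

# Month-context co-occurrence: a smooth function of the latent separation.
# Month-month co-occurrences are never used -- the analogue of the ablation.
M = np.exp(ctx_width[None, :] * np.cos(month_phase[:, None] - ctx_phase[None, :]))

# PCA of the month rows using only month-context statistics.
M_centered = M - M.mean(axis=0, keepdims=True)
U, S, _ = np.linalg.svd(M_centered, full_matrices=False)
coords = U[:, :2] * S[:2]                       # top-2 coordinates of the 12 months

# The recovered angular order should be a rotation (possibly a reflection) of 0..11.
angles = np.arctan2(coords[:, 1], coords[:, 0])
print("recovered cyclic order of months:", np.argsort(angles))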
Neural representations can be used for decoding via linear probes (such as predicting spatial or temporal coordinates), and our theory, based on constraints from symmetry, predicts the efficiency of this decoding process, matching empirics.
February 19, 2026 at 4:20 AM
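Here is a generic version of such a probe (ordinary least squares on toy features, not the paper's datasets or procedure): embed a circular latent coordinate in a higher-dimensional space with noise, regress its (cos, sin) encoding on the features, and measure held-out angular error.

import numpy as np

rng = np.random.default_rng(1)
n_words, dim = 400, 64

# Toy "embeddings": a 2D circle (the latent coordinate) embedded in a random
# 64-dimensional subspace, plus isotropic noise. Purely illustrative.
theta = rng.uniform(0, 2 * np.pi, n_words)
basis, _ = np.linalg.qr(rng.normal(size=(dim, 2)))
X = np.c_[np.cos(theta), np.sin(theta)] @ basis.T + 0.1 * rng.normal(size=(n_words, dim))

# Targets: encode the circular coordinate as (cos, sin) so a *linear* probe makes sense.
Y = np.c_[np.cos(theta), np.sin(theta)]

# Train/test split and an ordinary least-squares linear probe.
train, test = np.arange(300), np.arange(300, n_words)
W, *_ = np.linalg.lstsq(X[train], Y[train], rcond=None)
pred = X[test] @ W

# Decoding error, measured as angular error of the predicted coordinate.
pred_theta = np.arctan2(pred[:, 1], pred[:, 0])
err = np.angle(np.exp(1j * (pred_theta - theta[test])))
print(f"median angular decoding error: {np.median(np.abs(err)):.3f} rad")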
That our theory carries over to LLM observations (despite lacking a theoretical handle here) demonstrates how symmetry in simple low-order statistics can have robust effects on representations.
February 19, 2026 at 4:20 AM
Word embeddings there have Fourier PCA modes, and the geometry we obtain here is predictive of that found in LLM hidden layers, explaining & unifying prior observations with a single idea.
February 19, 2026 at 4:20 AM
Translation symmetry in co-occurrence statistics & PCA of model representations match across theory, word2vec, and LLMs:
February 19, 2026 at 4:20 AM
to a translation symmetry that can be seen empirically in the co-occurrence statistics of natural language (!). That is, the co-occurrence of words in such a collection (which semantically correspond to a collection of points on a lattice) depends only on the distance between them.
February 19, 2026 at 4:20 AM
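Concretely, "depends only on the distance" means the within-collection co-occurrence matrix is circulant (for a cyclic collection such as the months), and circulant matrices are diagonalized by Fourier modes; sinusoidal PCA components and a circular layout in the top-2 plane follow directly. A quick check of that linear-algebra fact on a toy matrix (not the paper's empirical statistics):

import numpy as np

N = 12
i = np.arange(N)

# Co-occurrence depending only on circular distance between words -> circulant matrix.
d = np.abs(i[:, None] - i[None, :])
d = np.minimum(d, N - d)
P = np.exp(-d / 2.0)            # any decreasing function of distance will do here

# Center (as PCA effectively does) and diagonalize.
C = P - P.mean(axis=0, keepdims=True) - P.mean(axis=1, keepdims=True) + P.mean()
evals, evecs = np.linalg.eigh(C)
top2 = evecs[:, np.argsort(evals)[::-1][:2]]    # leading two PCA directions

# The leading modes are the k = 1 Fourier pair, so the words trace out a circle:
radii = np.linalg.norm(top2, axis=1)
gaps = np.diff(np.sort(np.degrees(np.arctan2(top2[:, 1], top2[:, 0]))))
print("equal radii:", np.allclose(radii, radii[0]))
print("angular gaps (deg):", np.round(gaps, 1))   # all 30 degrees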
Prior work has found that LLM representations of certain collections of words (such as words corresponding to space, time, and color - among others) exhibit simple, regular structure in their PCA components. We show this arises in simple word embedding models (word2vec) as well, and trace it back...
February 19, 2026 at 4:20 AM
Congratulations!
January 9, 2026 at 8:10 AM
...this work on Fri 12/5.
December 4, 2025 at 7:01 PM
Surprisingly, there is great agreement with real language data (you can even see the Kronecker product structure in Wikipedia text!). As we found later, our theoretical model makes concrete some ideas put forth by the cognitive psychologist David Rumelhart. Daniel (lead author) will be presenting...
December 4, 2025 at 7:01 PM
We propose a latent variable model that prescribes a particular (Kronecker product) structure for the co-occurrence probabilities of words. The eigendecomposition is analytically solvable and gives testable predictions for when, how, and why the ability to solve linear analogies emerges.
December 4, 2025 at 7:01 PM
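The linear-algebra fact that makes a Kronecker-product structure analytically tractable: the eigenvalues of A (x) B are the products of the eigenvalues of A and B, and the eigenvectors are Kronecker products of the factors' eigenvectors. A quick numerical check on generic symmetric matrices (stand-ins, not the paper's particular co-occurrence model):

import numpy as np

rng = np.random.default_rng(0)

# Two small symmetric matrices standing in for the factors of the latent structure.
A = rng.normal(size=(3, 3)); A = (A + A.T) / 2
B = rng.normal(size=(4, 4)); B = (B + B.T) / 2

la, Ua = np.linalg.eigh(A)
lb, Ub = np.linalg.eigh(B)

K = np.kron(A, B)
lk, _ = np.linalg.eigh(K)

# Eigenvalues of A (x) B are all the products la[i] * lb[j] ...
prod = np.sort(np.outer(la, lb).ravel())
print("eigenvalues factorize:", np.allclose(np.sort(lk), prod))

# ... and u_i (x) v_j is an eigenvector of A (x) B with eigenvalue la[i] * lb[j].
v = np.kron(Ua[:, 0], Ub[:, 1])
print("Kronecker eigenvector check:", np.allclose(K @ v, la[0] * lb[1] * v))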
can complete analogies, we felt they did not satisfactorily address some stringent empirical tests.

In arxiv.org/abs/2505.18651, with Daniel Korchinski, Dhruva, and Matthieu Wyart, we propose a new theory.
On the Emergence of Linear Analogies in Word Embeddings
Models such as Word2Vec and GloVe construct word embeddings based on the co-occurrence probability $P(i,j)$ of words $i$ and $j$ in text corpora. The resulting vectors $W_i$ not only group semanticall...
arxiv.org
December 4, 2025 at 7:01 PM
The ability to do analogical reasoning with word vectors is perhaps the simplest example of an "emergent" ability, in the sense that nontrivial computational properties arise despite the loss not having been explicitly optimized for this task. While many works have tried to explain why word vectors
December 4, 2025 at 7:01 PM
(with famous examples like "king is to queen as man is to woman"). Dhruva Karkada (lead author) will be presenting this work at NeurIPS on Thu 12/4.
December 4, 2025 at 7:01 PM
of the co-occurrence statistics of words (a measure of two-point correlations).

Among other things, this means that the *complete eigendecomposition* (mode by mode) of co-occurrence probabilities of words is important for understanding why word vectors are able to complete simple analogies
December 4, 2025 at 7:01 PM
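To make "linear analogies" concrete, here is a toy consistent with the Kronecker-product picture (my own construction; the labels like "man/royal" just name cells of a 2 x 3 latent grid, and the matrices are random stand-ins, not the paper's model): if co-occurrence probabilities factorize across two latent attributes, their logarithm is additive, and embeddings built from the eigendecomposition then complete analogies like king - man + woman = queen exactly.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical vocabulary structured as (gender) x (station) -- a 2 x 3 latent grid.
genders = ["man", "woman"]
stations = ["person", "royal", "child"]
vocab = [f"{g}/{s}" for g in genders for s in stations]

# Random symmetric stand-ins for the log co-occurrence factors of each attribute.
A = rng.normal(size=(2, 2)); A = (A + A.T) / 2
B = rng.normal(size=(3, 3)); B = (B + B.T) / 2

# A Kronecker-product co-occurrence P = P_A (x) P_B has additive log:
# M[(g,s),(g',s')] = A[g,g'] + B[s,s'].
M = np.kron(A, np.ones((3, 3))) + np.kron(np.ones((2, 2)), B)

# Word2vec-like embeddings from the eigendecomposition of M (a linear image of its rows).
evals, evecs = np.linalg.eigh(M)
W = evecs * evals

idx = {w: k for k, w in enumerate(vocab)}
query = W[idx["man/royal"]] - W[idx["man/person"]] + W[idx["woman/person"]]  # "king - man + woman"
print("analogy answer:", vocab[int(np.argmin(np.linalg.norm(W - query, axis=1)))])  # -> woman/royal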
In arxiv.org/abs/2502.09863, we show that a family of supervised loss functions, quartic in the learnable weights, captures the learning dynamics and semantic structure of word embedding models such as word2vec. This allows closed-form expressions for the full trajectory of learning in terms
Closed-Form Training Dynamics Reveal Learned Features and Linear Structure in Word2Vec-like Models
Self-supervised word embedding algorithms such as word2vec provide a minimal setting for studying representation learning in language modeling. We examine the quartic Taylor approximation of the word2...
arxiv.org
December 4, 2025 at 7:01 PM
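As a flavor of what an eigenmode-by-eigenmode description of learning looks like, here is a generic stand-in that is also quartic in the weights (plain symmetric matrix factorization trained by gradient descent; it is not the paper's approximation of the word2vec loss): starting from a small random initialization, the learned embedding ends up spanning the top eigenvectors of the target statistics, so the whole trajectory is naturally organized in that eigenbasis.

import numpy as np

rng = np.random.default_rng(0)
V, d = 30, 3                          # toy vocabulary size and embedding dimension

# Symmetric stand-in for a co-occurrence statistic (e.g. a PMI-like matrix).
M = rng.normal(size=(V, V)); M = (M + M.T) / 2

# A quartic-in-the-weights objective: L(W) = || M - W W^T ||_F^2,
# trained by gradient descent from a small random initialization.
W = 0.01 * rng.normal(size=(V, d))
lr = 2e-3
for _ in range(20000):
    R = W @ W.T - M
    W -= lr * 4 * R @ W               # gradient of || M - W W^T ||_F^2

# The learned embedding spans the eigenvectors of M with the largest positive
# eigenvalues (the only modes a W W^T factor can express), i.e. the dynamics are
# naturally described mode by mode in M's eigenbasis.
evals, evecs = np.linalg.eigh(M)
top = evecs[:, np.argsort(evals)[::-1][:d]]
cosines = np.linalg.svd(top.T @ np.linalg.qr(W)[0], compute_uv=False)
print("principal-angle cosines with top eigenmodes:", np.round(cosines, 3))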