LMs were thought to rely on a positional pointer alone, but when many entities appear, that system breaks down.
Our new paper shows what these pointers are and how they interact 👇
We show this isn’t sufficient—the positional signal is strong at the edges of context but weak and diffuse in the middle. 2/
The first is *lexical*, where the LM retrieves the subject next to "Michael". It does this by copying the lexical contents of "Holly" to "Michael", binding them together. 3/
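This is the paper's mechanism, not its code, but a minimal sketch can show one crude way to look for lexical copying in an off-the-shelf model: read the residual stream at "Michael"'s position through the unembedding (a logit-lens-style probe) and check whether " Holly" becomes highly ranked. The model, prompt, and probe layer here are illustrative assumptions, not the paper's setup.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative model and prompt; the paper's experiments may differ.
tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

prompt = "Holly sat next to Michael. The one sitting next to Holly is"
ids = tok(prompt, return_tensors="pt")
with torch.no_grad():
    out = model(**ids, output_hidden_states=True)

michael_id = tok.encode(" Michael")[0]
holly_id = tok.encode(" Holly")[0]
michael_pos = ids.input_ids[0].tolist().index(michael_id)

# Logit-lens readout: if "Holly" has been lexically copied onto "Michael",
# her token should become decodable from Michael's residual stream.
for layer, h in enumerate(out.hidden_states):
    resid = model.transformer.ln_f(h[0, michael_pos])
    logits = model.lm_head(resid)
    rank = int((logits > logits[holly_id]).sum())
    print(f"layer {layer:2d}: rank of ' Holly' at 'Michael' position = {rank}")
```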
We were curious how these mechanisms develop throughout training, so we tested for them across OLMo checkpoints 👇
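As a sketch of what "across checkpoints" can look like in practice (not the paper's evaluation harness): OLMo publishes intermediate training checkpoints as revisions on the Hugging Face Hub, so the same probe can be rerun at each step. The repo name and revision tags below are assumptions; check the model card for the exact tags.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "allenai/OLMo-2-1124-7B"  # assumed repo; any OLMo with tagged checkpoints works
REVISIONS = [
    "stage1-step5000-tokens21B",   # hypothetical tag names; verify on the Hub
    "stage1-step50000-tokens210B",
]

tok = AutoTokenizer.from_pretrained(MODEL)
for rev in REVISIONS:
    # `revision` selects a specific intermediate training checkpoint.
    model = AutoModelForCausalLM.from_pretrained(
        MODEL, revision=rev, torch_dtype=torch.bfloat16
    )
    model.eval()
    # ...rerun the same binding probe here and log how each pointer
    # mechanism strengthens (or doesn't) over training...
```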
Joint work with Clara Suslik, Yihuai Hong, and @fbarez.bsky.social, advised by @megamor2.bsky.social
🔗 Paper: arxiv.org/abs/2505.22586
🔗 Code: github.com/yoavgur/PISCES