Lightnews — Scholar-powered news

Yoav Gur Arieh

@yoav.ml

18 followers 50 following 22 posts


Pinned

Yoav Gur Arieh @yoav.ml · Oct 8

🧠 To reason over text and track entities, we find that language models use three types of 'pointers'!

They were thought to rely only on a positional one—but when many entities appear, that system breaks down.

Our new paper shows what these pointers are and how they interact 👇

Yoav Gur Arieh

@yoav.ml

I think I found the latent direction in Gemma (an SAE feature) that represents the pandemic era...

Interpreted by projecting the vector to vocabulary space, yielding a list of tokens associated with it

October 29, 2025 at 9:15 PM
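
As a rough illustration of the "projecting the vector to vocabulary space" step mentioned in the post above, here is a minimal sketch. It assumes the feature is a direction in the model's hidden space and that scoring each token means taking the dot product of that direction with the token's column of the unembedding matrix. The function name `project_to_vocab`, the toy vocabulary, and the 2-dimensional matrix are hypothetical and are not taken from Gemma or from the post. Real pipelines may also apply a final normalization layer before the projection.

```python
def project_to_vocab(direction, unembedding, vocab, k=3):
    """Score every token by its dot product with a hidden-space direction.

    direction:   list of floats, length d_model (hypothetical feature vector)
    unembedding: d_model rows, each with len(vocab) floats (toy matrix)
    vocab:       list of token strings
    Returns the k highest-scoring (token, score) pairs, best first.
    """
    if len(unembedding) != len(direction):
        raise ValueError("unembedding must have one row per hidden dimension")
    if any(len(row) != len(vocab) for row in unembedding):
        raise ValueError("each unembedding row must have one entry per token")

    # One score per token: sum over hidden dimensions of direction * weight.
    scores = [
        sum(direction[d] * unembedding[d][t] for d in range(len(direction)))
        for t in range(len(vocab))
    ]
    # Rank token indices by score, highest first, and keep the top k.
    ranked = sorted(range(len(vocab)), key=lambda t: scores[t], reverse=True)
    return [(vocab[t], scores[t]) for t in ranked[:k]]


# Toy data, invented purely for illustration.
toy_vocab = ["pandemic", "lockdown", "banana", "quarantine", "guitar"]
toy_unembedding = [
    [2.0, 1.5, -1.0, 1.8, -0.5],  # hidden dimension 0
    [0.1, 0.2, 0.9, 0.0, 1.0],    # hidden dimension 1
]
toy_direction = [1.0, 0.0]  # a feature aligned with hidden dimension 0

top_tokens = project_to_vocab(toy_direction, toy_unembedding, toy_vocab, k=3)
print(top_tokens)
```

With these toy numbers, the tokens that score highest are the ones whose unembedding columns align with the chosen direction, which is the sense in which the resulting token list "interprets" the feature.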

Yoav Gur Arieh

@yoav.ml

Two weeks ago I posted about our recent paper, which shows that to bind entities, LMs use three mechanisms: positional, lexical and reflexive.

We were curious how these mechanisms develop throughout training, so we evaluated their existence across OLMo checkpoints 👇

October 21, 2025 at 7:40 PM

Yoav Gur Arieh

@yoav.ml

October 8, 2025 at 2:56 PM

Yoav Gur Arieh

@yoav.ml

New Paper Alert! Can we precisely erase conceptual knowledge from LLM parameters?
Most methods are shallow, coarse, or overreach, adversely affecting related or general knowledge.

We introduce 🪝𝐏𝐈𝐒𝐂𝐄𝐒, a general framework for Precise In-parameter Concept EraSure. 🧵 1/

May 29, 2025 at 4:22 PM

Reposted by Yoav Gur Arieh

Mor Geva

@megamor2.bsky.social

How can we interpret LLM features at scale? 🤔

Current pipelines use activating inputs, which is costly and ignores how features causally affect model outputs!
We propose efficient output-centric methods that better predict the steering effect of a feature.

New preprint led by @yoav.ml 🧵1/

January 28, 2025 at 7:34 PM

Reposted by Yoav Gur Arieh

Mor Geva

@megamor2.bsky.social

What's in an attention head? 🤯

We present an efficient framework – MAPS – for inferring the functionality of attention heads in LLMs ✨directly from their parameters✨

A new preprint with Amit Elhelo 🧵 (1/10)

December 18, 2024 at 5:55 PM
