Sasha Boguraev
sashaboguraev.bsky.social
Compling PhD student @UT_Linguistics | prev. CS, Math, Comp. Cognitive Sci @cornell
Is the RL difficulty just the amount of compute/time needed for rollouts? When I was TAing an NLP class, we hosted the leaderboard and eval sets on Gradescope (they provided GPUs for us, iirc) and required the students to submit .pt files + code. Unsure if this would work in your use case though…
October 31, 2025 at 2:15 PM
(Very much inspired by discussions at the COLM Interplay workshop yesterday)
October 11, 2025 at 6:17 PM
I guess in the case of teaching chess to grandmasters, the superhuman performance was humanly intelligible (once broken down). On the other hand, do we have any idea what move 37 was doing? (Half rhetorical question, half I’m genuinely curious whether there’s been convincing interp work here.)
October 11, 2025 at 6:17 PM
From an interp perspective, I think the question is: will we still be able to find human-recognizable features that faithfully describe the model’s activity? Or will it be completely unrecognizable?

(In reality it’s probably something in between)
October 11, 2025 at 6:17 PM
No worries! Was just in NYC and figured it worth an ask. Thanks for the pointer.

Separately, would be great to catch up next time I’m around!
September 19, 2025 at 2:34 AM
Open to non-NYU affiliates?
September 18, 2025 at 1:45 PM
Wholeheartedly pledging my allegiance to any and all other airlines
August 15, 2025 at 3:32 AM
But surely there is important novelty in answering both of those questions? Building a novel system/entity and generating a novel proof — inherent to that must be some new ideas by virtue of the questions not being previously answered.

I’m not sure I buy the idea that novelty has to be technical.
July 10, 2025 at 11:20 AM
We believe this work shows how mechanistic analyses can provide novel insights into syntactic structures — making good on the promise that studying LLMs can help us better understand linguistics by helping us develop linguistically interesting hypotheses!

📄: arxiv.org/abs/2505.16002
Causal Interventions Reveal Shared Structure Across English Filler-Gap Constructions
May 27, 2025 at 2:33 PM
In our last experiment, we probe whether the mechanisms used to process single-clause variants of these constructions generalize to the matrix and embedded clauses of our multi-clause variants. However, we find little evidence of this transfer across our constructions.
May 27, 2025 at 2:33 PM
This raises the question: what drives constructions to take on these roles? We find that a combination of frequency and linguistic similarity is responsible. Namely, less frequent constructions reuse the mechanisms LMs have developed for more frequent, linguistically similar constructions!
May 27, 2025 at 2:33 PM
We then dive deeper, training interventions on individual constructions and evaluating them across all others, allowing us to build generalization networks. Network analysis reveals clear roles — some constructions act as sources, others as sinks.
May 27, 2025 at 2:33 PM
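The source/sink analysis in the post above can be sketched in a few lines. This is a hypothetical illustration, not the paper’s code: edges point from a construction whose trained intervention transfers to another, and the construction names and transfer edges below are made up.

```python
# Toy sketch: classifying constructions as "sources" or "sinks" in a
# generalization network. An edge (a, b) means an intervention trained
# on construction a showed transfer when evaluated on construction b.

def classify_roles(edges):
    """edges: iterable of (train_construction, eval_construction) pairs
    where transfer was observed. Returns a role for each construction."""
    out_deg, in_deg, nodes = {}, {}, set()
    for src, dst in edges:
        nodes.update((src, dst))
        out_deg[src] = out_deg.get(src, 0) + 1
        in_deg[dst] = in_deg.get(dst, 0) + 1
    roles = {}
    for n in sorted(nodes):
        o, i = out_deg.get(n, 0), in_deg.get(n, 0)
        if o > 0 and i == 0:
            roles[n] = "source"   # its mechanisms transfer out, nothing in
        elif i > 0 and o == 0:
            roles[n] = "sink"     # reuses other constructions' mechanisms
        else:
            roles[n] = "mixed"
    return roles

# Illustrative transfer edges (invented for this sketch):
edges = [("matrix-wh", "cleft"), ("matrix-wh", "topicalization"),
         ("cleft", "topicalization")]
print(classify_roles(edges))
# matrix-wh -> source, topicalization -> sink, cleft -> mixed
```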
We first train interventions on n-1 constructions and test on all, including the held-out one.

Across all positions, we find above-chance transfer of mechanisms with significant positive transfer when the evaluated construction is in the train set, and when the train and eval animacy match.
May 27, 2025 at 2:33 PM
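The leave-one-out setup in the post above has a simple shape. This is a schematic sketch only: `train_intervention` and `evaluate` are placeholder callables standing in for the actual training and evaluation pipeline, not real APIs.

```python
# Schematic leave-one-out: train an intervention on n-1 constructions,
# then evaluate it on all n, including the held-out one, whose score
# probes pure transfer of mechanisms.

def leave_one_out(constructions, train_intervention, evaluate):
    results = {}
    for held_out in constructions:
        train_set = [c for c in constructions if c != held_out]
        intervention = train_intervention(train_set)
        # Score on every construction, held-out included.
        results[held_out] = {c: evaluate(intervention, c)
                             for c in constructions}
    return results

# Dummy stand-ins: "training" records the train set, "evaluation"
# checks membership, so held-out scores come out False.
r = leave_one_out(["A", "B", "C"],
                  train_intervention=lambda ts: set(ts),
                  evaluate=lambda iv, c: c in iv)
```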
We use DAS (Distributed Alignment Search) to train interventions, localizing the processing mechanisms specific to given sets of filler-gaps. We then take these interventions and evaluate them on other filler-gaps. Any observed causal effect suggests shared mechanisms across the constructions.
May 27, 2025 at 2:33 PM
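The core DAS-style interchange intervention mentioned above can be illustrated in a toy 2-d hidden space. A caveat up front: DAS learns the rotation via gradient descent, whereas this sketch fixes it by hand purely for illustration; none of this is the paper’s implementation.

```python
import math

# Toy interchange intervention: rotate both runs' hidden states into a
# (here fixed, in DAS learned) basis, splice the source run's value
# into the first coordinate -- the candidate "aligned" subspace -- of
# the base run, and rotate back to the model's original basis.

def rotate(v, theta):
    c, s = math.cos(theta), math.sin(theta)
    return (c * v[0] - s * v[1], s * v[0] + c * v[1])

def interchange(base_h, source_h, theta):
    zb = rotate(base_h, theta)
    zs = rotate(source_h, theta)
    patched = (zs[0], zb[1])          # swap only the aligned coordinate
    return rotate(patched, -theta)    # back to the original basis

out = interchange((2.0, 0.0), (0.0, 1.0), math.pi / 4)
# In the rotated basis, coordinate 0 now carries the source's value
# while coordinate 1 keeps the base's.
```

Any downstream behavior change caused by this patch is then evidence that the swapped subspace encodes the variable of interest.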
Our investigation focuses on 7 filler–gap constructions: 2 classes of embedded wh-questions, matrix-level wh-questions, restrictive relative clauses, clefts, pseudoclefts, & topicalization. For each construction, we make 4 templates split by animacy of the extraction and number of embedded clauses.
May 27, 2025 at 2:33 PM
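The 7 × 2 × 2 design above can be enumerated mechanically. The construction labels below are abbreviations of the ones listed in the post, not the paper’s exact identifiers, and the actual sentence templates are of course in the paper.

```python
from itertools import product

# Enumerate the template grid: 7 constructions x 2 animacy values x
# 2 clause depths = 28 templates (4 per construction).
CONSTRUCTIONS = [
    "embedded-wh-Q-1", "embedded-wh-Q-2", "matrix-wh-Q",
    "restrictive-RC", "cleft", "pseudocleft", "topicalization",
]
templates = [
    {"construction": c, "animacy": a, "clauses": n}
    for c, a, n in product(CONSTRUCTIONS,
                           ["animate", "inanimate"],
                           ["single", "multi"])
]
print(len(templates))  # 28
```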
Do you have any thoughts on whether these a) emerged naturally during the RL phase of training (rather than being specifically engineered to encourage more generation or an artifact of some other post-training phase) and if so b) actually represent backtracking in the search?
February 20, 2025 at 11:49 PM