Sasha Boguraev
@sashaboguraev.bsky.social
Compling PhD student @UT_Linguistics | prev. CS, Math, Comp. Cognitive Sci @cornell
Wholeheartedly pledging my allegiance to any and all other airlines
August 15, 2025 at 3:32 AM
In our last experiment, we probe whether the mechanisms used to process single-clause variants of these constructions generalize to the matrix and embedded clauses of our multi-clause variants. However, we find little evidence of this transfer across our constructions.
May 27, 2025 at 2:33 PM
This raises the question: what drives constructions to take on these roles? We find that a combination of frequency and linguistic similarity is at work. Namely, less frequent constructions reuse the mechanisms LMs have developed for more frequent, linguistically similar constructions!
May 27, 2025 at 2:33 PM
We then dive deeper, training interventions on individual constructions and evaluating them across all others, allowing us to build generalization networks. Network analysis reveals clear roles — some constructions act as sources, others as sinks.
May 27, 2025 at 2:33 PM
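As a rough sketch of what this network analysis could look like (the transfer scores below are invented placeholders, not results from the paper; networkx is just one convenient tool):

```python
# Sketch: build a directed generalization network from train -> eval
# transfer scores and classify constructions as sources or sinks.
import networkx as nx

transfer = {  # (train_construction, eval_construction) -> causal effect (made up)
    ("cleft", "pseudocleft"): 0.8,
    ("pseudocleft", "cleft"): 0.2,
    ("matrix_wh", "relative_clause"): 0.7,
    ("relative_clause", "matrix_wh"): 0.3,
}

G = nx.DiGraph()
for (src, dst), score in transfer.items():
    if score > 0.5:  # keep only edges with substantial transfer
        G.add_edge(src, dst, weight=score)

for node in G.nodes:
    out_d, in_d = G.out_degree(node), G.in_degree(node)
    role = "source" if out_d > in_d else "sink" if in_d > out_d else "mixed"
    print(f"{node}: {role}")
```

A source here is a construction whose trained intervention transfers outward to many others; a sink mostly reuses mechanisms trained elsewhere.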
We first train interventions on n-1 constructions and test on all, including the held-out one.
Across all positions, we find above-chance transfer of mechanisms, with significant positive transfer both when the evaluated construction is in the train set and when the train and eval animacy match.
May 27, 2025 at 2:33 PM
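A minimal sketch of this leave-one-out design, with `train_intervention` and `eval_intervention` as hypothetical stand-ins for the actual DAS training and evaluation pipeline:

```python
# Leave-one-out setup: train on n-1 constructions, evaluate on all n,
# including the held-out one. Both helpers below are placeholders.

def train_intervention(train_set):
    """Placeholder: fit a DAS intervention on the given constructions."""
    return {"trained_on": tuple(train_set)}

def eval_intervention(intervention, target):
    """Placeholder: return the causal effect of the intervention on `target`."""
    return float(target in intervention["trained_on"])  # dummy score

constructions = ["embedded_wh_1", "embedded_wh_2", "matrix_wh",
                 "relative_clause", "cleft", "pseudocleft", "topicalization"]

results = {}
for held_out in constructions:
    train_set = [c for c in constructions if c != held_out]
    intervention = train_intervention(train_set)   # fit on n-1 constructions
    for target in constructions:                   # evaluate on all n
        results[(held_out, target)] = eval_intervention(intervention, target)
```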
We use DAS (Distributed Alignment Search) to train interventions that localize the processing mechanisms specific to given sets of filler-gaps. We then take these interventions and evaluate them on other filler-gaps: any observed causal effect suggests shared mechanisms across the constructions.
May 27, 2025 at 2:33 PM
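For intuition, here is a minimal sketch of the mechanics of a DAS-style interchange intervention, assuming hidden states are plain vectors. In the real pipeline the rotation is trained against a counterfactual objective on the LM's outputs; this only shows the intervention step itself:

```python
# DAS-style interchange: rotate into a learned basis, swap a small
# subspace from a source run into a base run, rotate back.
import torch
import torch.nn as nn

class DASIntervention(nn.Module):
    def __init__(self, hidden_dim: int, subspace_dim: int):
        super().__init__()
        # Learned rotation, constrained to stay orthogonal.
        self.rotation = nn.utils.parametrizations.orthogonal(
            nn.Linear(hidden_dim, hidden_dim, bias=False)
        )
        self.k = subspace_dim  # size of the localized subspace

    def forward(self, base: torch.Tensor, source: torch.Tensor) -> torch.Tensor:
        # Rotate both hidden states into the learned basis,
        rb, rs = self.rotation(base), self.rotation(source)
        # swap the first k coordinates (the candidate mechanism),
        mixed = torch.cat([rs[..., :self.k], rb[..., self.k:]], dim=-1)
        # and rotate back (the inverse of an orthogonal map is its transpose).
        return mixed @ self.rotation.weight

intervention = DASIntervention(hidden_dim=768, subspace_dim=16)
base, source = torch.randn(2, 768), torch.randn(2, 768)
patched = intervention(base, source)  # base hidden state with the mechanism swapped in
```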
Our investigation focuses on 7 filler–gap constructions: 2 classes of embedded wh-questions, matrix-level wh-questions, restrictive relative clauses, clefts, pseudoclefts, & topicalization. For each construction, we make 4 templates split by animacy of the extraction and number of embedded clauses.
May 27, 2025 at 2:33 PM
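To illustrate the 2x2 template grid for one construction (embedded wh-questions), with invented example sentences rather than the paper's actual templates:

```python
# Animacy of the extraction shows up in the filler (who vs. what);
# the clause-count split adds an embedding layer between filler and gap.
templates = {
    ("animate", "single_clause"):   "I know who the boy saw __.",
    ("inanimate", "single_clause"): "I know what the boy saw __.",
    ("animate", "multi_clause"):    "I know who Mary said the boy saw __.",
    ("inanimate", "multi_clause"):  "I know what Mary said the boy saw __.",
}
```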
A key hypothesis in the history of linguistics is that different constructions share underlying structure. We take advantage of recent advances in mechanistic interpretability to test this hypothesis in Language Models.
New work with @kmahowald.bsky.social and @cgpotts.bsky.social!
🧵👇!
May 27, 2025 at 2:33 PM
Notoriously finicky BC weather celebrating the last day of #NeurIPS2024 with a rainbow across the harbor
December 15, 2024 at 9:27 PM