Chantal
@chantalsh.bsky.social
PhD (in progress) @ Northeastern! NLP 🤝 LLMs

she/her
(3/n) Perhaps more strikingly, unintended syntactic-domain correlations can be exploited to bypass model refusals (e.g., OLMo-2-Instruct 7B here)
October 24, 2025 at 4:23 PM
(2/n) This has important implications for model generalization and safety! We show that this occurs in instruction-tuned models, and propose an evaluation to test for this type of brittleness.
October 24, 2025 at 4:23 PM
(1/n) Models learn to rely on *syntactic templates* (frequent patterns of POS tags) that co-occur with particular domains.

LLMs can inadvertently learn "If I see this syntactic pattern it’s domain X" rather than "If I see this semantic content, do task Y."
October 24, 2025 at 4:23 PM
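To make the idea concrete, here is a minimal sketch of what a "syntactic template" could look like in practice: counting frequent POS-tag n-grams in a corpus. The tag sequences and the `pos_templates` helper below are illustrative assumptions, not the paper's actual method; a real pipeline would run a POS tagger (e.g. spaCy or NLTK) over domain-specific text first.

```python
from collections import Counter

# Hypothetical pre-tagged sentences (Penn Treebank-style POS tags).
# In practice these would come from running a tagger over a domain corpus.
tagged_corpus = [
    ["DT", "JJ", "NN", "VBZ", "DT", "NN"],
    ["DT", "JJ", "NN", "VBD", "IN", "DT", "NN"],
    ["PRP", "VBP", "DT", "JJ", "NN"],
]

def pos_templates(corpus, n=3):
    """Count POS n-grams; the most frequent ones act as candidate 'templates'."""
    counts = Counter()
    for tags in corpus:
        for i in range(len(tags) - n + 1):
            counts[tuple(tags[i:i + n])] += 1
    return counts

templates = pos_templates(tagged_corpus)
print(templates.most_common(2))  # e.g. ('DT', 'JJ', 'NN') appears 3 times here
```

If one such template occurs far more often in, say, the "safe cooking questions" domain than elsewhere, a model can latch onto the template itself as a domain signal, which is the spurious correlation the thread describes.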
Syntax that spuriously correlates with safe domains can jailbreak LLMs - e.g. below with GPT-4o mini

Our paper (co w/ Vinith Suriyakumar) on syntax-domain spurious correlations will appear at #NeurIPS2025 as a ✨spotlight!

+ @marzyehghassemi.bsky.social, @byron.bsky.social, Levent Sagun
October 24, 2025 at 4:23 PM
"AI slop" seems to be everywhere, but what exactly makes text feel like "slop"?

In our new work (w/ @tuhinchakr.bsky.social, Diego Garcia-Olano, @byron.bsky.social ) we provide a systematic attempt at measuring AI "slop" in text!

arxiv.org/abs/2509.19163

🧵 (1/7)
September 24, 2025 at 1:21 PM
(2/7) TL;DR: Measuring the construct of slop is difficult! While somewhat subjective and domain-dependent, it boils down to three key factors: information quality, density, and stylistic choices. We introduce a taxonomy for slop.
September 24, 2025 at 1:18 PM