Chantal
@chantalsh.bsky.social
PhD (in progress) @ Northeastern! NLP 🤝 LLMs

she/her
(3/n) Perhaps more strikingly, unintended syntactic-domain correlations can be exploited to bypass model refusals (e.g., OLMo-2-Instruct 7B here)
October 24, 2025 at 4:23 PM
(2/n) This has important implications for model generalization and safety! We show that this occurs in instruction-tuned models, and propose an evaluation to test for this type of brittleness.
October 24, 2025 at 4:23 PM
(1/n) Models learn to rely on *syntactic templates* (frequent patterns of POS tags) that co-occur with particular domains.

LLMs can inadvertently learn "If I see this syntactic pattern it’s domain X" rather than "If I see this semantic content, do task Y."
October 24, 2025 at 4:23 PM
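To make the idea concrete, here is a minimal sketch of what a "syntactic template" could look like in practice: counting frequent POS-tag n-grams in a corpus. The tag sequences and the `pos_templates` helper below are illustrative assumptions, not the paper's actual method; a real pipeline would run a POS tagger (e.g. spaCy or NLTK) over domain-specific text first.

```python
from collections import Counter

# Hypothetical pre-tagged sentences (Penn Treebank-style POS tags).
# In practice these would come from running a tagger over a domain corpus.
tagged_corpus = [
    ["DT", "JJ", "NN", "VBZ", "DT", "NN"],
    ["DT", "JJ", "NN", "VBD", "IN", "DT", "NN"],
    ["PRP", "VBP", "DT", "JJ", "NN"],
]

def pos_templates(corpus, n=3):
    """Count POS n-grams; the most frequent ones act as candidate 'templates'."""
    counts = Counter()
    for tags in corpus:
        for i in range(len(tags) - n + 1):
            counts[tuple(tags[i:i + n])] += 1
    return counts

templates = pos_templates(tagged_corpus)
print(templates.most_common(2))  # e.g. ('DT', 'JJ', 'NN') appears 3 times here
```

If one such template occurs far more often in, say, the "safe cooking questions" domain than elsewhere, a model can latch onto the template itself as a domain signal, which is the spurious correlation the thread describes.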
Syntax that spuriously correlates with safe domains can jailbreak LLMs - e.g. below with GPT-4o mini

Our paper (co w/ Vinith Suriyakumar) on syntax-domain spurious correlations will appear at #NeurIPS2025 as a ✨spotlight!

+ @marzyehghassemi.bsky.social, @byron.bsky.social, Levent Sagun
October 24, 2025 at 4:23 PM
"AI slop" seems to be everywhere, but what exactly makes text feel like "slop"?

In our new work (w/ @tuhinchakr.bsky.social, Diego Garcia-Olano, @byron.bsky.social ) we provide a systematic attempt at measuring AI "slop" in text!

arxiv.org/abs/2509.19163

🧵 (1/7)
September 24, 2025 at 1:21 PM
(2/7) TL;DR: Measuring the construct of slop is difficult! While somewhat subjective and domain-dependent, it boils down to three key factors: information quality, density, and stylistic choices. We introduce a taxonomy for slop.
September 24, 2025 at 1:18 PM