Harvey Fu
@harveyfu.bsky.social
Does chain-of-thought-style reasoning solve this? It helps but does not fully solve the problem, and it usually requires generating more than 3x as many thinking tokens as there were in the original document.

[4/n]
June 20, 2025 at 10:06 PM
Why do models struggle to identify omissions? We find that using placeholders, such as “<missing line>”, to explicitly mark omissions boosts models’ performance by 35.7%. This suggests an inherent weakness of Transformer-style self-attention: models cannot attend to omitted information. (An illustrative sketch of the placeholder intervention follows this post.)

[3/n]
June 20, 2025 at 10:06 PM
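
As a concrete, purely illustrative sketch of that intervention: the helper below shows the difference between silently dropping lines and replacing each dropped line with an explicit “<missing line>” placeholder. The function name and details are assumptions for illustration, not the paper’s code.

# Illustrative sketch: build the "modified" document, optionally replacing each
# omitted line with an explicit "<missing line>" placeholder token.
def build_modified(lines, omitted_indices, use_placeholder=False):
    out = []
    for i, line in enumerate(lines):
        if i in omitted_indices:
            if use_placeholder:
                out.append("<missing line>")  # gives the model a token to attend to
            # otherwise the omission leaves no surface trace in the text
        else:
            out.append(line)
    return "\n".join(out)

With use_placeholder=True, every omission becomes a visible token the model can attend to, which is the setting the thread reports as boosting performance by 35.7%.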
AbsenceBench conditions models on two versions of a document: the original and a modified copy that deliberately omits certain parts. It then asks models to generate what was left out (a rough sketch of the setup follows this post).

Although AbsenceBench is similar to the needle-in-a-haystack (NIAH) task, LLMs perform much worse on it!

[2/n]
June 20, 2025 at 10:05 PM
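
To make that setup concrete, here is a rough sketch of how a single AbsenceBench-style example could be constructed; the prompt wording, omission rate, and function name are my own assumptions rather than the benchmark’s actual harness.

import random

# Rough sketch: drop a fraction of lines from an original document and ask the
# model to reproduce exactly what was removed.
def build_example(original: str, omit_frac: float = 0.1, seed: int = 0):
    rng = random.Random(seed)
    lines = original.splitlines()
    k = max(1, int(len(lines) * omit_frac))
    omitted = set(rng.sample(range(len(lines)), k))
    modified = "\n".join(l for i, l in enumerate(lines) if i not in omitted)
    prompt = (
        "Original document:\n" + original +
        "\n\nModified document with some lines removed:\n" + modified +
        "\n\nList every line that was removed from the original."
    )
    answer = [lines[i] for i in sorted(omitted)]  # gold labels for scoring
    return prompt, answer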
LLMs excel at finding surprising “needles” in very long documents, but can they detect when information is conspicuously missing?

🫥AbsenceBench🫥 shows that even SoTA LLMs struggle on this task, suggesting that LLMs have trouble perceiving “negative spaces”.
Paper: arxiv.org/abs/2506.11440

🧵[1/n]
June 20, 2025 at 10:03 PM