Harvey Fu
@harveyfu.bsky.social
Does chain-of-thought-style reasoning solve this? It helps but does not fully solve the problem, and it usually requires generating more than 3x as many thinking tokens as there were in the original document.

[4/n]
June 20, 2025 at 10:06 PM
Why do models struggle to identify omissions? We find that using placeholders, such as “<missing line>”, to explicitly mark omissions boosts models’ performance by 35.7%. This suggests an inherent weakness of Transformer-style self-attention: models cannot attend to omitted information. (An illustrative sketch of the placeholder intervention follows this post.)

[3/n]
June 20, 2025 at 10:06 PM
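
As a concrete, purely illustrative sketch of that intervention: the helper below shows the difference between silently dropping lines and replacing each dropped line with an explicit “<missing line>” placeholder. The function name and details are assumptions for illustration, not the paper’s code.

# Illustrative sketch: build the "modified" document, optionally replacing each
# omitted line with an explicit "<missing line>" placeholder token.
def build_modified(lines, omitted_indices, use_placeholder=False):
    out = []
    for i, line in enumerate(lines):
        if i in omitted_indices:
            if use_placeholder:
                out.append("<missing line>")  # gives the model a token to attend to
            # otherwise the omission leaves no surface trace in the text
        else:
            out.append(line)
    return "\n".join(out)

With use_placeholder=True, every omission becomes a visible token the model can attend to, which is the setting the thread reports as boosting performance by 35.7%.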
AbsenceBench conditions models on two versions of a document: the original and a modified copy that deliberately omits certain parts. It then asks models to generate what was left out (a rough sketch of the setup follows this post).

Although AbsenceBench is similar to the needle-in-a-haystack (NIAH) task, LLMs perform much worse on it!

[2/n]
June 20, 2025 at 10:05 PM
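
To make that setup concrete, here is a rough sketch of how a single AbsenceBench-style example could be constructed; the prompt wording, omission rate, and function name are my own assumptions rather than the benchmark’s actual harness.

import random

# Rough sketch: drop a fraction of lines from an original document and ask the
# model to reproduce exactly what was removed.
def build_example(original: str, omit_frac: float = 0.1, seed: int = 0):
    rng = random.Random(seed)
    lines = original.splitlines()
    k = max(1, int(len(lines) * omit_frac))
    omitted = set(rng.sample(range(len(lines)), k))
    modified = "\n".join(l for i, l in enumerate(lines) if i not in omitted)
    prompt = (
        "Original document:\n" + original +
        "\n\nModified document with some lines removed:\n" + modified +
        "\n\nList every line that was removed from the original."
    )
    answer = [lines[i] for i in sorted(omitted)]  # gold labels for scoring
    return prompt, answer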
LLMs excel at finding surprising “needles” in very long documents, but can they detect when information is conspicuously missing?

🫥AbsenceBench🫥 shows that even SoTA LLMs struggle on this task, suggesting that LLMs have trouble perceiving “negative spaces”.
Paper: arxiv.org/abs/2506.11440

🧵[1/n]
June 20, 2025 at 10:03 PM