Lightnews — Scholar-powered news

Anmol Kabra

@anmolkabra.com

Presenting PhantomWiki with @albertgong.bsky.social and Johann at @icmlconf.bsky.social on Tuesday 11am + an oral talk at Long Context Workshop on Saturday! Come say hi/chat about LLM reasoning and retrieval evaluation!

July 14, 2025 at 9:33 PM

Anmol Kabra

@anmolkabra.com

🚨 Our paper PhantomWiki is accepted to ICML 2025
@icmlconf.bsky.social

OG: bsky.app/profile/anmo...
🧑‍💻 We designed it as a future-proof LLM reasoning benchmark. And it shows: the new Qwen3-32B model with auto-thinking mode struggles with higher-difficulty questions, like DeepSeek-R1 from Jan

June 10, 2025 at 7:47 PM

Anmol Kabra

@anmolkabra.com

🎉 PhantomWiki is accepted to the @iclr-conf.bsky.social DATA-FM workshop! Come chat with us in Singapore 🦁

🧠 The reasoning + retrieval benchmark comes right on the heels of the new @realaaai.bsky.social presidential report: AI Reasoning and Agents research front and center!

Anmol Kabra @anmolkabra.com · Mar 5

🚀 📢 Releasing PhantomWiki, a reasoning + retrieval benchmark for LLM agents!

If I asked you "Who is the friend of father of mother of Tom?", you'd simply look up Tom -> mother -> father -> friend and answer.

🤯 SOTA LLMs, even DeepSeek-R1, struggle with such simple reasoning!

March 6, 2025 at 1:38 PM

Anmol Kabra

@anmolkabra.com

🚀 📢 Releasing PhantomWiki, a reasoning + retrieval benchmark for LLM agents!

If I asked you "Who is the friend of father of mother of Tom?", you'd simply look up Tom -> mother -> father -> friend and answer.
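The lookup chain in that question can be sketched as a sequence of relation hops over a small fact table. This is only a minimal illustration: the names (Alice, Bob, Carol, Dave), the `facts` table, and the `follow` helper are all hypothetical and are not taken from PhantomWiki's data or code.

```python
# Hypothetical toy facts: (person, relation) -> list of related people.
# None of these names or entries come from PhantomWiki itself.
facts = {
    ("Tom", "mother"): ["Alice"],
    ("Alice", "father"): ["Bob"],
    ("Bob", "friend"): ["Carol", "Dave"],
}


def follow(start, relations):
    """Follow each relation in order, starting from one person."""
    current = {start}
    for relation in relations:
        next_people = set()
        for person in current:
            # A missing entry means no known relative for this hop.
            next_people.update(facts.get((person, relation), []))
        current = next_people
    return sorted(current)


# "Who is the friend of father of mother of Tom?"
# is read inside-out: Tom -> mother -> father -> friend.
answer = follow("Tom", ["mother", "father", "friend"])
print(answer)  # ['Carol', 'Dave']
```

Each hop is a plain lookup, so the difficulty for a model presumably lies in retrieving the right facts and chaining them in order rather than in any single step.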

🤯 SOTA LLMs, even DeepSeek-R1, struggle with such simple reasoning!

March 5, 2025 at 8:05 PM
