Anmol Kabra
anmolkabra.com
@anmolkabra.com

ML PhD at @cornellbowers.bsky.social: LLM reasoning, agents, and AI for Science. Can cycle, run, juggle. Currently trying combinations.
Pinned
🚀 📢 Releasing PhantomWiki, a reasoning + retrieval benchmark for LLM agents!

If I asked you "Who is the friend of father of mother of Tom?", you'd simply look up Tom -> mother -> father -> friend and answer.

🤯 SOTA LLMs, even DeepSeek-R1, struggle with such simple reasoning!
Presenting PhantomWiki with @albertgong.bsky.social and Johann at @icmlconf.bsky.social on Tuesday at 11am, plus an oral talk at the Long Context Workshop on Saturday! Come say hi and chat about LLM reasoning and retrieval evaluation!
July 14, 2025 at 9:33 PM
🚨 Our paper PhantomWiki is accepted to ICML 2025
@icmlconf.bsky.social

OG: bsky.app/profile/anmo...
🧑‍💻 We designed it as a future-proof LLM reasoning benchmark, and it shows: the new Qwen3-32B model with auto-thinking mode struggles with higher-difficulty questions, just like DeepSeek-R1 from January.
June 10, 2025 at 7:47 PM
🎉 PhantomWiki is accepted to the @iclr-conf.bsky.social DATA-FM workshop! Come chat with us in Singapore 🦁

🧠 The reasoning + retrieval benchmark comes right on the heels of the new @realaaai.bsky.social presidential report: AI Reasoning and Agents research is front and center!
March 6, 2025 at 1:38 PM
🚀 📢 Releasing PhantomWiki, a reasoning + retrieval benchmark for LLM agents!

If I asked you "Who is the friend of father of mother of Tom?", you'd simply look up Tom -> mother -> father -> friend and answer.

🤯 SOTA LLMs, even DeepSeek-R1, struggle with such simple reasoning!
March 5, 2025 at 8:05 PM