Lightnews — Scholar-powered news

Anmol Kabra

@anmolkabra.com

anmolkabra.com

ML PhD at @cornellbowers.bsky.social: LLM reasoning, agents, and AI for Science. Can cycle, run, juggle. Currently trying combinations.

Posts Replies Media Videos

Anmol Kabra

@anmolkabra.com

Presenting PhantomWiki with @albertgong.bsky.social and Johann at @icmlconf.bsky.social on Tuesday 11am + an oral talk at Long Context Workshop on Saturday! Come say hi/chat about LLM reasoning and retrieval evaluation!

July 14, 2025 at 9:33 PM

Anmol Kabra

@anmolkabra.com

🚨 Our paper PhantomWiki is accepted to ICML 2025
@icmlconf.bsky.social

OG: bsky.app/profile/anmo...
🧑‍💻We designed it as a future-proof LLM reasoning benchmark. And it shows: new Qwen3-32B model with auto-thinking-mode struggles with higher difficulty questions, like DeepSeek-R1 from Jan

June 10, 2025 at 7:47 PM

Anmol Kabra

@anmolkabra.com

Brilliant work at @cornelluniversity.bsky.social with @albertgong.bsky.social, Chao, @kamile.st, Raphael, Johann, JT, Carla Gomes, and @kilianqw.bsky.social!

Paper on arxiv: 📄 arxiv.org/abs/2502.20377

March 5, 2025 at 8:05 PM

Anmol Kabra

@anmolkabra.com

PhantomWiki generates datasets of wiki pages and reasoning questions about the universe of people, on the scale of Wikipedia 🌐

🚨The universe of people and their relationships are generated randomly. So by construction, LLMs cannot memorize/cheat on PhantomWiki evaluation.

March 5, 2025 at 8:05 PM

Anmol Kabra

@anmolkabra.com

🚀 📢 Releasing PhantomWiki, a reasoning + retrieval benchmark for LLM agents!

If I asked you "Who is the friend of father of mother of Tom?", you'd simply look up Tom -> mother -> father -> friend and answer.

🤯 SOTA LLMs, even DeepSeek-R1, struggle with such simple reasoning!

March 5, 2025 at 8:05 PM

Add to Home Screen

Light up
your news

Add to Home Screen

Light upyour news

Sign in to Lightnews

Sign up to start reading

Connect Bluesky

Connect with Bluesky

Light up
your news