Anmol Kabra
@anmolkabra.com

ML PhD at @cornellbowers.bsky.social: LLM reasoning, agents, and AI for Science. Can cycle, run, juggle. Currently trying combinations.
PhantomEval, the evaluator code for PhantomWiki, is some of the most stable code I wrote this year with @albertgong.bsky.social, Chao, and @kamile.st. It supports LLMs through all major providers (openai, anthropic, gemini, llama, together, vllm), so we eval new LLM releases within days! 🚀
June 10, 2025 at 7:47 PM
The PhantomWiki v1 release on GitHub generates on-demand datasets for LLM reasoning+retrieval evaluation: github.com/kilian-group...
GitHub - kilian-group/phantom-wiki: Python package for generating datasets to evaluate reasoning and retrieval of large language models
June 10, 2025 at 7:47 PM
Brilliant work at @cornelluniversity.bsky.social with @albertgong.bsky.social, Chao, @kamile.st, Raphael, Johann, JT, Carla Gomes, and @kilianqw.bsky.social!

Paper on arxiv: 📄 arxiv.org/abs/2502.20377
March 5, 2025 at 8:05 PM
Everything is open-source 📖 and easy 🍰. Check it out today at github.com/kilian-group... or install it with "pip install phantom-wiki"

PhantomWiki is the first suite to **quantify** LLM reasoning and retrieval. It is _the_ durable evaluation benchmark we need for the next generation of LLMs!
March 5, 2025 at 8:05 PM
🚄 All at a click of a button. On any laptop. In seconds.

📈 PhantomWiki scales amazingly. In just 3 secs, we can generate 1K wiki pages, which goes beyond the 128K-token context limits of SOTA LLMs. And in hours, it reaches Wikipedia scale: 1 million pages!
March 5, 2025 at 8:05 PM
PhantomWiki generates datasets of wiki pages and reasoning questions about the universe of people, on the scale of Wikipedia 🌐

🚨 The universe of people and their relationships is generated randomly. So by construction, LLMs cannot memorize/cheat on PhantomWiki evaluation.
March 5, 2025 at 8:05 PM
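
To make the "generated randomly, so nothing to memorize" idea concrete, here is a minimal sketch of the general technique. It is not PhantomWiki's actual generator or API: the names generate_universe and grandparent_question, the name list, and the single "parent" relation are all hypothetical, chosen only for illustration. It builds a small random universe of people from a seed and derives a two-hop question whose answer comes from the generated graph rather than from any real-world fact.

```python
import random

# Hypothetical name pool, used only for this illustration.
FIRST_NAMES = ["Ada", "Boris", "Chen", "Dalia", "Emeka", "Freya", "Goran", "Hana"]


def generate_universe(num_people, seed):
    """Return (names, parent_of), where parent_of maps a child index to a parent index."""
    rng = random.Random(seed)  # seeded, so the same seed reproduces the same universe
    names = [f"{rng.choice(FIRST_NAMES)}-{i}" for i in range(num_people)]
    parent_of = {}
    for child in range(1, num_people):
        # Only earlier people can be parents, so the relationship graph has no cycles.
        parent_of[child] = rng.randrange(child)
    return names, parent_of


def grandparent_question(names, parent_of, person):
    """Build a two-hop question and derive its answer from the generated graph."""
    parent = parent_of.get(person)
    grandparent = parent_of.get(parent) if parent is not None else None
    question = f"Who is the grandparent of {names[person]}?"
    answer = names[grandparent] if grandparent is not None else None
    return question, answer


names, parent_of = generate_universe(num_people=6, seed=0)
question, answer = grandparent_question(names, parent_of, person=5)
print(question, "->", answer)
```

Changing the seed yields a different universe with different ground-truth answers, which is why an evaluation built this way cannot be answered from memorized training data.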