
Anmol Kabra

@anmolkabra.com

anmolkabra.com

ML PhD at @cornellbowers.bsky.social: LLM reasoning, agents, and AI for Science. Can cycle, run, juggle. Currently trying combinations.


Anmol Kabra

@anmolkabra.com

PhantomEval, the evaluator code for PhantomWiki, is some of the most stable code I wrote this year with @albertgong.bsky.social, Chao, and @kamile.st. It supports LLMs through all major providers (openai, anthropic, gemini, llama, together, vllm), so we can evaluate new LLM releases within days! 🚀

June 10, 2025 at 7:47 PM
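PhantomEval's actual code isn't shown in this thread. As a rough sketch of how one evaluator can cover many providers, here is a hypothetical Python registry: each provider registers a client factory, and the scoring loop only ever calls a common answer method. PROVIDERS, register, EchoModel, and evaluate are illustrative names, not PhantomEval's API. The echo client is an offline stand-in for real provider SDKs so the example runs without network access.

```python
from __future__ import annotations

from typing import Callable, Protocol


class ChatModel(Protocol):
    """Anything that maps a question string to an answer string."""

    def answer(self, question: str) -> str: ...


# Hypothetical registry: provider name -> factory that builds a client for a model id.
PROVIDERS: dict[str, Callable[[str], ChatModel]] = {}


def register(provider: str) -> Callable:
    """Class decorator that adds a client factory to PROVIDERS under `provider`."""

    def decorator(factory: Callable[[str], ChatModel]) -> Callable[[str], ChatModel]:
        PROVIDERS[provider] = factory
        return factory

    return decorator


@register("echo")
class EchoModel:
    """Offline stand-in for a real provider client; it just repeats the question."""

    def __init__(self, model_id: str) -> None:
        self.model_id = model_id

    def answer(self, question: str) -> str:
        return question


def evaluate(provider: str, model_id: str, dataset: list[tuple[str, str]]) -> float:
    """Exact-match accuracy of one provider/model pair on (question, answer) pairs."""
    model = PROVIDERS[provider](model_id)
    correct = sum(model.answer(q).strip() == a for q, a in dataset)
    return correct / len(dataset)


toy = [("Ada", "Ada"), ("Who is Bo's parent?", "Cy")]
print(evaluate("echo", "echo-1", toy))  # 0.5
```

Under this design, evaluating a newly released model means passing a new model id (or registering one more client class), while the scoring loop stays unchanged.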

Anmol Kabra

@anmolkabra.com

The PhantomWiki v1 release on GitHub generates on-demand datasets for evaluating LLM reasoning and retrieval: github.com/kilian-group...

GitHub - kilian-group/phantom-wiki: Python package for generating datasets to evaluate reasoning and retrieval of large language models


June 10, 2025 at 7:47 PM

Anmol Kabra

@anmolkabra.com

Brilliant work at @cornelluniversity.bsky.social with @albertgong.bsky.social, Chao, @kamile.st, Raphael, Johann, JT, Carla Gomes, and @kilianqw.bsky.social!

Paper on arxiv: 📄 arxiv.org/abs/2502.20377

March 5, 2025 at 8:05 PM

Anmol Kabra

@anmolkabra.com

Everything is open-source 📖 and easy 🍰. Check it out today at github.com/kilian-group... or install it with "pip install phantom-wiki".

PhantomWiki is the first suite to **quantify** LLM reasoning and retrieval. It is _the_ durable evaluation benchmark we need for the next generation of LLMs!

GitHub - kilian-group/phantom-wiki: Python package for generating datasets to evaluate reasoning and retrieval


March 5, 2025 at 8:05 PM

Anmol Kabra

@anmolkabra.com

🚄 All at the click of a button. On any laptop. In seconds.

📈 PhantomWiki scales amazingly. In just 3 seconds, we can generate 1K wiki pages, enough to exceed the 128K-token context limits of SOTA LLMs. And in hours, a Wikipedia-scale corpus of 1 million pages!

March 5, 2025 at 8:05 PM

Anmol Kabra

@anmolkabra.com

PhantomWiki generates datasets of wiki pages and reasoning questions about a universe of people, on the scale of Wikipedia 🌐

🚨 The universe of people and their relationships is generated randomly, so by construction, LLMs cannot memorize answers or otherwise cheat on PhantomWiki evaluations.

March 5, 2025 at 8:05 PM
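The thread doesn't show PhantomWiki's generator, but the idea behind the "cannot memorize" claim can be sketched in a few lines of Python, assuming a toy universe with a single parent relation: people and family links are drawn from a seeded random generator, wiki pages are rendered from those facts, and each question's answer is computed from the same facts, so it cannot have appeared in any pretraining corpus. generate_universe, wiki_page, grandparent_questions, and the name pool are hypothetical, not the phantom-wiki package's API, and this toy covers only one relationship type.

```python
from __future__ import annotations

import random

# Hypothetical pool of made-up first names; a numeric suffix keeps names unique.
FIRST_NAMES = ["Ada", "Bo", "Cy", "Dee", "Eli", "Fay", "Gus", "Hal", "Ivy", "Jo"]


def generate_universe(n: int, seed: int) -> dict[str, str | None]:
    """Map each of n random people to a randomly chosen parent (None for roots)."""
    rng = random.Random(seed)
    people = [f"{rng.choice(FIRST_NAMES)}{i}" for i in range(n)]
    parent_of: dict[str, str | None] = {}
    for i, person in enumerate(people):
        # Parents come only from earlier people, so the family graph has no cycles.
        has_parent = i > 0 and rng.random() < 0.8
        parent_of[person] = people[rng.randrange(i)] if has_parent else None
    return parent_of


def wiki_page(person: str, parent_of: dict[str, str | None]) -> str:
    """Render one person's facts as a short wiki-style page."""
    parent = parent_of[person]
    children = [c for c, p in parent_of.items() if p == person]
    lines = [f"{person} is a person in this universe."]
    if parent is not None:
        lines.append(f"The parent of {person} is {parent}.")
    if children:
        lines.append(f"The children of {person} are {', '.join(children)}.")
    return "\n".join(lines)


def grandparent_questions(parent_of: dict[str, str | None]) -> list[tuple[str, str]]:
    """Two-hop questions: answering requires chaining facts from two pages."""
    pairs = []
    for person, parent in parent_of.items():
        if parent is not None and parent_of[parent] is not None:
            pairs.append((f"Who is the grandparent of {person}?", parent_of[parent]))
    return pairs


universe = generate_universe(n=20, seed=0)
print(wiki_page(next(iter(universe)), universe))
for question, answer in grandparent_questions(universe)[:3]:
    print(question, "->", answer)
```

Changing the seed yields a fresh universe with fresh answers, which is the property the post relies on: there is nothing to memorize ahead of time.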
