Kamilė Stankevičiūtė
@kamile.st
Machine Learning PhD student at Cambridge University, visiting Cornell. Previously at Oxford and Google. Principled ML & applications in medicine.
PhantomWiki has just been accepted at the #ICLR2025 DATA-FM workshop! 🎉
🎉 PhantomWiki is accepted to the @iclr-conf.bsky.social DATA-FM workshop! Come chat with us in Singapore 🦁

🧠 The reasoning + retrieval benchmark comes right on the heels of the new @realaaai.bsky.social presidential report: AI reasoning and agents research is front and center!
🚀 📢 Releasing PhantomWiki, a reasoning + retrieval benchmark for LLM agents!

If I asked you "Who is the friend of the father of the mother of Tom?", you'd simply look up Tom -> mother -> father -> friend and answer.

🤯 SOTA LLMs, even DeepSeek-R1, struggle with such simple reasoning!
March 6, 2025 at 6:19 PM
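The chained lookup described in the post above is just a short walk over a relation graph. Here is a minimal sketch of that Tom -> mother -> father -> friend traversal over a toy fact table; the names and the `resolve` helper are invented for illustration and are not PhantomWiki's actual interface.

```python
# Toy knowledge base for the example question; all facts are made up.
facts = {
    ("Tom", "mother"): "Alice",
    ("Alice", "father"): "Bob",
    ("Bob", "friend"): "Carol",
}

def resolve(entity, relations):
    """Follow a chain of relations starting from `entity`."""
    for relation in relations:
        entity = facts[(entity, relation)]
    return entity

# "Who is the friend of the father of the mother of Tom?"
# reads inside-out as Tom -> mother -> father -> friend.
print(resolve("Tom", ["mother", "father", "friend"]))  # Carol
```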
Excited to announce our new work on reasoning and retrieval evaluation in LLMs! 😊

Check it out!
🚀 📢 Releasing PhantomWiki, a reasoning + retrieval benchmark for LLM agents!

If I asked you "Who is the friend of the father of the mother of Tom?", you'd simply look up Tom -> mother -> father -> friend and answer.

🤯 SOTA LLMs, even DeepSeek-R1, struggle with such simple reasoning!
March 6, 2025 at 3:05 AM
Reposted by Kamilė Stankevičiūtė
New preprint!

This is a hardcore technical paper on Thompson sampling as a strategy for the so-called online learning game.

In the long term, I think it's one of the most important things I have ever worked on, because of what it makes possible.

That needs explaining: thread below!

arxiv.org/abs/2502.14790
An Adversarial Analysis of Thompson Sampling for Full-information Online Learning: from Finite to Infinite Action Spaces
We develop an analysis of Thompson sampling for online learning under full feedback - also known as prediction with expert advice - where the learner's prior is defined over the space of an adversary'...
arxiv.org
February 21, 2025 at 8:57 PM
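For context on the setting: in prediction with expert advice, the learner commits to one of K experts each round and then observes every expert's loss (full information). A Thompson-sampling-style learner keeps a posterior belief about each expert and follows a posterior sample. The sketch below uses Beta-Bernoulli posteriors against a toy stochastic environment; it is only an illustration of the general idea, not the prior construction or adversarial analysis in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
K, T = 5, 2000                # number of experts, number of rounds

# Beta(1, 1) prior on each expert's mean loss; losses are assumed to be 0/1.
alpha = np.ones(K)
beta = np.ones(K)

# Toy environment (stand-in for the adversary): expert k has Bernoulli(mu[k]) losses.
mu = rng.uniform(0.3, 0.8, size=K)
mu[0] = 0.1                   # one clearly good expert

learner_loss = 0.0
cumulative = np.zeros(K)      # realized loss of every expert
for _ in range(T):
    sampled = rng.beta(alpha, beta)          # one posterior sample per expert
    choice = int(np.argmin(sampled))         # follow the expert that looks best
    losses = (rng.random(K) < mu).astype(float)
    learner_loss += losses[choice]
    cumulative += losses
    alpha += losses                          # full information: update every expert,
    beta += 1.0 - losses                     # not only the one we followed

print(f"learner loss: {learner_loss:.0f}, best expert: {cumulative.min():.0f}, "
      f"regret: {learner_loss - cumulative.min():.0f}")
```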