🧠 The reasoning + retrieval benchmark comes right on the heels of the new @realaaai.bsky.social presidential report: AI Reasoning and Agents research front and center!
If I asked you "Who is the friend of father of mother of Tom?", you'd simply look up Tom -> mother -> father -> friend and answer.
🤯 SOTA LLMs, even DeepSeek-R1, struggle with such simple reasoning!
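To make the Tom -> mother -> father -> friend lookup concrete, here is a minimal sketch of that kind of multi-hop chain. The `facts` table, the names Alice, Bob, and Carol, and the `follow` helper are all invented for illustration; this is not the benchmark's actual data or format.

```python
# Hypothetical toy knowledge base: every name and relation here is made up
# for illustration and does not come from the benchmark.
facts = {
    ("Tom", "mother"): "Alice",
    ("Alice", "father"): "Bob",
    ("Bob", "friend"): "Carol",
}


def follow(start, relations):
    """Walk the chain one hop at a time, looking up each relation in turn."""
    entity = start
    for relation in relations:
        entity = facts[(entity, relation)]
    return entity


# "Who is the friend of father of mother of Tom?" is read right to left:
# Tom -> mother -> father -> friend.
answer = follow("Tom", ["mother", "father", "friend"])
print(answer)
```

Each hop is a single retrieval, so the whole question reduces to three lookups applied in the right order.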
Check it out!
This is a hardcore technical paper on Thompson sampling - as a strategy for the so-called online learning game.
I think it's one of the most important things I have ever worked on, long term, because of what it makes possible.
That needs explaining: thread below!
arxiv.org/abs/2502.14790