ML PhD at @cornellbowers.bsky.social: LLM reasoning, agents, and AI for Science. Can cycle, run, juggle. Currently trying combinations.
If I asked you "Who is the friend of father of mother of Tom?", you'd simply look up Tom -> mother -> father -> friend and answer.
🤯 SOTA LLMs, even DeepSeek-R1, struggle with such simple reasoning!
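The lookup in that question is a multi-hop traversal. The relations are read right to left, starting from Tom, and each hop feeds the next. Below is a minimal Python sketch of that procedure. It assumes a toy dictionary of facts, and the people (Alice, Bob, Carol) and the helpers `parse_question` and `resolve` are hypothetical illustrations, not the benchmark's actual data format or code.

```python
# Hypothetical toy facts: (person, relation) -> person.
FACTS = {
    ("Tom", "mother"): "Alice",
    ("Alice", "father"): "Bob",
    ("Bob", "friend"): "Carol",
}


def parse_question(q):
    """Turn 'Who is the friend of father of mother of Tom?' into ('Tom', [hops])."""
    # Requires Python 3.9+ for removeprefix/removesuffix.
    body = q.removeprefix("Who is the ").removesuffix("?")
    *rels, start = body.split(" of ")  # ['friend', 'father', 'mother'], 'Tom'
    # The outermost relation is applied last, so reverse the chain.
    return start, list(reversed(rels))


def resolve(start, relations, facts=FACTS):
    """Follow the relations one hop at a time; return None if any hop is missing."""
    entity = start
    for rel in relations:
        entity = facts.get((entity, rel))
        if entity is None:
            return None
    return entity


start, hops = parse_question("Who is the friend of father of mother of Tom?")
print(start, hops)           # Tom ['mother', 'father', 'friend']
print(resolve(start, hops))  # Carol
```

Each hop is a single lookup, so the whole answer takes as many lookups as there are relations in the question.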
@icmlconf.bsky.social
OG: bsky.app/profile/anmo...
🧑‍💻 We designed it as a future-proof LLM reasoning benchmark. And it shows: the new Qwen3-32B model with auto-thinking mode struggles with higher-difficulty questions, just like DeepSeek-R1 from January.
🧠 The reasoning + retrieval benchmark comes right on the heels of the new @realaaai.bsky.social presidential report, which puts AI reasoning and agents research front and center!