We're looking for the best community-created benchmarks that propose new games or dynamic tests for LLMs to feature for this week’s #TaskTuesday!
We watched AI models navigate strategic reasoning, social deduction, and risk management through three different games.
🏆 The Champions:
🃏 Poker: GPT 5.2
🐺 Werewolf: Gemini 3 Pro Preview
♟️ Chess: Gemini 3 Pro Preview
A huge thank you to everyone who tuned in and to our amazing partners @gmhikaru.bsky.social, Nick Schulman, Liv Boeree, and @dougpolkvids for the fantastic commentary and analysis across all three games: Poker, Chess and Werewolf.
Join @gmhikaru.bsky.social and Nick Schulman now! 👇
www.youtube.com/watch?v=vzMj...
Who takes the title in Poker, Chess and Werewolf? Grab your seat for the ultimate AI showdown: 👇
www.youtube.com/watch?v=vzMj...
What happens when a chess Grandmaster and a Poker legend analyze AI? ♟️🃏
Join @gmhikaru.bsky.social and Nick Schulman as they break down every bluff, blunder and brilliant move from our final four models. You don’t want to miss this co-hosted deep dive.
www.youtube.com/watch?v=4TJw...
Congratulations to our AI poker showdown semi-finalists o3, Gemini 3 Flash, GPT 5.2, and Opus 4.5!
Top AI models compete in Poker, Werewolf, and Chess, testing reasoning, social strategy, and risk management.
🎙️ Co-hosted by GM Hikaru & Poker Hall-of-Famer Nick Schulman: www.youtube.com/GMHikaru
🗓️ Feb 2–4 | 9:30–11:30 AM PT
More info 👇
We are releasing two new games, Poker and Werewolf, along with an updated Chess leaderboard next Monday, February 2, running daily from 9:30 AM PT to 11:30 AM PT through February 4.
We just launched Community Benchmarks! Build, run and share AI benchmarks on top models - fully transparent and reproducible.
Learn more 👇
blog.google/innovation-a...
www.kaggle.com/discussions/...
As AI evolves at an unprecedented pace, measuring intelligence requires more than a few AI research labs alone – it requires the imagination and collective expertise of the global community. That’s why we’re launching Community Benchmarks.
🎯 Build human-centered AI applications using MedGemma and other open models
💰 $100,000 Prize Pool
⏰ Final Submission: Feb 24, 2026
www.kaggle.com/competitions...
🎯 Predict the 3D structure of RNA molecules from their sequences
💰 $75,000 Prize Pool
⏰ Entry Deadline: March 18, 2026
kaggle.com/competitions/stanford-rna-3d-folding-2
We're excited to announce the top 12 teams who showcased exceptional creativity & technical skill using AI agents! Check out their innovative projects & learn more about their submissions here:
www.kaggle.com/competitions...
With Gemini 3 Flash ⚡️, we are seeing reasoning capabilities previously reserved for our largest models. This opens up entirely new categories of near real-time applications that require complex thought.
More in thread ⬇️
🎯 Build an AI model that translates 4,000-year-old Old Assyrian business records into English
💰 $50,000 Prize Pool
⏰ Entry Deadline: March 23, 2026
www.kaggle.com/competitions...
This benchmark focuses on complex web research tasks and tests agent comprehensiveness.
Check the leaderboard: www.kaggle.com/benchmarks/g...
Developed by Google DeepMind and Google Research, this suite measures LLM factuality across four dimensions: Parametric knowledge, Search, Multimodal understanding & Grounding.
Explore the leaderboard: www.kaggle.com/benchmarks/g...
Developed by Google DeepMind, this benchmark spans 29 Indic languages, including first-ever evaluation data for 18 Indic languages. It supports language tasks like summarization, translation and question answering.
🎯 Build real-world AI apps using Gemini 3 Pro in Google AI Studio
💰 Prize Pool: $500,000 in Credits
⏰ Hackathon Timeline: Dec 5 - 12, 2025 (now extended!)
www.kaggle.com/competitions...