🤔 Can one agent “nudge” a synthetic civilization of Census‑grounded agents toward higher social welfare—all by optimizing utilities in‑context? Meet the LLM Economist ↓
(1/5)
- The RPG requires an autonomous embodied agent with perception, planning, memory, and control
- VGC and Gen 9 OU penalize erroneous actions with fast-paced opponent-modeling in short games
(1/3)
You probably didn’t cite the 10 closest papers to your work
Thus, LLMs probably have a better understanding of where your paper sits in the literature ¯\_(ツ)_/¯
Jumpstart your PokéAgent Challenge submission ahead of NeurIPS!
📅 Sept 13–14
✅ Leaderboards reset Sat 10AM EDT
🎙️ Lightning talks on LLMs, RL, and Pokémon
💬 Live Office hours
🏆 $2k in prizes
📌 To apply:
1️⃣ Make a submission to Track 1 or 2 at pokeagent.github.io
2️⃣ Fill out the compute credit form on the site
Why are NeurIPS workshop deadlines a month before main track acceptances? Seems counterintuitive to have the two tracks compete with each other
#machinelearning
Two tracks:
① Showdown Battling – imperfect-info, turn-based strategy
② Pokémon Emerald Speedrunning – long-horizon RPG planning
5 M labeled replays • starter kit • baselines.
Bring your LLM, RL, or hybrid agent!
Key insights we’ll unpack:
• Base LLM + test-time planning
• Game-theoretic scaffolding
• Context-engineered opponent prediction
• Comparative LLM-as-judge (relative > absolute)
Catch me Thu Jul 17, 4:30-7 PM PT👇
pokeagent.github.io
🚙🚙🚙
Send me a message if you’re in the Bay Area and want to chat!
This should serve as an excellent benchmark for competitive games AND ‘speedrunning’ the RPG. I hope to see both the RL and LLM agent communities working together here to eval agents in Pokémon
More info soon👀
Introducing PokéChamp, our minimax LLM agent that reaches an Elo in the top 30%-10% of human players on Pokémon Showdown!
New paper on arXiv and code on GitHub!