Seth Karten
@sethkarten.ai
Autonomous Agents | PhD @ Princeton | Prev: CMU, Waymo | NSF GRFP Fellow
Pinned
🚀 New preprint!
🤔 Can one agent “nudge” a synthetic civilization of Census‑grounded agents toward higher social welfare—all by optimizing utilities in‑context? Meet the LLM Economist ↓
Every LLM eval uses Bradley-Terry Elo rankings. Almost none report uncertainty. Should we trust them? Maybe there is something better... 👇

(1/5)
October 20, 2025 at 3:50 AM
Pokémon is truly the Pareto frontier of agent research
- The RPG requires an autonomous, embodied agent with perception, planning, memory, and control
- VGC and Gen 9 OU penalize erroneous actions with fast-paced opponent-modeling in short games
(1/3)
October 15, 2025 at 5:50 PM
Trying to get a post ready but Bluesky won’t let me post on desktop!!! If you want users here you need a user experience!!!
October 15, 2025 at 5:34 PM
You probably aren’t reading enough papers.
You probably didn’t cite the 10 closest papers to your work.
Thus, LLMs probably have a better understanding of where your paper sits in the literature ¯\_(ツ)_/¯
October 8, 2025 at 6:21 PM
The most interesting papers aren’t being published at the “prestigious” venues anymore. Where are you publishing and what do you work on?
September 24, 2025 at 12:56 PM
🚨 Hackathon Weekend! 🚨

Jumpstart your PokéAgent Challenge submission ahead of NeurIPS!

📅 Sept 13–14
✅ Leaderboards reset Sat 10AM EDT
🎙️ Lightning talks in LLMs, RL, and Pokemon
💬 Live Office hours
🏆 $2k in prizes
September 2, 2025 at 1:44 PM
Reposted by Seth Karten
The NeurIPS 2025 PokéAgent Challenge is offering compute credits, courtesy of our sponsor Google DeepMind, to help you train bigger models & run more experiments.

📌 To apply:
1️⃣ Make a submission to Track 1 or 2 at pokeagent.github.io
2️⃣ Fill out the compute credit form on the site
PokéAgent Challenge - NeurIPS 2025
pokeagent.github.io
August 15, 2025 at 12:07 AM
Mad about data centers? Call your reps to build more nuclear
August 18, 2025 at 4:14 AM
Hey #academics

Why are NeurIPS workshop deadlines a month before main-track acceptances? Seems counterintuitive to have the two tracks compete with each other.

#machinelearning
August 15, 2025 at 4:03 AM
If your final product doesn’t reason in-context, how is it supposed to meta-learn and address distribution shifts and environment changes?
August 12, 2025 at 7:14 PM
Papers are dead. Maybe it is time to start the YouTube channel…
August 12, 2025 at 6:43 AM
Viral paper out today about predicting brain stimulus from video inputs. As always, don’t overfit on first-order responses. If you oversaturate the stimulus, people will stop using the product (e.g., people uninstalling IG because it is too addicting). The attention economy must be modeled as a multi-agent system.
August 12, 2025 at 5:49 AM
Reposted by Seth Karten
🚀 New preprint!
🤔 Can one agent “nudge” a synthetic civilization of Census‑grounded agents toward higher social welfare—all by optimizing utilities in‑context? Meet the LLM Economist ↓
July 23, 2025 at 5:30 PM
🚀 Launch day! The NeurIPS 2025 PokéAgent Challenge is live. @neuripsconf.bsky.social
Two tracks:
① Showdown Battling – imperfect-info, turn-based strategy
② Pokémon Emerald Speedrunning – long-horizon RPG planning
5M labeled replays • starter kit • baselines.
Bring your LLM, RL, or hybrid agent!
July 14, 2025 at 4:33 PM
🚀 5 days until my ICML spotlight poster!
Key insights we’ll unpack:
• Base LLM + test-time planning
• Game-theoretic scaffolding
• Context-engineered opponent prediction
• Comparative LLM-as-judge (relative > absolute)

Catch me Thu Jul 17, 4:30-7 PM PT👇
July 12, 2025 at 6:12 PM
Heading to #ICML2025 next week! If you’re into all things API (Artificial Pokémon Intelligence), from our PokéChamp spotlight to the upcoming NeurIPS PokéAgent Challenge, LLM-agent scaffolding & reasoning, or mechanism-design nudging, let’s connect. DMs open!
July 9, 2025 at 5:06 PM
Reposted by Seth Karten
Also the Pokemon Agent challenge by @sethkarten.ai @stephmilani.bsky.social and others!

pokeagent.github.io
June 28, 2025 at 7:31 AM
Social media takeoff is hard. Bluesky still lacks the capability to compete with Twitter.
June 4, 2025 at 5:43 PM
Excited to announce that I will be spending the summer at @Waymo on the simulation realism team! I’ll be working on learning to generate simulated worlds.
🚙🚙🚙
Send me a message if you’re in the Bay Area and want to chat!
May 30, 2025 at 4:42 PM
Excited to share that the PokéAgent Challenge was accepted as a NeurIPS competition!

This should serve as an excellent benchmark for competitive games AND ‘speedrunning’ the RPG. I hope to see both the RL and LLM agent communities working together here to eval agents in Pokémon.

More info soon👀
May 26, 2025 at 7:55 PM
Scaffolding is a really bad term. It is amazing how we have gone so end-to-end that we can’t imagine when the model was just one component in an architecture.
May 16, 2025 at 4:54 PM
Researchers need to stop working on low-hanging fruit. Leave that for the engineers. Your job is answering difficult questions that people will push back on.
May 6, 2025 at 7:54 PM
Wow, this is officially an ICML spotlight! See you in Vancouver :)
Can a Large Language Model (LLM) with zero Pokémon-specific training achieve expert-level performance in competitive Pokémon battles?
Introducing PokéChamp, our minimax LLM agent that reaches an Elo in the top 30%-10% of human players on Pokémon Showdown!
New paper on arXiv and code on GitHub!
May 1, 2025 at 4:48 PM