Lightnews — Scholar-powered news

Seth Karten

@sethkarten.ai

Every LLM eval uses Bradley-Terry Elo rankings. Almost none report uncertainty. Should we trust them? Maybe there is something better... 👇

(1/5)

October 20, 2025 at 3:50 AM

Seth Karten

@sethkarten.ai

Pokemon is truly the Pareto frontier of agent research
- The RPG requires an autonomous, embodied agent with perception, planning, memory, and control
- VGC and Gen 9 OU penalize erroneous actions with fast-paced opponent-modeling in short games
(1/3)

October 15, 2025 at 5:50 PM

Seth Karten

@sethkarten.ai

Trying to get a post ready but Bluesky won’t let me post on desktop!!! If you want users here you need a user experience!!!

October 15, 2025 at 5:34 PM

Seth Karten

@sethkarten.ai

You probably aren’t reading enough papers.
You probably didn’t cite the 10 closest papers to your work.
Thus, LLMs probably have a better understanding of where your paper sits in the literature ¯\_(ツ)_/¯

October 8, 2025 at 6:21 PM

Seth Karten

@sethkarten.ai

The most interesting papers aren’t being published at the “prestigious” venues anymore. Where are you publishing and what do you work on?

September 24, 2025 at 12:56 PM

Seth Karten

@sethkarten.ai

🚨 Hackathon Weekend! 🚨

Jumpstart your PokéAgent Challenge submission ahead of NeurIPS!

📅 Sept 13–14
✅ Leaderboards reset Sat 10AM EDT
🎙️ Lightning talks in LLMs, RL, and Pokemon
💬 Live Office hours
🏆 $2k in prizes

PokéAgent Challenge @ NeurIPS 2025 Hackathon Weekend Schedule. Saturday, Sept 13th: 10 AM leaderboards reset; 12–1:30 PM livestream talks (overview, Aaron Traylor on Pokémon as an AI Problem, Seth Karten on Pokéchamp, Jake Grigsby on Metamon, plus more). Sunday, Sept 14th: 1–3:30 PM organizer office hours; 11:59 PM top teams earn up to $2k in GCP credits. Sponsored by Google DeepMind and AIJ.

September 2, 2025 at 1:44 PM

Reposted by Seth Karten

Seth Karten

@sethkarten.ai

The NeurIPS 2025 PokéAgent Challenge is offering compute credits, courtesy of our sponsor Google DeepMind, to help you train bigger models & run more experiments.

📌 To apply:
1️⃣ Make a submission to Track 1 or 2 at pokeagent.github.io
2️⃣ Fill out the compute credit form on the site

PokéAgent Challenge - NeurIPS 2025

pokeagent.github.io

August 15, 2025 at 12:07 AM

Seth Karten

@sethkarten.ai

Mad about data centers? Call your reps to build more nuclear

August 18, 2025 at 4:14 AM

Seth Karten

@sethkarten.ai

Hey #academics

Why are NeurIPS workshop deadlines a month before main track acceptances? Seems counterintuitive to have the two tracks compete with each other

#machinelearning

August 15, 2025 at 4:03 AM

Seth Karten

@sethkarten.ai

If your final product doesn’t reason in-context, how is it supposed to meta-learn and address distribution shifts and environment changes?

August 12, 2025 at 7:14 PM

Seth Karten

@sethkarten.ai

Papers are dead. Maybe it is time to start the YouTube channel…

August 12, 2025 at 6:43 AM

Seth Karten

@sethkarten.ai

Viral paper out today about predicting brain stimulus from video inputs. As always, don’t overfit on first-order responses. If you oversaturate the stimulus, people will stop using the product (people are uninstalling IG because it is too addictive). The attention economy must be modeled as a multi-agent system.

August 12, 2025 at 5:49 AM

Reposted by Seth Karten

Seth Karten

@sethkarten.ai

🚀 New preprint!
🤔 Can one agent “nudge” a synthetic civilization of Census‑grounded agents toward higher social welfare—all by optimizing utilities in‑context? Meet the LLM Economist ↓

Diagram of LLM Economist: left—grid of persona‑conditioned worker agents; center—planner LLM sends tax schedule; right—social‑welfare ‘hill‑climb’.

July 23, 2025 at 5:30 PM

Seth Karten

@sethkarten.ai

🚀 Launch day! The NeurIPS 2025 PokéAgent Challenge is live. @neuripsconf.bsky.social
Two tracks:
① Showdown Battling – imperfect-info, turn-based strategy
② Pokemon Emerald Speedrunning – long horizon RPG planning
5 M labeled replays • starter kit • baselines.
Bring your LLM, RL, or hybrid agent!

Banner reading “PokéAgent Challenge @ NeurIPS 2025” with two panels: Track 1 – Competitive Pokémon Battle Bots, Track 2 – Long-Horizon RPG Gameplay. Call-to-action: “Create video-game AI! Win prizes! Live now at pokeagent.github.io.”

July 14, 2025 at 4:33 PM

Seth Karten

@sethkarten.ai

🚀 5 days until my ICML spotlight poster!
Key insights we’ll unpack:
• Base LLM + test-time planning
• Game-theoretic scaffolding
• Context-engineered opponent prediction
• Comparative LLM-as-judge (relative > absolute)

Catch me Thu Jul 17, 4:30-7 PM PT👇

July 12, 2025 at 6:12 PM

Seth Karten

@sethkarten.ai

Heading to #ICML2025 next week! If you’re into all things API (Artificial Pokémon Intelligence) from our PokéChamp spotlight to the upcoming NeurIPS PokeAgent Challenge, LLM-agent scaffolding & reasoning, or mechanism-design nudging, let’s connect. DMs open!

July 9, 2025 at 5:06 PM

Reposted by Seth Karten

Marc Lanctot

@sharky6000.bsky.social

Also the Pokemon Agent challenge by @sethkarten.ai @stephmilani.bsky.social and others!

pokeagent.github.io

June 28, 2025 at 7:31 AM

Seth Karten

@sethkarten.ai

Social media takeoff is hard. Bluesky still lacks the capability to compete with Twitter

June 4, 2025 at 5:43 PM

Seth Karten

@sethkarten.ai

Excited to announce that I will be spending the summer at @Waymo on the simulation realism team! I’ll be working on learning to generate simulated worlds.
🚙🚙🚙
Send me a message if you’re in the Bay Area and want to chat!

May 30, 2025 at 4:42 PM

Seth Karten

@sethkarten.ai

Excited to share that the PokeAgent challenge was accepted as a NeurIPS competition!

This should serve as an excellent benchmark for competitive games AND ‘speedrunning’ the RPG. I hope to see both the RL and LLM agent communities working together here to eval agents in Pokemon

More info soon👀

NeurIPS 2025 competition track submission summary, scores, and recommendation to accept

May 26, 2025 at 7:55 PM

Seth Karten

@sethkarten.ai

Scaffolding is a really bad term. It is amazing how we have gone so end-to-end that we can’t imagine when the model was just one component in an architecture

May 16, 2025 at 4:54 PM

Seth Karten

@sethkarten.ai

Researchers need to stop working on low-hanging fruit. Leave that for the engineers. Your job is answering difficult questions that people will push back on

May 6, 2025 at 7:54 PM

Seth Karten

@sethkarten.ai

Wow, this is officially an ICML spotlight! See you in Vancouver :)

Seth Karten @sethkarten.ai · Mar 7

Can a Large Language Model (LLM) with zero Pokémon-specific training achieve expert-level performance in competitive Pokémon battles?
Introducing PokéChamp, our minimax LLM agent that reaches top 30%-10% human-level Elo on Pokémon Showdown!
New paper on arXiv and code on github!

May 1, 2025 at 4:48 PM
