Lightnews — Scholar-powered news

Seth Karten

@sethkarten.ai

Yes, please bring on the supply
We need:
- cheap energy
- cheap housing
- cheap food

Only possible by increasing supply

November 11, 2025 at 8:45 PM

Seth Karten

@sethkarten.ai

Gen 1 OU Pokemon Qualifiers end tonight and I'm not even competing, yet I'm nervously watching error bars converge.

(5/5)

October 20, 2025 at 3:50 AM

Seth Karten

@sethkarten.ai

Most LLM arenas use Bradley-Terry (batch MLE), which is accurate but requires full recomputation. Glicko-1 offers the best of both worlds: online updates and convergence to the batch optimum, with uncertainty estimates included.

(4/5)
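
To make the online-update idea concrete, here is a minimal sketch of a Glicko-1 rating-period update following Glickman's published formulas, plus a GXE-style win percentage. The function names are illustrative. The GXE reference opponent (rating 1500, deviation 130) is an assumption based on how the statistic is commonly described for Pokemon Showdown, and the between-period growth of the deviation is left out.

```python
import math

Q = math.log(10) / 400.0  # Glicko-1 scale constant


def g(rd):
    """Attenuation factor: discounts results against uncertain opponents."""
    return 1.0 / math.sqrt(1.0 + 3.0 * Q * Q * rd * rd / math.pi ** 2)


def expected(r, r_opp, rd_opp):
    """Expected score of a player rated r against one opponent."""
    return 1.0 / (1.0 + 10 ** (-g(rd_opp) * (r - r_opp) / 400.0))


def glicko1_update(r, rd, results):
    """Update (r, rd) from a list of (opp_rating, opp_rd, score) tuples.

    score is 1 for a win, 0.5 for a tie, 0 for a loss. Passing a single
    tuple gives a per-game online update.
    """
    if not results:
        return r, rd
    inv_d2 = Q * Q * sum(
        g(o_rd) ** 2 * expected(r, o_r, o_rd) * (1.0 - expected(r, o_r, o_rd))
        for o_r, o_rd, _ in results
    )
    precision = 1.0 / (rd * rd) + inv_d2
    surprise = sum(
        g(o_rd) * (s - expected(r, o_r, o_rd)) for o_r, o_rd, s in results
    )
    return r + (Q / precision) * surprise, math.sqrt(1.0 / precision)


def gxe(r, rd, ref_rating=1500.0, ref_rd=130.0):
    """Win % against an assumed reference opponent (hypothetical defaults)."""
    combined_rd = math.sqrt(rd * rd + ref_rd * ref_rd)
    return 100.0 / (1.0 + 10 ** (-g(combined_rd) * (r - ref_rating) / 400.0))


# Worked example from Glickman's description of the system:
# a 1500 +/- 200 player beats 1400, then loses to 1550 and 1700.
new_r, new_rd = glicko1_update(
    1500.0, 200.0, [(1400.0, 30.0, 1.0), (1550.0, 100.0, 0.0), (1700.0, 300.0, 0.0)]
)
```

The deviation shrinks after every batch of results, which is where the error bars come from: a rough 95% interval is the rating plus or minus about two deviations.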

October 20, 2025 at 3:50 AM

Seth Karten

@sethkarten.ai

Top-3 agents converge across all methods (250+ games each). But ranks 4+ show systematic disagreement:
- Elo diverges from HR even when HR's error bars don't overlap
- Glicko-1 agrees with HR despite being online

(3/5)
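
One way to check this kind of disagreement yourself is to list the agent pairs that two rating systems order differently, then test whether the reference system's error bars for that pair overlap. The sketch below is an illustration only: the helper names and the toy ratings are made up and are not competition data.

```python
from itertools import combinations


def discordant_pairs(ratings_a, ratings_b):
    """Return agent pairs that the two rating dicts rank in opposite order."""
    pairs = []
    for x, y in combinations(sorted(ratings_a), 2):
        if (ratings_a[x] - ratings_a[y]) * (ratings_b[x] - ratings_b[y]) < 0:
            pairs.append((x, y))
    return pairs


def intervals_overlap(mean_x, half_x, mean_y, half_y):
    """True if [mean - half, mean + half] intervals of two agents overlap."""
    return abs(mean_x - mean_y) <= half_x + half_y


# Toy numbers, purely hypothetical.
batch = {"a": 1800, "b": 1700, "c": 1600, "d": 1500}
online = {"a": 1790, "b": 1690, "c": 1540, "d": 1560}
flipped = discordant_pairs(batch, online)
```

A flipped pair whose reference intervals do not overlap is the worrying case: the batch estimate is confident about the order and the online estimate still disagrees.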

October 20, 2025 at 3:50 AM

Seth Karten

@sethkarten.ai

In the NeurIPS PokeAgent Challenge, we stress-test 4 ranking systems across 100k+ agent matches:
- Bradley-Terry (batch MLE, our ground truth)
- Elo (online, chess-standard)
- Glicko-1 (online, uncertainty-aware)
- GXE (Glicko-derived win %)

(2/5)
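
For readers who want to see the batch versus online distinction in code, here is a minimal sketch: a single-game Elo update next to a Bradley-Terry fit over the full win matrix. The K-factor of 32, the iteration count, and the function names are assumptions for illustration, and the fit uses the standard minorization-maximization iteration rather than whatever solver the challenge organizers run.

```python
def elo_update(r_a, r_b, score_a, k=32.0):
    """One online Elo update; score_a is 1 (win), 0.5 (tie), or 0 (loss)."""
    expected_a = 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))
    delta = k * (score_a - expected_a)
    return r_a + delta, r_b - delta


def bradley_terry(wins, n_iter=200):
    """Batch MLE of Bradley-Terry strengths via MM iterations.

    wins[i][j] is how many times agent i beat agent j (diagonal is 0).
    Every new game means refitting on the whole matrix.
    """
    n = len(wins)
    p = [1.0] * n
    for _ in range(n_iter):
        new_p = []
        for i in range(n):
            denom = sum(
                (wins[i][j] + wins[j][i]) / (p[i] + p[j])
                for j in range(n)
                if j != i and wins[i][j] + wins[j][i] > 0
            )
            new_p.append(sum(wins[i]) / denom if denom > 0 else p[i])
        total = sum(new_p)
        p = [x * n / total for x in new_p]  # fix the arbitrary scale
    return p


# Toy win matrix for three agents (made-up counts).
strengths = bradley_terry([[0, 3, 4], [1, 0, 3], [0, 1, 0]])
after_a, after_b = elo_update(1500.0, 1500.0, 1.0)
```

Elo touches only the two agents in the latest game, while the Bradley-Terry fit revisits every pairing on each pass, which is the recomputation cost the thread refers to.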

Leaderboard of Pokemon Gen 1 OU Top 100 NeurIPS competition for the PokeAgent Challenge. The leaderboard shows username, Elo, Glicko-1, Glicko-1 deviation, wins, losses, and ties for the results of the head-to-head battles for each agent methodology. Highlighted are top user submissions. PAC-MM-* usernames are organizer-hosted baselines.

Leaderboard of Pokemon Gen 1 OU Top 100 NeurIPS competition for the PokeAgent Challenge on the pokeagent.github.io website. The leaderboard shows username, history rating, GXE, wins, losses for the results of the head to head battles for each agent methodology, including showing the currently qualifying methods.

October 20, 2025 at 3:50 AM

Seth Karten

@sethkarten.ai

A benchmark environment is nothing without data so you can pretrain before you RL.

Announcing our replay archive preview: We are releasing an additional 25k games to help you train a metagame exploiter (5 million more to be released after the qualifier)

replays.pokeagentshowdown.com:8443/
(3/3)

October 15, 2025 at 5:50 PM

Seth Karten

@sethkarten.ai

- Gen 1 OU battles require 100+ turns of long-context planning in partially observable, stochastic environments

Check out the PokeAgent Challenge Gen 1 OU Qualifier live this week👇
youtube.com/live/N6JmD5XKf4g
(2/3)

October 15, 2025 at 5:50 PM

Seth Karten

@sethkarten.ai

Apparently I need to fullscreen my browser for the new post button to show up

October 15, 2025 at 5:35 PM

Seth Karten

@sethkarten.ai

We should have the highest standards for the most influential research companies

September 27, 2025 at 4:07 AM

Seth Karten

@sethkarten.ai

In the future people will play games for the mind similar to going to the gym for the body

September 25, 2025 at 9:11 PM

Seth Karten

@sethkarten.ai

If you aren't paying attention, we are in a rapidly shifting period of ML paper culture. ICLR/ICML/NeurIPS are being treated as random, out-of-touch processes with more and more unnecessary work to submit.
Most people are saying TMLR is the only good alternative, but are skeptical

September 24, 2025 at 2:31 PM

Seth Karten

@sethkarten.ai

Join our Discord for more info: discord.gg/E2DuX5FWF7

Join the PokéAgent Challenge @ NeurIPS 2025 Discord Server!

https://pokeagent.github.io/ | 362 members

discord.gg

September 2, 2025 at 1:44 PM

Seth Karten

@sethkarten.ai

We need an AI sports league and US robotics olympics

September 1, 2025 at 10:33 PM

Seth Karten

@sethkarten.ai

Does this include survey papers?

August 28, 2025 at 4:08 PM

Seth Karten

@sethkarten.ai

Eventually you could train this, but currently context switching at this general level is far too difficult. And I think the modular agentic system approach makes the most sense, as it could scale to components running in parallel

August 20, 2025 at 5:52 PM

Seth Karten

@sethkarten.ai

A foundation agent could be equally modular, with a harness to facilitate context switching. You could even wrap the harness in an LLM harness for routing which module to use

August 20, 2025 at 5:52 PM

Seth Karten

@sethkarten.ai

If you were deploying a robot irl, you would have
- a perception harness for visual understanding
- a planning harness for long horizon reasoning
- a control harness to make sure the actions are executed properly
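
As a rough illustration of the three harnesses above plus a router that picks between them, here is a minimal sketch. Everything in it is hypothetical: the module functions are stubs, and a keyword rule stands in for the LLM router mentioned in the thread.

```python
from typing import Callable, Dict

Harness = Callable[[str], str]


def perception_harness(task: str) -> str:
    # Stub: a real module would run a vision model here.
    return f"[perception] described scene for: {task}"


def planning_harness(task: str) -> str:
    # Stub: a real module would do long-horizon reasoning here.
    return f"[planning] produced plan for: {task}"


def control_harness(task: str) -> str:
    # Stub: a real module would verify and execute low-level actions.
    return f"[control] executed action for: {task}"


def keyword_router(task: str) -> str:
    """Stand-in for an LLM router: choose a module name from the task text."""
    text = task.lower()
    if "see" in text or "look" in text:
        return "perception"
    if "plan" in text or "route" in text:
        return "planning"
    return "control"


class ModularAgent:
    """Dispatches each task to the module chosen by the router."""

    def __init__(self, modules: Dict[str, Harness], router: Callable[[str], str]):
        self.modules = modules
        self.router = router

    def step(self, task: str) -> str:
        return self.modules[self.router(task)](task)


agent = ModularAgent(
    {
        "perception": perception_harness,
        "planning": planning_harness,
        "control": control_harness,
    },
    keyword_router,
)
```

Because each module sits behind the same call signature, swapping the keyword rule for a model-based router or running modules in parallel would not change the rest of the agent.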

August 20, 2025 at 5:52 PM
