Oren Neumann
orenneumann.bsky.social
Doing RL on autonomous driving, supply chains and board games. Physics PhD from Goethe Uni Frankfurt.
There is: in those games, larger models improve overall accuracy by focusing on late-game positions, forgetting what they had learned about openings. This directly harms playing strength, since mastering openings is crucial, while wrapping up an end-game can be done with blind MCTS.
December 19, 2024 at 2:17 PM
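The mechanism in the post above can be sketched as a toy model (my own construction, not the paper's experiment; all names and numbers are illustrative). End-game states get an inflated frequency, a learner fits states in frequency order, and we compare overall frequency-weighted accuracy with accuracy on openings alone. It is a simplification: here openings are simply never reached until capacity covers everything, whereas the paper reports larger models actively forgetting them.

```python
# Toy sketch of inverse scaling from a bent frequency curve.
# Hypothetical setup: 50 opening states with modest frequency,
# 200 end-game states with inflated frequency.
opening = [(f"open{i}", 50 - i) for i in range(50)]   # rarer overall
endgame = [(f"end{i}", 500) for i in range(200)]      # bent curve: huge mass
states = sorted(opening + endgame, key=lambda s: -s[1])

total = sum(f for _, f in states)
open_total = sum(f for _, f in opening)

def accuracies(k):
    """A capacity-k learner fits the k most frequent states."""
    learned = states[:k]
    overall = sum(f for _, f in learned) / total
    openings = sum(f for name, f in learned if name.startswith("open")) / open_total
    return overall, openings

for k in (50, 150, 250):
    print(k, accuracies(k))
```

Overall accuracy climbs with capacity while opening accuracy stays at zero until the very end: capacity spent on high-frequency end-game states buys overall accuracy but none of what matters most.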
AlphaZero doesn't always scale nicely. In some games, Elo goes up, then sharply degrades with model size. We noticed this happens in games whose rules bend the Zipf curve, giving end-game board positions an unusually high frequency. Is there a connection?
December 19, 2024 at 2:17 PM
In line with the quantization model, we see that AlphaZero agents fit board states in decreasing order of frequency. This is very surprising: high-frequency opening positions are exponentially harder to model, since their evaluation depends on downstream positions.
December 19, 2024 at 2:17 PM
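A toy version of the quantization-model argument referenced above (my own sketch, with an assumed Zipf exponent, not the paper's numbers): suppose a model of capacity k perfectly fits the k most frequent states and misses the rest. With Zipf frequencies f_r ~ r**-alpha, the leftover frequency mass, read as the loss, itself follows a power law in k, which is one way to get an LLM-style scaling law.

```python
# Quantization-model toy: loss = frequency mass of unlearned states.
alpha = 1.5                      # Zipf exponent (assumed for illustration)
N = 1_000_000                    # total number of distinct states
freqs = [r ** -alpha for r in range(1, N + 1)]
total = sum(freqs)

def loss(k):
    """Mass of states a capacity-k model fails to fit."""
    return sum(freqs[k:]) / total

for k in (100, 1000, 10000):
    print(k, loss(k))

# The tail of a Zipf curve predicts loss ~ k**(1 - alpha) = k**-0.5:
ratio = loss(1000) / loss(100)
print(f"loss(1000)/loss(100) = {ratio:.3f}  (k**-0.5 predicts {10**-0.5:.3f})")
```

Increasing capacity tenfold cuts the loss by roughly 10**-0.5, a clean power law, exactly because the frequency curve is a power law.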
There is! Chess/Go tournament games famously follow Zipf's law: the frequency of each board position scales as a power of its rank.
We find that Zipf's law also emerges in RL self-play games. It's a direct result of rules common to all board games.
December 19, 2024 at 2:17 PM
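The branching-tree intuition behind that claim can be checked with a small simulation (a sketch of my own, not the paper's code): in a game tree with roughly constant branching, a depth-d position is reached with probability ~b**-d while there are ~b**d positions at that depth, so rank-frequency comes out as a power law with exponent near -1.

```python
# Hypothetical sketch: Zipf's law from random play on a game tree.
import math
import random
from collections import Counter

random.seed(0)

def random_game(length=40, branching=3):
    """Random move sequence; a 'state' is the move history so far."""
    state, visited = (), []
    for _ in range(length):
        state = state + (random.randrange(branching),)
        visited.append(state)
    return visited

counts = Counter()
for _ in range(5000):
    counts.update(random_game())

# Rank-frequency curve: sort state counts in descending order.
freqs = sorted(counts.values(), reverse=True)

# Log-log least-squares fit of frequency vs. rank over the top states.
n = 1000
xs = [math.log(r + 1) for r in range(n)]
ys = [math.log(freqs[r]) for r in range(n)]
xbar, ybar = sum(xs) / n, sum(ys) / n
slope = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / \
        sum((x - xbar) ** 2 for x in xs)
print(f"fitted Zipf exponent: {slope:.2f}")  # negative: freq ~ rank**slope
```

No game knowledge goes in, only a branching move tree, yet the rank-frequency curve comes out close to freq ~ 1/rank.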
🚨Do RL scaling laws share the same origin as LLM scaling laws?
We show that AlphaZero scaling might be the result of Zipf's law, and that inverse scaling can result from unusual frequency curves.
arxiv.org/abs/2412.11979
A 🧵 on scaling laws and board games! ♟️🎲
December 19, 2024 at 2:17 PM