Abstract representations + reinforcement learning.
The 200-ELO-point gap between recent models and a two-year-old model means a human rater has a ~75% chance of preferring the recent model's answer.
Based on available data, all indicators about the progress of AI (in particular LLMs) remain strong.
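The ~75% figure follows directly from the standard logistic ELO model, which maps a rating difference to an expected preference probability. A minimal sketch (the function name is illustrative):

```python
def elo_win_probability(elo_diff):
    """Probability that the higher-rated side is preferred,
    given its ELO advantage (standard logistic ELO model)."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

# A 200-point advantage gives roughly a 76% preference rate,
# i.e. the "~75%" quoted above.
print(round(elo_win_probability(200), 3))
```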
Paper: arxiv.org/abs/2505.15345
Codebase: github.com/Jacobkooi/Ha...
1. Convolutional Hadamard Representations.
2. Max-pooling instead of convolutional down-sampling.
3. Gaussian Error Linear Unit activations.
Our Hadamax (Hadamard max-pooling) encoder architecture improves the recent PQN algorithm’s Atari performance by 80%, allowing it to significantly surpass Rainbow-DQN!
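The three architectural changes listed above can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's exact encoder: the two-branch structure and function names are assumptions; see the linked codebase for the real implementation.

```python
import numpy as np
from math import erf

def gelu(x):
    # Gaussian Error Linear Unit (exact form), component 3 above
    erf_v = np.vectorize(erf)
    return 0.5 * x * (1.0 + erf_v(x / np.sqrt(2.0)))

def max_pool_2x2(x):
    # Non-overlapping 2x2 max-pooling over an (H, W) feature map,
    # replacing strided/convolutional down-sampling (component 2)
    h, w = x.shape
    return x[: h // 2 * 2, : w // 2 * 2].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def hadamax_block(feat_a, feat_b):
    # Hadamard (elementwise) product of two activated feature branches
    # (component 1), followed by max-pooling for down-sampling.
    # Assumption: feat_a and feat_b are outputs of two parallel conv layers.
    return max_pool_2x2(gelu(feat_a) * gelu(feat_b))

# Example: two 8x8 feature maps are fused and pooled down to 4x4.
out = hadamax_block(np.random.randn(8, 8), np.random.randn(8, 8))
print(out.shape)
```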