Find out! Keynotes of the RL Conference are online:
www.youtube.com/playlist?lis...
Wanting vs liking, Agent factories, Theoretical limit of LLMs, Pluralist value, RL teachers, Knowledge flywheels
(guess who talked about which!)
Reach out if you’d like to chat more!
assayer: A simple Python-RQ based tool to automatically monitor and evaluate ML model checkpoints offline during training.
job-boards.greenhouse.io/deepmind/job...
Please spread the word!
See you in Vancouver!
The method we introduce in this paper is efficient because examples are chosen for their complementarity, leading to much steeper inference-time scaling! 🧪
arxiv.org/abs/2502.18487
🧵
Thoughts about this and more here:
arxiv.org/abs/2411.16905