TRL maintainer
TRL 0.14 brings *GRPO*, the RL algorithm behind 🐳 DeepSeek-R1.
⚡ Blazing fast generation with vLLM integration.
📉 Optimized training with DeepSpeed ZeRO 1/2/3.
It has seen impressive growth this year: lots of new features and an improved codebase, which have translated into increased usage. You can count on us to do even more in 2025.
Featuring a Process-supervised Reward Model (PRM) Trainer 🏋️
PRMs empower LLMs to "think before answering"—a key feature behind OpenAI's o1 launch just two weeks ago. 🚀
How about doing the same next year?
🧑‍💻 Full remote
🤯 Exciting subjects
🌍 Anywhere in the world
🤸🏻 Flexible working hours
Link to apply in comment 👇