The release includes DeepSeek-R1-Zero, DeepSeek-R1, and multiple dense models, all trained to improve reasoning in LLMs through reinforcement learning (RL).
DeepSeek-R1-Zero is trained purely via RL, with no supervised fine-tuning (SFT) as a preliminary step, an approach reminiscent of AlphaZero.
github.com/deepseek-ai/…