anatomic92.bsky.social
Reposted
Train your own reasoning LLM using DeepSeek's GRPO algorithm with our free notebook by @unsloth.bsky.social

You'll give Llama 3.1 (8B) chain-of-thought reasoning. Unsloth makes GRPO use 80% less VRAM.

Guide: docs.unsloth.ai/basics/reaso...
GitHub: github.com/unslothai/un...
Reasoning - GRPO | Unsloth Documentation
Train your own DeepSeek-R1 reasoning model with Unsloth using GRPO, a reinforcement learning (RL) fine-tuning method.
February 13, 2025 at 8:01 AM