You'll transform Llama 3.1 (8B) into a reasoning model with chain-of-thought, trained via GRPO. Unsloth makes GRPO training use 80% less VRAM.
Guide: docs.unsloth.ai/basics/reaso...
GitHub: github.com/unslothai/un...
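
A minimal sketch of what the GRPO setup looks like with Unsloth + TRL, assuming a 4-bit LoRA load of Llama 3.1 8B; the reward function, prompts, model id, and hyperparameters below are placeholders, not the guide's exact recipe (see the linked docs for the full notebook):

# Sketch: GRPO fine-tuning with Unsloth + TRL (placeholder reward and data)
from unsloth import FastLanguageModel
from trl import GRPOConfig, GRPOTrainer
from datasets import Dataset

# Load Llama 3.1 8B in 4-bit via Unsloth (assumed model id)
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="meta-llama/Meta-Llama-3.1-8B-Instruct",
    max_seq_length=1024,
    load_in_4bit=True,
)

# Attach LoRA adapters so only a small fraction of weights is trained
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# Placeholder reward: favor completions that show a chain-of-thought marker.
# With a plain "prompt" column (non-conversational), completions arrive as strings.
def format_reward(completions, **kwargs):
    return [1.0 if "</think>" in c else 0.0 for c in completions]

# Toy prompt dataset just to make the sketch runnable
dataset = Dataset.from_dict({"prompt": [
    "Solve step by step: 12 * 7 = ?",
    "Solve step by step: what is 15% of 80?",
]})

trainer = GRPOTrainer(
    model=model,
    processing_class=tokenizer,
    reward_funcs=[format_reward],
    train_dataset=dataset,
    args=GRPOConfig(
        output_dir="llama31-grpo",
        per_device_train_batch_size=4,
        num_generations=4,        # completions sampled per prompt
        max_prompt_length=256,
        max_completion_length=512,
        max_steps=250,
    ),
)
trainer.train()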