Unsloth AI
unsloth.ai
@unsloth.ai
Open source LLM fine-tuning! 🦥
Github: http://github.com/unslothai/unsloth Discord: https://discord.gg/unsloth
The 1.58-bit quant fits in 131GB of VRAM (2× H100s) for fast-throughput inference at ~140 tokens/s.

For best results, use the 2.51-bit Dynamic quant & at least 160GB of combined VRAM + RAM.

Basic 1-bit & 2-bit quantization causes the model to produce repetitive output and poor code. Our dynamic quants solve this.
March 25, 2025 at 11:54 PM
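As a sanity check, the 131GB figure lines up with back-of-the-envelope arithmetic. This is only a rough sketch: the ~671B parameter count is our assumption, and dynamic quants mix bit-widths per layer, so the true file size differs somewhat.

```python
# Rough size estimate for a 1.58-bit quant of a ~671B-parameter model.
# NOTE: 671B is an assumed parameter count; dynamic quants mix bit-widths
# per layer, so this is only approximate.
params = 671e9          # assumed total parameter count
bits_per_weight = 1.58  # average bits per weight in the 1.58-bit quant
size_gb = params * bits_per_weight / 8 / 1e9  # bits -> bytes -> decimal GB
print(f"{size_gb:.1f} GB")  # ~132.5 GB, close to the quoted 131GB
```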
For our benchmarks, a standard GRPO QLoRA setup (TRL + FA2) for Llama 3.1 (8B) at 20K context required 510.8GB of VRAM. Unsloth's GRPO algorithm reduces this to just 54.3GB.

The 5GB VRAM requirement for Qwen2.5 (1.5B) is down from 7GB in our previous GRPO release two weeks ago!
February 20, 2025 at 6:41 PM
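The VRAM claims above reduce to simple arithmetic. This is just a check of the quoted numbers, not a measurement:

```python
# VRAM reduction for GRPO QLoRA on Llama 3.1 (8B) at 20K context,
# using the figures quoted in the post above.
standard_gb = 510.8   # standard TRL + FA2 setup
unsloth_gb = 54.3     # Unsloth's GRPO algorithm
reduction = standard_gb / unsloth_gb
print(f"{reduction:.1f}x less VRAM")  # ~9.4x

# Qwen2.5 (1.5B): 7GB -> 5GB between the two GRPO releases.
savings_pct = (7 - 5) / 7 * 100
print(f"{savings_pct:.0f}% less VRAM")  # ~29%
```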