Kyle Corbitt
@corbtt.bsky.social
If you're fine-tuning LLMs, Gemma 3 is the new 👑 and it's not close. Gemma 3 trounces Qwen/Llama models at every size!
- Gemma 3 4B beats 7B/8B competition
- Gemma 3 27B matches 70B competition

Vision benchmarks soon!
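For anyone who wants to try this themselves, here's a minimal LoRA fine-tuning sketch using Hugging Face TRL + PEFT. This is a sketch under assumptions, not the setup behind these benchmarks: it uses the text-only Gemma 3 1B instruct checkpoint for compactness (swap in a larger Gemma 3 ID), a public demo dataset, and placeholder hyperparameters.

```python
# Minimal LoRA fine-tuning sketch for Gemma 3 with TRL + PEFT.
# Assumptions: model ID, dataset, and hyperparameters are illustrative placeholders.
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

# Any chat-format dataset works; this is a public demo dataset.
dataset = load_dataset("trl-lib/Capybara", split="train")

peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

trainer = SFTTrainer(
    model="google/gemma-3-1b-it",  # text-only variant; use a larger Gemma 3 ID as needed
    train_dataset=dataset,
    peft_config=peft_config,
    args=SFTConfig(
        output_dir="gemma3-lora",
        per_device_train_batch_size=2,
        gradient_accumulation_steps=8,
        learning_rate=2e-4,
        num_train_epochs=1,
        bf16=True,
    ),
)
trainer.train()
```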
March 21, 2025 at 4:27 PM
I hear cocaine is good but no way it can beat the rush I get from my RL-trained agent suddenly grokking a new skill.
March 18, 2025 at 1:02 AM
Training models with RL subjectively feels much more like gardening than engineering. You do your best to set the right conditions, provide the right inputs... and then wait and watch what grows. Very rewarding/magical feeling when it works!
March 17, 2025 at 5:15 PM
Big news: we've figured out how to train models 80-90% cheaper than before. Cheaper than renting your own GPUs. Cheaper than any other service. And 0 quality regression.

Super proud of the team on this one. New pricing is now live!
January 23, 2025 at 5:16 PM
This holiday season I am legitimately grateful that my kids are all <8 and not 16+. I have no idea what career prep advice I'd give someone in this moment. We're in for a ride.
December 20, 2024 at 10:46 PM
Helpful intuition that folks new to LLMs may not know: if you have a lot of data, small models are often just as good as much, much larger ones for tasks like classification and information extraction. Here I compare a 1B and an 8B model on a hard classification task, and I bet you can't tell which is which!
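As a rough illustration (not the actual benchmark code), here's a sketch of fine-tuning a ~1B model as a classifier with Hugging Face Transformers. The model ID, dataset, and hyperparameters are assumptions; any small base model and labeled classification dataset would do.

```python
# Sketch: fine-tune a ~1B model as a text classifier when you have plenty of labels.
# Assumptions: model ID (gated; any ~1B model works) and stand-in dataset.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

model_id = "meta-llama/Llama-3.2-1B"   # assumed small base model
dataset = load_dataset("ag_news")       # stand-in 4-class classification dataset

tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

dataset = dataset.map(tokenize, batched=True)

model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=4)
model.config.pad_token_id = tokenizer.pad_token_id

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="small-classifier",
        per_device_train_batch_size=8,
        learning_rate=1e-5,
        num_train_epochs=1,
        bf16=True,
    ),
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    tokenizer=tokenizer,  # enables dynamic padding via the default collator
)
trainer.train()
```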
December 13, 2024 at 10:30 PM
Btw you can view your training loss across open source models AND Gemini models on OpenPipe!
December 9, 2024 at 3:40 PM
OpenAI's Reinforcement Fine-Tuning (RFT) is far more data efficient than SFT: it can generalize from just 10-20 labeled examples.

Huge deal bc as compute costs drop to 0, the pain of gathering high-quality training data is the biggest barrier to deploying AI. RFT needs much less of it!
December 6, 2024 at 9:46 PM
Meta just released Llama 3.3 70B—they claim benchmarks similar to Llama 3 405B, but in a model 20% the size. It's already available as a base model on OpenPipe, and we'll release benchmarks as a fine-tuning base model soon.

huggingface.co/meta-llama/L...
December 6, 2024 at 7:02 PM
SUPER PUMPED to announce that Gemini fine-tuning is available to all OpenPipe users! Gemini Flash provides the lowest cost fine-tuning of any model in its quality class. Comparable to gpt-4o-mini, but 4x cheaper inference and FREE fine-tuning!
December 5, 2024 at 4:38 PM
SGLang is basically vLLM but better. I just tested v0.4 on a real-world task with a Llama 3.2 3B model. Reached a max throughput of 61K tokens per second—44% higher than our vLLM baseline!

lmsys.org/blog/2024-12...
SGLang v0.4: Zero-Overhead Batch Scheduler, Cache-Aware Load Balancer, Faster Structured Outputs | LMSYS Org
We're excited to release SGLang v0.4, featuring significant performance improvements and new features...
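If you want to sanity-check throughput yourself, here's a rough client-side measurement against an SGLang server (this is not the harness behind the 61K tok/s number above). It assumes the server is running locally with its OpenAI-compatible endpoint, e.g. `python -m sglang.launch_server --model-path meta-llama/Llama-3.2-3B-Instruct --port 30000`.

```python
# Rough client-side throughput check against a local SGLang server.
# Assumes the OpenAI-compatible endpoint is available at localhost:30000.
import time
from concurrent.futures import ThreadPoolExecutor

from openai import OpenAI

client = OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")

def one_request(prompt: str) -> int:
    """Send one chat completion and return the number of completion tokens."""
    resp = client.chat.completions.create(
        model="meta-llama/Llama-3.2-3B-Instruct",
        messages=[{"role": "user", "content": prompt}],
        max_tokens=256,
    )
    return resp.usage.completion_tokens

prompts = [f"Summarize the number {i} in one paragraph." for i in range(256)]

start = time.time()
with ThreadPoolExecutor(max_workers=64) as pool:
    completion_tokens = sum(pool.map(one_request, prompts))
elapsed = time.time() - start

print(f"{completion_tokens} completion tokens in {elapsed:.1f}s "
      f"-> {completion_tokens / elapsed:.0f} tok/s")
```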
December 5, 2024 at 2:35 AM
One of the new features I'm most excited about at OpenPipe is "criteria distillation". This allows you to distill an expensive LLM-as-judge criterion into a super fast, cheap, low-latency reward model that approximates the judge's outputs. DM for access!
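To sketch the idea (this is not OpenPipe's actual implementation): label responses with the expensive LLM judge, then train a small model to reproduce those labels and use it as the reward model at inference time. Model IDs and the judge prompt below are assumptions, and the training loop itself is omitted.

```python
# Conceptual sketch of criteria distillation.
# Step 1: label (prompt, response) pairs with an expensive LLM judge.
# Step 2: train a small reward model on those labels (training loop omitted).
# Assumptions: judge model, criterion text, and reward-model base are placeholders.
import torch
from openai import OpenAI
from transformers import AutoModelForSequenceClassification, AutoTokenizer

judge = OpenAI()
CRITERION = "The response answers the user's question accurately and concisely."

def judge_label(prompt: str, response: str) -> float:
    """Expensive LLM-as-judge: returns 1.0 if the response meets the criterion."""
    out = judge.chat.completions.create(
        model="gpt-4o",  # assumed judge model
        messages=[{"role": "user", "content":
            f"Criterion: {CRITERION}\n\nPrompt: {prompt}\n\nResponse: {response}\n\n"
            "Answer PASS or FAIL."}],
    )
    return 1.0 if "PASS" in out.choices[0].message.content.upper() else 0.0

# Small reward model: trained offline on judge_label outputs, then used in place
# of the judge for fast, cheap scoring.
rm_id = "Qwen/Qwen2.5-0.5B"  # assumed small base model
tokenizer = AutoTokenizer.from_pretrained(rm_id)
reward_model = AutoModelForSequenceClassification.from_pretrained(rm_id, num_labels=1)
reward_model.config.pad_token_id = tokenizer.pad_token_id

def reward_score(prompt: str, response: str) -> float:
    """Fast approximation of judge_label once the reward model is trained."""
    inputs = tokenizer(prompt + "\n\n" + response, return_tensors="pt", truncation=True)
    with torch.no_grad():
        return reward_model(**inputs).logits.item()
```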
December 4, 2024 at 6:43 PM
Amazon's Nova models have excellent price/perf ratio. We'd love to support them, but to deploy fine-tuned versions you need to purchase "provisioned throughput", which costs $100/hr/model. 😬 Putting out the bat signal—if you know someone at AWS Bedrock, pls put me in contact!
December 4, 2024 at 4:08 PM
Ok I am terrible at sharing product updates here, but we now support Llama 3.2 1B and 3B (the best small LLMs) as well as Qwen 2.5 72B and 32B Coder (the best open general and code-specific models) on OpenPipe!
December 4, 2024 at 12:35 AM
Kinda feels like the product engineer and IC PM roles are quickly converging. A really good AI-enabled SWE can produce the same output as a team of 5 SWEs + 1 PM could before.

Are AI-native companies still hiring IC PMs who don't code?
November 25, 2024 at 8:03 PM
What is the current SOTA on language autoencoders? Can you run lossy compression on a 20K-word Wikipedia article to give you an archive that's just a few KB in size, but decompresses into text semantically indistinguishable from the original?
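For illustration only, here's a crude baseline in that direction: "encode" by asking a model to write dense notes under a byte budget, "decode" by regenerating a full article from the notes, then check the compression ratio. The model name is an assumption, and this is nowhere near a real learned autoencoder.

```python
# Crude lossy "language autoencoder" baseline: compress to dense notes within a
# byte budget, then regenerate full-length text from the notes.
# Assumptions: model name and input file are placeholders.
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4o-mini"  # assumed; any strong chat model works

def compress(article: str, budget_bytes: int = 4096) -> str:
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content":
            f"Compress this article into at most {budget_bytes} bytes of dense notes, "
            f"preserving all key facts:\n\n{article}"}],
    )
    return resp.choices[0].message.content[:budget_bytes]

def decompress(notes: str) -> str:
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content":
            f"Expand these notes back into a full article:\n\n{notes}"}],
    )
    return resp.choices[0].message.content

article = open("wikipedia_article.txt").read()  # hypothetical ~20K-word input
notes = compress(article)
reconstruction = decompress(notes)

orig_bytes, archive_bytes = len(article.encode()), len(notes.encode())
print(f"original: {orig_bytes} bytes, archive: {archive_bytes} bytes, "
      f"ratio: {orig_bytes / archive_bytes:.1f}x")
```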
November 22, 2024 at 10:43 PM
This may become an official Qwen-stan account.
✅ Open source SOTA on code
✅ Open source SOTA in general for 14B+
✅ Almost SOTA <14B
✅ Works great for LM, RM and classification tasks
✅ SOTA open source multimodal
November 19, 2024 at 5:22 PM
OpenPipe now hosts all our docs in plaintext on our docs page at /llms.txt (an index of links) and /llms-full.txt (a full dump of all docs).
Great idea from @jph.bsky.social!
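Example of pulling those files into a script so an assistant can answer questions from the full docs. The docs host below is an assumption; check the actual docs page for the real base URL.

```python
# Fetch the plaintext docs index and full dump for use as LLM context.
# Assumption: BASE is a placeholder for the real docs host.
import requests

BASE = "https://docs.openpipe.ai"  # assumed docs host

index = requests.get(f"{BASE}/llms.txt").text            # index of doc links
full_docs = requests.get(f"{BASE}/llms-full.txt").text   # full plaintext dump

# e.g. paste full_docs into a system prompt so an assistant can answer
# questions grounded in the complete documentation.
print(index[:500])
print(f"full docs: {len(full_docs)} characters")
```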
November 18, 2024 at 8:17 PM
Qwen 2.5 Coder 32B is a 🐐
✅ Benchmarks at or above GPT-4 and Claude 3.5
✅ Subjectively feels fantastic for code (been trying it)
✅ Fine-tunable on your own data on OpenPipe!
November 13, 2024 at 11:16 PM
Last week Hugging Face released SmolLM v2, a family of <2B models designed for edge deployment. Interested in how they perform when fine-tuned? You're in luck! We've compared their performance with other edge models. (Spoiler: Qwen remains the champion 👑)
November 2, 2024 at 10:15 AM