Code Generation
> HumanEval: 80.5% → 88.4% (+7.9%)
> MBPP EvalPlus: 86.0% → 87.6% (+1.6%)
Steerability
> IFEval: 87.5% → 92.1% (+4.6%)
Reasoning & Math
> GPQA Diamond (CoT): 48.0% → 50.5% (+2.5%)
> MATH (CoT): 68.0% → 77.0% (+9.0%)
> GPQA Diamond (CoT): 50.5% vs 49.0%
> MATH (CoT): 77.0% vs 73.8%
> Steerability (IFEval): 92.1% vs 88.6%
huggingface.co/meta-llama/L...
That's it, the code is super readable, try it out today! 🤗
huggingface.co/spaces/reach...
> 2-stage pre-training and 3-phase post-training, including a trapezoid learning rate schedule
Try it out on Hugging Face today! 🤗
huggingface.co/collections/...
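For readers unfamiliar with the term, a trapezoid learning rate schedule is usually taken to mean a linear warmup, a flat plateau at the peak rate, then a linear decay. The post gives no details of the actual schedule, so the sketch below is only an illustration: the function name `trapezoid_lr` and all of its parameters (step counts, peak rate, decay to zero) are assumptions, not values from the model's training recipe.

```python
def trapezoid_lr(step, total_steps, peak_lr, warmup_steps, decay_steps):
    """Hypothetical trapezoid schedule: linear warmup, plateau, linear decay to 0.

    All parameter names and the decay-to-zero choice are illustrative
    assumptions; the original post does not specify them.
    """
    # Phase 1: ramp linearly up to peak_lr over warmup_steps.
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    # Phase 2: hold peak_lr until the decay window begins.
    decay_start = total_steps - decay_steps
    if step < decay_start:
        return peak_lr
    # Phase 3: ramp linearly down to 0 at total_steps.
    remaining = max(total_steps - step, 0)
    return peak_lr * remaining / decay_steps


if __name__ == "__main__":
    # Print the rate at a few points of a 100-step toy run.
    for s in (0, 9, 50, 90, 100):
        print(s, trapezoid_lr(s, total_steps=100, peak_lr=1.0,
                              warmup_steps=10, decay_steps=20))
```

Plotted against the step count, the three phases form the trapezoid shape that gives the schedule its name.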
huggingface.co/OuteAI/OuteT...