Blog: huggingface.co/spaces/Huggi...
Learn and Search Repo: github.com/huggingface/...
- Using compute-optimal scaling, a Llama 3 3B model outperforms the 70B model (22x larger) on mathematical reasoning tasks
- Different search strategies work better for different problem difficulties - beam search for harder problems, Best-of-N for simpler ones
- Explored Best-of-N sampling, beam search, and Diverse Verifier Tree Search (DVTS); see the Best-of-N sketch below
- Llama 3 1B achieved 55% accuracy on the MATH benchmark using optimal search strategies
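Conceptually, Best-of-N is the simplest of the three: sample several candidate solutions and let a verifier pick the winner. A minimal sketch, assuming a transformers text-generation pipeline and a toy score function standing in for the process reward model (PRM) the blog uses; the model ID and helper names are illustrative, not the authors' code:

```python
from transformers import pipeline

# Any small instruct model works for the sketch; the blog scales Llama 3 1B/3B.
generator = pipeline("text-generation", model="meta-llama/Llama-3.2-1B-Instruct")

def score(candidate: str) -> float:
    """Toy verifier: the blog uses a trained process reward model (PRM) instead.
    Here we just prefer completions that produce a boxed final answer."""
    return 1.0 if "\\boxed{" in candidate else 0.0

def best_of_n(prompt: str, n: int = 8) -> str:
    """Sample n candidate solutions and return the one the verifier scores highest."""
    candidates = generator(
        prompt,
        num_return_sequences=n,
        do_sample=True,
        temperature=0.8,
        max_new_tokens=512,
        return_full_text=False,
    )
    return max((c["generated_text"] for c in candidates), key=score)

print(best_of_n("Solve step by step and box the final answer: what is 13 * 17?"))
```

Beam search and DVTS differ in that they score partial solutions step by step and only expand the most promising branches, which is why they pay off on harder problems.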
- 💰 Costs $30 vs $1,297 for human evaluation
- ⚡ Reduced time to 118.43 minutes vs 86.5 hours
- 🧑‍⚖️ LLM judge achieved a 60-70% alignment rate with humans (baseline sketched below)
- 🥇 Agent judge achieved a 90% alignment rate with humans
huggingface.co/datasets/DEV...
GitHub: github.com/metauto-ai/a...
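For intuition, the LLM-judge baseline boils down to asking a strong model one yes/no question per requirement. A minimal sketch with the OpenAI SDK; the judge model, prompt, and function name are assumptions for illustration, not the paper's exact pipeline:

```python
from openai import OpenAI

client = OpenAI()

def judge(requirement: str, agent_output: str) -> bool:
    """Ask a judge model whether an agent's output satisfies one requirement."""
    resp = client.chat.completions.create(
        model="gpt-4o",  # assumed judge model, not necessarily the paper's choice
        messages=[
            {"role": "system",
             "content": "You are an evaluator. Answer strictly YES or NO."},
            {"role": "user",
             "content": f"Requirement:\n{requirement}\n\n"
                        f"Agent output:\n{agent_output}\n\n"
                        "Is the requirement satisfied?"},
        ],
    )
    return resp.choices[0].message.content.strip().upper().startswith("YES")
```

The agent judge improves on this by gathering evidence from the generated project (files, code, execution) before answering, which is where the jump from 60-70% to 90% alignment comes from.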
OpenAI trained a new Turbo model to make Sora easier and faster to use. With "storyboards," users get a CapCut/TikTok/Reels-like text-to-video editor that can be used to edit and create new short-form content! Social media will be flooded. 🌊
Model: huggingface.co/Qwen/QwQ-32B...
Demo: huggingface.co/spaces/Qwen/...
- 😍 Released under Apache 2.0 on Hugging Face
- 👀 Full “reasoning” (CoT) available in the demo
- 🔧 32.5B parameters and a 32,768-token context length
- 📊 65.2% on GPQA, 50.0% on AIME, 90.6% on MATH-500, and 50.0% on LiveCodeBench
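Loading it is the standard transformers flow. A quick sketch, assuming the full model ID behind the truncated link above is Qwen/QwQ-32B-Preview:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/QwQ-32B-Preview"  # assumed full ID behind the truncated link
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "How many r's are in 'strawberry'?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Expect a long chain of thought before the final answer: the "reasoning"
# mentioned above is emitted as ordinary tokens, so budget generously.
out = model.generate(inputs, max_new_tokens=2048)
print(tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```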
🔓 Released under Apache 2.0 on @huggingface.bsky.social
📱 Can run efficiently on laptops and edge devices
🛠️ Released in 3 variants: Base, Synthetic, and Instruct
💾 Requires only 5GB of GPU RAM; achieves 38.8% on MMMU and 81.6% on DocVQA
⚡ 3.3-4.5x faster prefill and 7.5-16x faster generation vs Qwen2-VL
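Usage mirrors other transformers vision-language models. A minimal sketch, assuming the Instruct variant lives at HuggingFaceTB/SmolVLM-Instruct:

```python
import torch
from PIL import Image
from transformers import AutoModelForVision2Seq, AutoProcessor

model_id = "HuggingFaceTB/SmolVLM-Instruct"  # assumed ID for the Instruct variant
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForVision2Seq.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

image = Image.open("receipt.png")  # any local image
messages = [{"role": "user",
             "content": [{"type": "image"},
                         {"type": "text", "text": "What is the total amount?"}]}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=prompt, images=[image], return_tensors="pt").to(model.device)

out = model.generate(**inputs, max_new_tokens=256)
print(processor.decode(out[0], skip_special_tokens=True))
```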
Pruning is not a new technique, but compared to quantization it has been much harder to get good results from it while maintaining performance across tasks. Let's see if Neural Magic can change that.
- ⚡ 1.4-2.1x better multi-query throughput
- 🌱 Pruned on 13B training tokens in 26 hours on 32 H100s
- 🔧 Optimized for NVIDIA Ampere GPUs and newer
- 🚀 30% higher throughput and 1.8x lower latency, up to 5.0x lower latency when combined with quantization
- 💻 Works with 4-bit quantization (GPTQ) and Sparse-Marlin kernels
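The Ampere requirement and the Sparse-Marlin kernels point at the same thing: 2:4 semi-structured sparsity, where at most 2 of every 4 consecutive weights are nonzero, the pattern NVIDIA's sparse tensor cores accelerate. A toy masking sketch in PyTorch; Neural Magic's actual pipeline prunes with retraining, this only illustrates the pattern:

```python
import torch

def mask_2_of_4(weight: torch.Tensor) -> torch.Tensor:
    """Keep the 2 largest-magnitude weights in every group of 4, zero the rest."""
    groups = weight.reshape(-1, 4)                 # view weights as groups of 4
    keep = groups.abs().topk(2, dim=1).indices     # indices of the 2 largest |w|
    mask = torch.zeros_like(groups).scatter(1, keep, 1.0)
    return (groups * mask).reshape(weight.shape)

w = torch.randn(8, 8)
sparse_w = mask_2_of_4(w)
# Every group of 4 now has at most 2 nonzeros: the 2:4 pattern that
# Sparse-Marlin and Ampere sparse tensor cores exploit.
assert int((sparse_w.reshape(-1, 4) != 0).sum(dim=1).max()) <= 2
```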