Blog: huggingface.co/spaces/Huggi...
Learn and Search Repo: github.com/huggingface/...
- Using compute-optimal scaling, a Llama 3 3B model outperforms the 70B model (22x larger) on mathematical reasoning tasks
- Different search strategies work better for different problem difficulties - beam search for harder problems, Best-of-N for simpler ones
- Explored Best-of-N sampling, beam search, and Diverse Verifier Tree Search (DVTS); see the Best-of-N sketch below
- Llama 3 1B achieved 55% accuracy on the MATH benchmark using optimal search strategies
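Conceptually, Best-of-N is the simplest of the three: sample several candidate solutions and let a verifier pick the winner. A minimal sketch, assuming a transformers text-generation pipeline and a toy score function standing in for the process reward model (PRM) the blog uses; the model ID and helper names are illustrative, not the authors' code:

```python
from transformers import pipeline

# Any small instruct model works for the sketch; the blog scales Llama 3 1B/3B.
generator = pipeline("text-generation", model="meta-llama/Llama-3.2-1B-Instruct")

def score(candidate: str) -> float:
    """Toy verifier: the blog uses a trained process reward model (PRM) instead.
    Here we just prefer completions that produce a boxed final answer."""
    return 1.0 if "\\boxed{" in candidate else 0.0

def best_of_n(prompt: str, n: int = 8) -> str:
    """Sample n candidate solutions and return the one the verifier scores highest."""
    candidates = generator(
        prompt,
        num_return_sequences=n,
        do_sample=True,
        temperature=0.8,
        max_new_tokens=512,
        return_full_text=False,
    )
    return max((c["generated_text"] for c in candidates), key=score)

print(best_of_n("Solve step by step and box the final answer: what is 13 * 17?"))
```

Beam search and DVTS differ in that they score partial solutions step by step and only expand the most promising branches, which is why they pay off on harder problems.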
- 💰 Costs $30 vs $1,297 for human evaluation
- ⚡ Reduced time to 118.43 minutes vs 86.5 hours
- 🧑‍⚖️ LLM judge achieved a 60-70% alignment rate with humans (baseline sketched below)
- 🥇 Agent judge achieved a 90% alignment rate with humans
huggingface.co/datasets/DEV...
GitHub: github.com/metauto-ai/a...
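For intuition, the LLM-judge baseline boils down to asking a strong model one yes/no question per requirement. A minimal sketch with the OpenAI SDK; the judge model, prompt, and function name are assumptions for illustration, not the paper's exact pipeline:

```python
from openai import OpenAI

client = OpenAI()

def judge(requirement: str, agent_output: str) -> bool:
    """Ask a judge model whether an agent's output satisfies one requirement."""
    resp = client.chat.completions.create(
        model="gpt-4o",  # assumed judge model, not necessarily the paper's choice
        messages=[
            {"role": "system",
             "content": "You are an evaluator. Answer strictly YES or NO."},
            {"role": "user",
             "content": f"Requirement:\n{requirement}\n\n"
                        f"Agent output:\n{agent_output}\n\n"
                        "Is the requirement satisfied?"},
        ],
    )
    return resp.choices[0].message.content.strip().upper().startswith("YES")
```

The agent judge improves on this by gathering evidence from the generated project (files, code, execution) before answering, which is where the jump from 60-70% to 90% alignment comes from.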
OpenAI trained a new Turbo model to make Sora easier and faster to use. With "storyboards," users get a CapCut/TikTok/Reels-like text-to-video editor that can be used to edit and create new short-form content! Social media will be flooded. 🌊
Model: huggingface.co/Qwen/QwQ-32B...
Demo: huggingface.co/spaces/Qwen/...
- 😍 Released under Apache 2.0 on Hugging Face
- 👀 Full “reasoning” (CoT) available in the demo
- 🔧 32.5B parameters and a 32,768-token context length
- 📊 65.2% on GPQA, 50.0% on AIME, 90.6% on MATH-500, and 50.0% on LiveCodeBench
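Loading it is the standard transformers flow. A quick sketch, assuming the full model ID behind the truncated link above is Qwen/QwQ-32B-Preview:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/QwQ-32B-Preview"  # assumed full ID behind the truncated link
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "How many r's are in 'strawberry'?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Expect a long chain of thought before the final answer: the "reasoning"
# mentioned above is emitted as ordinary tokens, so budget generously.
out = model.generate(inputs, max_new_tokens=2048)
print(tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```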
🔓 Released under Apache 2.0 on @huggingface.bsky.social
📱 Can run efficiently on laptops and edge devices
🛠️ Released in 3 variants: Base, Synthetic, and Instruct
💾 Requires only 5GB of GPU RAM; achieves 38.8% on MMMU and 81.6% on DocVQA
⚡ 3.3-4.5x faster prefill and 7.5-16x faster generation vs Qwen2-VL
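Usage mirrors other transformers vision-language models. A minimal sketch, assuming the Instruct variant lives at HuggingFaceTB/SmolVLM-Instruct:

```python
import torch
from PIL import Image
from transformers import AutoModelForVision2Seq, AutoProcessor

model_id = "HuggingFaceTB/SmolVLM-Instruct"  # assumed ID for the Instruct variant
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForVision2Seq.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

image = Image.open("receipt.png")  # any local image
messages = [{"role": "user",
             "content": [{"type": "image"},
                         {"type": "text", "text": "What is the total amount?"}]}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=prompt, images=[image], return_tensors="pt").to(model.device)

out = model.generate(**inputs, max_new_tokens=256)
print(processor.decode(out[0], skip_special_tokens=True))
```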
Pruning is not a new technique, but compared to quantization it has been much harder to get good results from it while maintaining performance across tasks. Let's see if Neural Magic can change that.
- ⚡ 1.4-2.1x better multi-query throughput
- 🌱 Pruned on 13B training tokens in 26 hours on 32 H100s
- 🔧 Optimized for NVIDIA Ampere GPUs and newer
- 🚀 30% higher throughput and 1.8x lower latency, up to 5.0x lower latency when combined with quantization
- 💻 Works with 4-bit quantization (GPTQ) and Sparse-Marlin kernels
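The Ampere requirement and the Sparse-Marlin kernels point at the same thing: 2:4 semi-structured sparsity, where at most 2 of every 4 consecutive weights are nonzero, the pattern NVIDIA's sparse tensor cores accelerate. A toy masking sketch in PyTorch; Neural Magic's actual pipeline prunes with retraining, this only illustrates the pattern:

```python
import torch

def mask_2_of_4(weight: torch.Tensor) -> torch.Tensor:
    """Keep the 2 largest-magnitude weights in every group of 4, zero the rest."""
    groups = weight.reshape(-1, 4)                 # view weights as groups of 4
    keep = groups.abs().topk(2, dim=1).indices     # indices of the 2 largest |w|
    mask = torch.zeros_like(groups).scatter(1, keep, 1.0)
    return (groups * mask).reshape(weight.shape)

w = torch.randn(8, 8)
sparse_w = mask_2_of_4(w)
# Every group of 4 now has at most 2 nonzeros: the 2:4 pattern that
# Sparse-Marlin and Ampere sparse tensor cores exploit.
assert int((sparse_w.reshape(-1, 4) != 0).sum(dim=1).max()) <= 2
```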