Leandro von Werra
@lvwerra.bsky.social
Research @ Hugging Face
Distributed training is notoriously hard to learn - knowledge is scattered across papers and complex codebases.

Enter picotron: implementing all 4D parallelism concepts in separate, readable files totaling just 1988 LoC!
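For a taste of what one of those dimensions looks like, here is a minimal sketch of the simplest one, data parallelism, in plain PyTorch: after the backward pass, gradients are averaged across ranks with an all-reduce. This is an illustration of the concept, not picotron's actual code.

```python
# Illustrative data-parallel training step (one of the 4 dimensions); not picotron's code.
import torch
import torch.distributed as dist

def data_parallel_step(model, batch, optimizer, loss_fn):
    """One step where every rank processes a different micro-batch."""
    optimizer.zero_grad()
    loss = loss_fn(model(batch["inputs"]), batch["targets"])
    loss.backward()
    # Average gradients across all data-parallel ranks.
    world_size = dist.get_world_size()
    for p in model.parameters():
        if p.grad is not None:
            dist.all_reduce(p.grad, op=dist.ReduceOp.SUM)
            p.grad /= world_size
    optimizer.step()
    return loss.item()
```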
January 6, 2025 at 4:51 PM
Reposted by Leandro von Werra
supercharge your LLM apps with smolagents 🔥

however cool your LLM is, without being agentic it can only go so far

enter smolagents: a new agent library by @hf.co to make the LLM write code, do analysis and automate boring stuff! huggingface.co/blog/smolage...
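If you want to try it, a minimal sketch based on the release blog post; the `CodeAgent`, `DuckDuckGoSearchTool` and `HfApiModel` names reflect the API at launch and may differ in later versions.

```python
# Minimal smolagents sketch; names are taken from the release blog post and may have changed since.
from smolagents import CodeAgent, DuckDuckGoSearchTool, HfApiModel

agent = CodeAgent(
    tools=[DuckDuckGoSearchTool()],  # give the agent a web-search tool
    model=HfApiModel(),              # defaults to a model served via the HF Inference API
)

# The agent writes and executes Python code to work out the answer.
print(agent.run("How many seconds are there in a leap year?"))
```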
December 31, 2024 at 3:32 PM
Reposted by Leandro von Werra
Introducing 📐FineMath: the best open math pre-training dataset with 50B+ tokens!

Math remains challenging for LLMs and by training on FineMath we see considerable gains over other math datasets, especially on GSM8K and MATH.

🤗 huggingface.co/datasets/Hug...

Here’s a breakdown 🧵
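And if you want to poke at the data yourself, a minimal sketch with 🤗 datasets; the repo id, config name and column name are assumptions (the link above is truncated), so double-check them on the Hub.

```python
# Sketch: stream a few FineMath samples. Repo id / config / column names are assumptions.
from datasets import load_dataset

ds = load_dataset("HuggingFaceTB/finemath", "finemath-4plus", split="train", streaming=True)
for example in ds.take(3):
    print(example["text"][:300])  # "text" column name is an assumption; check the dataset card
```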
December 19, 2024 at 3:55 PM
Releasing Jupyter Agents - LLMs running data analysis directly in a notebook!

The agent can load data, execute code, plot results, and follow your guidance and ideas!

A very natural way to collaborate with an LLM over data, and it's just scratching the surface of what will soon be possible!
December 19, 2024 at 6:56 PM
Reposted by Leandro von Werra
We outperform Llama 70B with Llama 3B on hard math by scaling test-time compute 🔥

How? By combining step-wise reward models with tree search algorithms :)

We're open sourcing the full recipe and sharing a detailed blog post 👇
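The simplest member of this family is (weighted) best-of-N: sample several candidate solutions and let a reward model pick the winner. A conceptual sketch below, not the released recipe; `generate_candidates` and `score_with_prm` are hypothetical stand-ins for your generator and process reward model.

```python
# Conceptual best-of-N sketch for test-time compute scaling; not the released recipe.
# generate_candidates() and score_with_prm() are hypothetical stand-ins.

def best_of_n(problem, generate_candidates, score_with_prm, n=16):
    """Sample n candidate solutions and return the one the reward model scores highest."""
    candidates = generate_candidates(problem, n=n)              # n sampled solutions
    scores = [score_with_prm(problem, c) for c in candidates]   # e.g. aggregate of per-step scores
    best = max(range(n), key=lambda i: scores[i])
    return candidates[best]
```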
December 16, 2024 at 5:08 PM
Reposted by Leandro von Werra
Big News in AI4Science! ✨
We are thrilled to launch LeMaterial, an open-source project in collaboration with @hf.co to accelerate materials discovery ⚛️🤗

Discover LeMat-Bulk: a 6.7M-entry dataset standardizing and unifying Materials Project, Alexandria and OQMD
December 11, 2024 at 6:34 PM
Reposted by Leandro von Werra
Announcing 🥂 FineWeb2: A sparkling update with 1000s of 🗣️languages.

We applied the same data-driven approach that led to SOTA English performance in 🍷 FineWeb to thousands of languages.

🥂 FineWeb2 has 8TB of compressed text data and outperforms other datasets.
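To sample a single language without downloading 8TB, a streaming sketch; the repo id and per-language config naming are assumptions, so check the dataset card.

```python
# Sketch: stream one FineWeb2 language. Repo id and config name are assumptions; see the dataset card.
from datasets import load_dataset

fw2_fr = load_dataset("HuggingFaceFW/fineweb-2", "fra_Latn", split="train", streaming=True)
print(next(iter(fw2_fr))["text"][:200])  # "text" column name is an assumption
```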
December 8, 2024 at 9:19 AM
Reposted by Leandro von Werra
The FineWeb team is happy to finally release "FineWeb2" 🥂🥳

FineWeb 2 extends the data-driven approach to pre-training dataset design introduced in FineWeb 1 to now cover 1893 languages/scripts

Details: huggingface.co/datasets/Hug...

A detailed open-science tech report is coming soon
December 8, 2024 at 9:08 AM
There are not many opportunities out there to build open LLMs and make them state-of-the-art, too! This is one of them.
November 28, 2024 at 9:42 AM
Reposted by Leandro von Werra
WOW! 🤯 Language models are becoming smaller and more capable than ever! Here's SmolLM2 running 100% locally in-browser w/ WebGPU on a 6-year-old GPU. Just look at that speed! ⚡️😍

Powered by 🤗 Transformers.js and ONNX Runtime Web!

How many tokens/second do you get? Let me know! 👇
November 27, 2024 at 1:51 PM
Some people are pushing models to the top right of the plot following the scaling laws, others push them to the top left and make them faster and cheaper!

We need both!
Let's go! We are releasing SmolVLM, a smol 2B VLM built for on-device inference that outperforms all models at similar GPU RAM usage and token throughput.

SmolVLM can be fine-tuned on a Google Colab and run on a laptop! Or process millions of documents with a consumer GPU!
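A minimal sketch of running it on one image with 🤗 transformers; the checkpoint name, Auto classes and chat-message format are my assumptions from memory, so check the model card.

```python
# Sketch: run SmolVLM on a single image. Checkpoint / classes / message format are assumptions.
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForVision2Seq

model_id = "HuggingFaceTB/SmolVLM-Instruct"  # assumed repo id; see the model card
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForVision2Seq.from_pretrained(model_id, torch_dtype=torch.bfloat16)

image = Image.open("receipt.png")
messages = [{"role": "user", "content": [{"type": "image"},
                                         {"type": "text", "text": "What is the total?"}]}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=prompt, images=[image], return_tensors="pt")

out = model.generate(**inputs, max_new_tokens=100)
print(processor.batch_decode(out, skip_special_tokens=True)[0])
```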
November 26, 2024 at 4:33 PM
Reposted by Leandro von Werra
Check out how easy it is to do LLM evals with LightEval!

* any dataset on the 🤗 Hub can become an eval task in a few lines of code (rough sketch below): customize the prompt, metrics, parsing, few-shots, everything!
* model- and data-parallel inference
* auto batching with the new vLLM backend
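For the first point, registering a Hub dataset as a custom task looks roughly like the sketch below; the class and field names are from memory and vary between lighteval versions, so treat them as assumptions and check the docs.

```python
# Rough sketch of a custom lighteval task; class/field names are assumptions, check the lighteval docs.
from lighteval.metrics.metrics import Metrics
from lighteval.tasks.lighteval_task import LightevalTaskConfig
from lighteval.tasks.requests import Doc

def prompt_fn(line, task_name: str = None):
    # Map one dataset row to a Doc: the prompt, the candidate choices and the gold answer.
    return Doc(
        task_name=task_name,
        query=f"Question: {line['question']}\nAnswer:",
        choices=[f" {c}" for c in line["choices"]],
        gold_index=line["answer"],
    )

my_task = LightevalTaskConfig(
    name="my_eval",
    prompt_function=prompt_fn,
    suite=["community"],
    hf_repo="my-org/my-eval-dataset",  # hypothetical dataset repo
    hf_subset="default",
    evaluation_splits=["test"],
    metric=[Metrics.loglikelihood_acc],
)
```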
November 25, 2024 at 5:24 PM
Reposted by Leandro von Werra
It's Sunday morning, so taking a minute for a nerdy thread (on math, tokenizers and LLMs) about the work of our intern Garreth

By adding a few lines of code to the base Llama 3 tokenizer, he got a free boost in arithmetic performance 😮

[thread]
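The thread isn't reproduced here, but a common trick in this space, shown as a sketch and not necessarily the change described above, is to pre-split numbers into fixed-size digit chunks before the tokenizer sees them, so long numbers always map to consistent, right-aligned groups.

```python
# Sketch of digit-chunking before tokenization; illustrative, not necessarily the change in the thread.
import re

def split_digits(text: str, chunk: int = 3) -> str:
    """Insert spaces inside digit runs so numbers split into consistent right-aligned chunks."""
    def chunk_number(m: re.Match) -> str:
        digits = m.group(0)
        parts = []
        while digits:                       # split from the right: "1234567" -> "1 234 567"
            parts.append(digits[-chunk:])
            digits = digits[:-chunk]
        return " ".join(reversed(parts))
    return re.sub(r"\d+", chunk_number, text)

print(split_digits("12345 + 678 = 13023"))  # -> "12 345 + 678 = 13 023"
```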
November 24, 2024 at 11:05 AM
What's the secret sauce that lets SmolLM2 beat LLM titans like Llama3.2 and Qwen2.5?

Unsurprisingly: data, data, data!

The SmolTalk dataset is open and available here: huggingface.co/datasets/Hug...
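A quick sketch to inspect a sample and render it with a chat template; the repo id and config are assumptions (the link above is truncated), as is the "messages" column name.

```python
# Sketch: inspect one SmolTalk sample. Repo id, config and column names are assumptions.
from datasets import load_dataset
from transformers import AutoTokenizer

ds = load_dataset("HuggingFaceTB/smoltalk", "all", split="train", streaming=True)
sample = next(iter(ds))

tok = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM2-1.7B-Instruct")  # assumed repo id
print(tok.apply_chat_template(sample["messages"], tokenize=False))
```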
November 21, 2024 at 2:17 PM
All the things you need to know to pretrain an LLM at home*!

Gave a workshop at Uni Bern: it starts with scaling laws, moves on to web-scale data processing, and finishes with training using 4D parallelism and ZeRO.

*assuming your home includes an H100 cluster
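The scaling-laws part boils down to a bit of arithmetic: with the common C ≈ 6·N·D approximation and the Chinchilla-style rule of thumb of roughly 20 tokens per parameter, you can sketch a compute budget before touching the cluster. The peak-FLOPs and MFU numbers below are rough assumptions.

```python
# Back-of-the-envelope compute budget using C ≈ 6*N*D and the ~20 tokens/parameter rule of thumb.
def training_flops(n_params: float, n_tokens: float) -> float:
    return 6 * n_params * n_tokens

n_params = 1.7e9                 # e.g. a 1.7B-parameter model
n_tokens = 20 * n_params         # Chinchilla-style compute-optimal token count
flops = training_flops(n_params, n_tokens)

h100_bf16_flops = 1e15           # ~1 PFLOP/s peak per H100 in bf16 (rough)
mfu = 0.4                        # assume 40% model FLOPs utilization
gpu_seconds = flops / (h100_bf16_flops * mfu)
print(f"{flops:.2e} FLOPs ≈ {gpu_seconds / 3600:.0f} H100-hours at 40% MFU")
```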
November 19, 2024 at 8:37 PM