Lightnews — Scholar-powered news

Leandro von Werra

@lvwerra.bsky.social

1.2K followers 51 following 11 posts

Research @ Hugging Face


Leandro von Werra

@lvwerra.bsky.social

Distributed training is notoriously hard to learn - knowledge is scattered across papers and complex codebases.

Enter picotron: implementing all 4D parallelism concepts in separate, readable files totaling just 1988 LoC!

January 6, 2025 at 4:51 PM

Leandro von Werra

@lvwerra.bsky.social

Or watch how the model solves the Lotka-Volterra equations, plots the results, and refines them.

Try it out: huggingface.co/spaces/data-...

December 19, 2024 at 6:56 PM

Leandro von Werra

@lvwerra.bsky.social

Releasing Jupyter Agents - LLMs running data analysis directly in a notebook!

The agent can load data, execute code, plot results, and follow your guidance and ideas!

A very natural way to collaborate with an LLM over data, and it's just scratching the surface of what will soon be possible!

December 19, 2024 at 6:56 PM

Leandro von Werra

@lvwerra.bsky.social

What's the secret sauce of SmolLM2 to beat LLM titans like Llama3.2 and Qwen2.5?

Unsurprisingly: data, data, data!

The SmolTalk dataset is open and available here: huggingface.co/datasets/Hug...

November 21, 2024 at 2:17 PM

Leandro von Werra

@lvwerra.bsky.social

All the things you need to know to pretrain an LLM at home*!

Gave a workshop at Uni Bern: it starts with scaling laws, moves on to web-scale data processing, and finishes with training using 4D parallelism and ZeRO.

*assuming your home includes an H100 cluster

November 19, 2024 at 8:37 PM
