jacobaustin123.bsky.social
@jacobaustin123.bsky.social
Researcher at Google DeepMind. I make LLMs go fast. I also play piano and climb sometimes. Opinions my own
Scaling an LLM involves distributing — a.k.a. "sharding" — its weights across multiple TPUs. To run it, we have to add cross-chip communication. Part 3 describes the TPU's communication primitives, and simple rules for multiplying sharded matrices: jax-ml.github.io/scaling-book... 4/n
February 4, 2025 at 6:54 PM
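(A minimal sketch of the idea in the post above, not an excerpt from the book: shard a weight matrix across chips with jax.sharding and let the compiler insert the cross-chip communication for the sharded matmul. The mesh axis name "model" and the shapes are illustrative assumptions.)

```python
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# Build a 1D mesh over whatever chips are attached (TPU, GPU, or CPU devices).
mesh = Mesh(jax.devices(), axis_names=("model",))

x = jnp.ones((8, 512))     # activations: [batch, d_in]
w = jnp.ones((512, 1024))  # weights:     [d_in, d_out]

# Replicate the activations, shard the weights column-wise over the "model" axis.
x = jax.device_put(x, NamedSharding(mesh, P()))
w = jax.device_put(w, NamedSharding(mesh, P(None, "model")))

@jax.jit
def matmul(x, w):
    # Each chip multiplies against its slice of w; XLA adds whatever
    # collectives the chosen shardings require.
    return x @ w

y = matmul(x, w)
print(y.shape, y.sharding)  # (8, 1024), output sharded over "model"
```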
A big chunk of this book is dedicated to understanding the hardware that provides those system resources. We emphasize TPUs throughout, but the principles and math can be adapted to GPUs too. Part 2 explains the TPU in detail: jax-ml.github.io/scaling-book... 3/n
February 4, 2025 at 6:54 PM