jacobaustin123.bsky.social
@jacobaustin123.bsky.social
Researcher at Google DeepMind. I make LLMs go fast. I also play piano and climb sometimes. Opinions my own
Scaling an LLM involves distributing — a.k.a. "sharding" — its weights across multiple TPUs. To run it, we have to add cross-chip communication. Part 3 describes the TPU's communication primitives, and simple rules for multiplying sharded matrices: jax-ml.github.io/scaling-book... 4/n
February 4, 2025 at 6:54 PM
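(A minimal sketch of the idea in the post above, not an excerpt from the book: shard a weight matrix across chips with jax.sharding and let the compiler insert the cross-chip communication for the sharded matmul. The mesh axis name "model" and the shapes are illustrative assumptions.)

```python
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# Build a 1D mesh over whatever chips are attached (TPU, GPU, or CPU devices).
mesh = Mesh(jax.devices(), axis_names=("model",))

x = jnp.ones((8, 512))     # activations: [batch, d_in]
w = jnp.ones((512, 1024))  # weights:     [d_in, d_out]

# Replicate the activations, shard the weights column-wise over the "model" axis.
x = jax.device_put(x, NamedSharding(mesh, P()))
w = jax.device_put(w, NamedSharding(mesh, P(None, "model")))

@jax.jit
def matmul(x, w):
    # Each chip multiplies against its slice of w; XLA adds whatever
    # collectives the chosen shardings require.
    return x @ w

y = matmul(x, w)
print(y.shape, y.sharding)  # (8, 1024), output sharded over "model"
```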
A big chunk of this book is dedicated to understanding the hardware that provides those system resources. We emphasize TPUs throughout, but the principles and math can be adapted to GPUs too. Part 2 explains the TPU in detail: jax-ml.github.io/scaling-book... 3/n
February 4, 2025 at 6:54 PM