Brad Larson
@bradlarson.bsky.social
Exploring heterogeneous computation at Modular. Out here in the Wisconsin woods with two pugs and @redqueencoder.bsky.social
But under the hood, we've built a generalized framework for programming accelerators, from a computational graph API in Python to our multi-device kernels written in Mojo. It's worth noting that we use no CUDA libraries, yet we hit state-of-the-art performance on NVIDIA GPUs. AMD support is coming soon.
December 17, 2024 at 6:53 PM
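For context, here is a rough sketch of what staging a computation through a Python graph API of this kind might look like. This is an illustrative assumption, not code taken from the post: the module paths `max.graph` and `max.dtype` and the names `Graph`, `TensorType`, `ops.matmul`, and `DType` are assumed for the sake of the example.

```python
# Hypothetical sketch of a graph-API workflow (names and module paths are
# assumptions, not verified API): build a small matmul graph in Python, then
# let the runtime compile and execute it on an available accelerator.
from max.dtype import DType                    # assumed module path
from max.graph import Graph, TensorType, ops   # assumed module path

# Declare the graph's input signature: two float32 matrices.
input_types = [
    TensorType(DType.float32, shape=(256, 512)),
    TensorType(DType.float32, shape=(512, 128)),
]

# Building the graph records ops symbolically rather than executing eagerly.
with Graph("matmul_example", input_types=input_types) as graph:
    x, w = graph.inputs
    graph.output(ops.matmul(x, w))

# The finished graph would then be handed to the runtime, which lowers it
# onto device kernels (written in Mojo) rather than CUDA libraries.
```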
We chose end-to-end serving of a large language model on NVIDIA A100 GPUs as our "steel thread" use case to prove out the core technology; it was pretty much the highest bar we could set for GPU performance: www.modular.com/blog/max-gpu...
Modular: MAX GPU: State of the Art Throughput on a New GenAI platform
Measuring state of the art GPU performance compared to vLLM on Modular's MAX 24.6
www.modular.com
December 17, 2024 at 6:48 PM