Brad Larson
@bradlarson.bsky.social
Exploring heterogeneous computation at Modular. Out here in the Wisconsin woods with two pugs and @redqueencoder.bsky.social
But under the hood, we've built a generalized framework for programming accelerators, from a computational graph API in Python to our multi-device kernels written in Mojo. It's worth noting that we use no CUDA libraries, yet we hit state-of-the-art performance on NVIDIA GPUs. AMD support is coming soon.
December 17, 2024 at 6:53 PM
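For context, here is a rough sketch of what staging a computation through a Python graph API of this kind might look like. This is an illustrative assumption, not code taken from the post: the module paths `max.graph` and `max.dtype` and the names `Graph`, `TensorType`, `ops.matmul`, and `DType` are assumed for the sake of the example.

```python
# Hypothetical sketch of a graph-API workflow (names and module paths are
# assumptions, not verified API): build a small matmul graph in Python, then
# let the runtime compile and execute it on an available accelerator.
from max.dtype import DType                    # assumed module path
from max.graph import Graph, TensorType, ops   # assumed module path

# Declare the graph's input signature: two float32 matrices.
input_types = [
    TensorType(DType.float32, shape=(256, 512)),
    TensorType(DType.float32, shape=(512, 128)),
]

# Building the graph records ops symbolically rather than executing eagerly.
with Graph("matmul_example", input_types=input_types) as graph:
    x, w = graph.inputs
    graph.output(ops.matmul(x, w))

# The finished graph would then be handed to the runtime, which lowers it
# onto device kernels (written in Mojo) rather than CUDA libraries.
```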
We chose end-to-end serving of a large language model on NVIDIA A100 GPUs as our "steel thread" use case to prove out the core technology; it was pretty much the highest bar we could set for GPU performance: www.modular.com/blog/max-gpu...
Modular: MAX GPU: State of the Art Throughput on a New GenAI platform
Measuring state of the art GPU performance compared to vLLM on Modular's MAX 24.6
www.modular.com
December 17, 2024 at 6:48 PM