Taneem
@taneem-ibrahim.bsky.social
Tinkering with vLLM @RedHat
FP8-quantized version of Llama 4 Maverick can be downloaded from HuggingFace: huggingface.co/collections/...
Llama 4 - a meta-llama Collection
Llama 4 release
April 5, 2025 at 8:22 PM
The official release by Meta includes an FP8-quantized version of Llama 4 Maverick 128E, produced with Red Hat’s LLM Compressor library. Quantization lets the 128-expert model fit on a single node of 8 NVIDIA H100 GPUs, delivering higher performance at lower cost.
April 5, 2025 at 8:20 PM