You can now run inference and fine-tune locally on your Mac.
pip install -U mlx-vlm
I’m getting ~140 tok/s on an M3 Max (96 GB) 🔥
Thanks to @pcuenq.hf.co for the PR!
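Quick-start sketch (based on the mlx-vlm README; the model repo below is just an example and the generate() argument order has shifted between releases, so double-check against the current docs):

from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config

# Any supported VLM from the Hugging Face hub works here; this 4-bit Qwen2-VL repo is just an example
model_path = "mlx-community/Qwen2-VL-2B-Instruct-4bit"
model, processor = load(model_path)
config = load_config(model_path)

# One image (URL or local path) plus a text prompt
images = ["http://images.cocodataset.org/val2017/000000039769.jpg"]
prompt = "Describe this image."

# Wrap the prompt in the model's chat template, then generate
formatted = apply_chat_template(processor, config, prompt, num_images=len(images))
print(generate(model, processor, formatted, images, verbose=False))

There's also a CLI entry point (python -m mlx_vlm.generate) if you'd rather not write any Python.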
Model Cards 👇🏽

What's new in this release:
- Major refactoring
- Run language model only
- Image / Video feature + prompt caching
- Batch Inference
- KV quantization
- KV cache with attention sinks
- Full fine-tuning (see the sketch after this list)
- LoRA adapter merging
- New Models
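For the fine-tuning side, mlx-vlm ships a LoRA training script. The flag names below are assumptions on my part rather than the confirmed interface, so treat this as a sketch of the shape of a run and check the repo's fine-tuning docs first:

# Sketch only -- entry point exists, but verify flag names with the mlx-vlm docs
python -m mlx_vlm.lora \
  --model-path mlx-community/Qwen2-VL-2B-Instruct-4bit \
  --dataset <your-image-text-dataset> \
  --batch-size 1 \
  --epochs 1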