kb (@keighbee.bsky.social)
Machine Learning Engineer @ HuggingFace
The mixture-of-experts model is also an option:

```
cargo run --example qwen --features metal --release -- --prompt "Write a poem about butterflies. <think></think>." --model "3-moe-a3b"
```
May 30, 2025 at 8:00 PM
We’ve got great examples of PyTorch-to-Core ML conversion in the Hugging Face coreml-examples repo. Currently there’s one tutorial, but more are coming soon. After converting, you can choose which compute units you want the model to run on!
GitHub: huggingface/coreml-examples (Swift Core ML Examples)
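After conversion, the compute-unit choice is a load-time option. A minimal sketch using coremltools in Python, under the assumption of coremltools ≥ 5; the model path is a placeholder:

```python
# Minimal sketch: load a converted Core ML model and pin it to
# specific compute units (the .mlpackage path here is hypothetical).
import coremltools as ct

model = ct.models.MLModel(
    "Model.mlpackage",                        # placeholder path
    compute_units=ct.ComputeUnit.CPU_AND_NE,  # CPU + Apple Neural Engine
)
# Other ComputeUnit options: ALL, CPU_ONLY, CPU_AND_GPU
```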
December 12, 2024 at 7:02 PM
Or: my laptop has a 72 Wh battery (~207,360 J, assuming only 80% is usable). Running Llama3.2-1B would drain the battery after processing:

- CPU: 674,249 tokens (~518,653 words, ~7 novels)
- GPU: 2,799,550 tokens (~2,153,500 words, ~30 novels)
- ANE: 11,273,184 tokens (~8,671,679 words, ~123 novels)
December 5, 2024 at 8:08 PM
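The arithmetic above can be sketched with the rounded per-20-token energies from the companion post (6 J CPU, 1.4 J GPU, 0.3 J ANE). The words-per-token ratio and novel length are assumptions inferred from the post's numbers, and the rounded energies won't exactly reproduce the token counts, which come from more precise measurements:

```python
# Rough battery-drain arithmetic for Llama3.2-1B on a 72 Wh laptop battery.
BATTERY_WH = 72
USABLE_FRACTION = 0.8
usable_joules = BATTERY_WH * 3600 * USABLE_FRACTION  # 207,360 J

# Rounded energy per 20 generated tokens, in joules (from the companion post).
energy_per_20_tokens = {"CPU": 6.0, "GPU": 1.4, "ANE": 0.3}

WORDS_PER_TOKEN = 0.77    # rough ratio implied by the post's words/tokens
WORDS_PER_NOVEL = 70_000  # assumed average novel length

for unit, joules in energy_per_20_tokens.items():
    tokens = usable_joules / (joules / 20)  # tokens until the battery dies
    words = tokens * WORDS_PER_TOKEN
    print(f"{unit}: {tokens:,.0f} tokens (~{words:,.0f} words, "
          f"~{words / WORDS_PER_NOVEL:.0f} novels)")
```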
To put it in perspective: Llama3.2-1B uses ~280 GFLOPs per 20 tokens. My laptop (~2 kg) running the model would use the energy equivalent of:

- CPU (6 J): dropping it from 1 foot (31 cm)
- GPU (1.4 J): dropping it from 3 inches (7 cm)
- ANE (0.3 J): dropping it by just half an inch (1.5 cm)!
December 5, 2024 at 8:08 PM
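The drop heights follow from the potential-energy formula E = m·g·h, so h = E / (m·g). A quick check, assuming a 2 kg laptop and g = 9.81 m/s²:

```python
# Convert the per-20-token energies into equivalent drop heights
# for a ~2 kg laptop, using E = m * g * h  =>  h = E / (m * g).
MASS_KG = 2.0
G = 9.81  # m/s^2

for unit, joules in {"CPU": 6.0, "GPU": 1.4, "ANE": 0.3}.items():
    height_cm = joules / (MASS_KG * G) * 100
    print(f"{unit} ({joules} J): drop from ~{height_cm:.1f} cm")
# → CPU ~30.6 cm, GPU ~7.1 cm, ANE ~1.5 cm
```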