kb (@keighbee.bsky.social)
Machine Learning Engineer @ HuggingFace
The mixture-of-experts model is also an option:

```
cargo run --example qwen --features metal --release -- --prompt "Write a poem about butterflies. <think></think>." --model "3-moe-a3b"
```
May 30, 2025 at 8:00 PM
We’ve got great examples of PyTorch-to-Core ML conversion in the Hugging Face coreml-examples repo. Currently there’s one tutorial, but more are coming soon. After converting, you can choose which compute units you want the model to run on!
GitHub: huggingface/coreml-examples (Swift Core ML Examples)
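After conversion, the compute-unit choice is a load-time option. A minimal sketch using coremltools in Python, under the assumption of coremltools ≥ 5; the model path is a placeholder:

```python
# Minimal sketch: load a converted Core ML model and pin it to
# specific compute units (the .mlpackage path here is hypothetical).
import coremltools as ct

model = ct.models.MLModel(
    "Model.mlpackage",                        # placeholder path
    compute_units=ct.ComputeUnit.CPU_AND_NE,  # CPU + Apple Neural Engine
)
# Other ComputeUnit options: ALL, CPU_ONLY, CPU_AND_GPU
```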
December 12, 2024 at 7:02 PM
Or: my laptop has a 72 Wh battery (~207,360 J, assuming only 80% is usable). Running Llama3.2-1B would drain the battery after processing:

- CPU: 674,249 tokens (~518,653 words, ~7 novels)
- GPU: 2,799,550 tokens (~2,153,500 words, ~30 novels)
- ANE: 11,273,184 tokens (~8,671,679 words, ~123 novels)
December 5, 2024 at 8:08 PM
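The arithmetic above can be sketched with the rounded per-20-token energies from the companion post (6 J CPU, 1.4 J GPU, 0.3 J ANE). The words-per-token ratio and novel length are assumptions inferred from the post's numbers, and the rounded energies won't exactly reproduce the token counts, which come from more precise measurements:

```python
# Rough battery-drain arithmetic for Llama3.2-1B on a 72 Wh laptop battery.
BATTERY_WH = 72
USABLE_FRACTION = 0.8
usable_joules = BATTERY_WH * 3600 * USABLE_FRACTION  # 207,360 J

# Rounded energy per 20 generated tokens, in joules (from the companion post).
energy_per_20_tokens = {"CPU": 6.0, "GPU": 1.4, "ANE": 0.3}

WORDS_PER_TOKEN = 0.77    # rough ratio implied by the post's words/tokens
WORDS_PER_NOVEL = 70_000  # assumed average novel length

for unit, joules in energy_per_20_tokens.items():
    tokens = usable_joules / (joules / 20)  # tokens until the battery dies
    words = tokens * WORDS_PER_TOKEN
    print(f"{unit}: {tokens:,.0f} tokens (~{words:,.0f} words, "
          f"~{words / WORDS_PER_NOVEL:.0f} novels)")
```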
To put it in perspective: Llama3.2-1B uses ~280 GFLOPs per 20 tokens. My laptop (~2 kg) running the model would use the energy equivalent of:

- CPU (6 J): dropping it from 1 foot (31 cm)
- GPU (1.4 J): dropping it from 3 inches (7 cm)
- ANE (0.3 J): dropping it by just half an inch (1.5 cm)!
December 5, 2024 at 8:08 PM
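The drop heights follow from the potential-energy formula E = m·g·h, so h = E / (m·g). A quick check, assuming a 2 kg laptop and g = 9.81 m/s²:

```python
# Convert the per-20-token energies into equivalent drop heights
# for a ~2 kg laptop, using E = m * g * h  =>  h = E / (m * g).
MASS_KG = 2.0
G = 9.81  # m/s^2

for unit, joules in {"CPU": 6.0, "GPU": 1.4, "ANE": 0.3}.items():
    height_cm = joules / (MASS_KG * G) * 100
    print(f"{unit} ({joules} J): drop from ~{height_cm:.1f} cm")
# → CPU ~30.6 cm, GPU ~7.1 cm, ANE ~1.5 cm
```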