refractai.bsky.social
@refractai.bsky.social
Yes for prompt processing: based on github.com/ggml-org/lla..., it scales near-linearly with GPU core count (FLOPS).
Performance of llama.cpp on Apple Silicon M-series · ggml-org llama.cpp · Discussion #4167
Summary (LLaMA 7B; preview truncated):
✅ M1 [1], 68 GB/s, 7 GPU cores: F16 PP 108.21 t/s, F16 TG 7.92 t/s, Q8_0 PP 107.81 t/s, Q8_0 TG 14.19 t/s, ...
✅ M1 [1], 68 GB/s, 8 GPU cores: F16 PP 117.25 t/s, F16 TG 7.91 t/s, Q8_0 PP 117.96 t/s, Q8_0 TG 14.15 t/s, ...
✅ M1... (remaining rows cut off in preview)
github.com
March 10, 2025 at 12:10 AM
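[The near-linear claim can be sanity-checked directly from the two M1 rows visible in the link preview above. A minimal Python sketch; the throughput values are copied from the linked table, everything else is illustrative:]

```python
# Scaling check: F16 prompt-processing (PP) throughput for the
# 7-core vs. 8-core M1, from the linked discussion's summary table.
pp = {7: 108.21, 8: 117.25}  # GPU cores -> PP tokens/s

core_ratio = 8 / 7                 # ~1.14x more cores
throughput_ratio = pp[8] / pp[7]   # ~1.08x more PP throughput

print(f"core ratio:       {core_ratio:.3f}")
print(f"throughput ratio: {throughput_ratio:.3f}")
# PP throughput grows with core count (roughly, if not perfectly,
# linearly), consistent with prompt processing being compute bound.
```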
Prompt processing is compute-bound (raw FLOPS), unlike token generation, which is memory-bandwidth-bound. The M2 Max is 13 TFLOPS; an Nvidia 3090 is 35 TFLOPS. It's just that the Mac's GPU is small.
March 9, 2025 at 11:00 PM
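[To make the compute-bound vs. bandwidth-bound distinction concrete, here is a rough roofline sketch in Python. The TFLOPS figures come from the post above; the ~2 FLOPs-per-parameter-per-token cost is a common rule of thumb, and the bandwidth figures (400 GB/s M2 Max, 936 GB/s RTX 3090) are public specs, all used purely for illustration:]

```python
# Back-of-envelope roofline for a 7B model: prompt processing (PP)
# is limited by compute, token generation (TG) by memory bandwidth.
# Assumes ~2 FLOPs per parameter per token for a forward pass, and
# that TG must stream all weight bytes once per generated token.

params = 7e9                  # 7B parameters
flops_per_token = 2 * params  # rule-of-thumb forward-pass cost
bytes_per_param = 1           # Q8_0: ~1 byte per parameter

m2_max_flops, m2_max_bw = 13e12, 400e9      # 13 TFLOPS, 400 GB/s
rtx3090_flops, rtx3090_bw = 35e12, 936e9    # 35 TFLOPS, 936 GB/s

for name, flops, bw in [("M2 Max", m2_max_flops, m2_max_bw),
                        ("RTX 3090", rtx3090_flops, rtx3090_bw)]:
    pp_ceiling = flops / flops_per_token           # compute-bound
    tg_ceiling = bw / (params * bytes_per_param)   # bandwidth-bound
    print(f"{name}: PP ceiling ~{pp_ceiling:,.0f} t/s, "
          f"TG ceiling ~{tg_ceiling:,.0f} t/s")

# M2 Max:   PP ~929 t/s,   TG ~57 t/s
# RTX 3090: PP ~2,500 t/s, TG ~134 t/s
# The PP gap (~2.7x) tracks the FLOPS gap, matching the point above.
```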
JetFormer isn't on there yet, right?
December 3, 2024 at 1:47 AM