Segmond
@segmond.bsky.social
I once was here.
It's not blazing fast for me, and I never did try to optimize for speed, but I get 1780 t/s prompt eval on 3126 tokens and 21.5 t/s generating 2873 tokens across 2 3090s, using llama.cpp. 976 MB fp16 K cache and the same for V, with 32k context. I like that it generates long context without yapping.
April 24, 2025 at 7:27 PM
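A minimal sketch of a comparable setup via the llama-cpp-python bindings, matching the numbers in the post (32k context, fp16 K/V cache, full GPU offload). The model path and the even tensor split across the two 3090s are assumptions, not details from the post:

```python
# Sketch only: mirrors the post's llama.cpp configuration.
from llama_cpp import Llama

llm = Llama(
    model_path="model.gguf",   # hypothetical: any GGUF model file
    n_ctx=32768,               # 32k context window, as in the post
    n_gpu_layers=-1,           # offload every layer to the GPUs
    tensor_split=[0.5, 0.5],   # assumed even split across the two 3090s
    verbose=True,              # llama.cpp prints prompt-eval and gen t/s
)

out = llm("Summarize the following document:\n...", max_tokens=512)
print(out["choices"][0]["text"])
```

The K/V cache defaults to fp16 here, so no extra flag is needed to match the ~976 MB fp16 K and V caches mentioned above.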
Aside from yesterday's 2nd part, I'm not reading the puzzles. I'm just asking the LLM to solve them, and it solves them faster than I can read them, roughly a minute each. With 4090s they would probably be solved in 15-20 seconds.
December 5, 2024 at 5:57 AM
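A hedged sketch of timing that workflow, assuming the `llm` instance from the previous snippet and a hypothetical puzzle.txt holding the problem statement:

```python
# Sketch only: wall-clock timing of an LLM attempt at a puzzle, assuming
# the `llm` object defined above and a hypothetical puzzle.txt.
import time

with open("puzzle.txt") as f:
    puzzle = f.read()

start = time.perf_counter()
out = llm(
    f"Write a Python program that solves this puzzle:\n\n{puzzle}",
    max_tokens=2048,
)
elapsed = time.perf_counter() - start

print(out["choices"][0]["text"])
print(f"Solved in {elapsed:.1f}s")  # roughly a minute on 2 3090s per the post
```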