Abhimanyu Hans
ahans30.bsky.social
PhD Student @umdcs

https://ahans30.github.io/
Reposted by Abhimanyu Hans
Let's sanity-check DeepSeek's claim of training on 2048 GPUs for under 2 months at a cost of $5.6M. It sort of checks out and sort of doesn't.

The V3 model is an MoE with 37B (out of 671B) active parameters. Let's compare to the cost of a 34B dense model. 🧵
January 29, 2025 at 5:12 PM
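The post's numbers can be sanity-checked with standard back-of-envelope arithmetic. Only the GPU count, duration, cost, and 37B active parameters come from the post; the token count, H800 throughput, utilization, and hourly price below are all assumptions for the sketch.

```python
# Back-of-envelope check of: 2048 GPUs, <2 months, $5.6M, 37B active params.
# Every constant not taken from the post is an assumption.

ACTIVE_PARAMS = 37e9     # from the post (MoE active parameters)
TOKENS        = 14.8e12  # assumed training-token count
PEAK_FLOPS    = 990e12   # assumed H800 BF16 peak, FLOP/s
MFU           = 0.40     # assumed model FLOPs utilization
PRICE_PER_HR  = 2.0      # assumed $/GPU-hour rental price
N_GPUS        = 2048     # from the post

train_flops = 6 * ACTIVE_PARAMS * TOKENS  # ~6ND rule of thumb for training
gpu_hours   = train_flops / (PEAK_FLOPS * MFU) / 3600
days        = gpu_hours / N_GPUS / 24
cost        = gpu_hours * PRICE_PER_HR

print(f"{gpu_hours/1e6:.1f}M GPU-hours, {days:.0f} days, ${cost/1e6:.1f}M")
```

Under these assumptions the estimate lands at roughly 2.3M GPU-hours, about 47 days on 2048 GPUs, and under $5M in compute rental, which is the sense in which the claim "sort of checks out": the arithmetic is plausible, but the total excludes experiments, failed runs, staff, and hardware ownership.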
poster sent for print 😮‍💨

are you concerned your LLM might regurgitate exact training data to your users?

join me and my co-authors at #NeurIPS2024 at wednesday's first poster session & learn how goldfish loss can help you.

eager to meet friends from past and future!

p.s. hmu if you're hiring summer interns!
December 9, 2024 at 4:17 AM
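The core idea behind the goldfish loss mentioned above is to exclude a deterministic pseudorandom subset of tokens from the next-token loss, so the model never trains on (and so cannot regurgitate) those exact positions. The sketch below is a hedged illustration of that idea, not the paper's exact implementation: the hash scheme, context width `h`, and drop rate `1/k` are all assumptions.

```python
import hashlib

def goldfish_mask(token_ids, k=4, h=13):
    """Sketch of a goldfish-style token-drop mask (assumed scheme).

    For each position, hash a window of the h preceding tokens plus the
    current one; drop the position (mask=0) when the hash lands in a
    1/k bucket. Because the mask depends only on local token content,
    a repeated passage gets the same holes every time it is seen, so
    the dropped tokens are never learned.
    """
    mask = []
    for i in range(len(token_ids)):
        ctx = tuple(token_ids[max(0, i - h):i + 1])
        digest = hashlib.sha256(repr(ctx).encode()).digest()
        drop = int.from_bytes(digest[:4], "big") % k == 0
        mask.append(0 if drop else 1)
    return mask

def goldfish_loss(per_token_losses, token_ids, k=4):
    """Mean loss over surviving (mask=1) positions only."""
    m = goldfish_mask(token_ids, k)
    kept = [l for l, keep in zip(per_token_losses, m) if keep]
    return sum(kept) / max(len(kept), 1)
```

With `k=4`, roughly a quarter of positions are masked out; the determinism of the mask is the key property, since freshly re-randomizing the holes each epoch would let the model eventually see every token.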