Abhimanyu Hans
ahans30.bsky.social
PhD Student @umdcs

https://ahans30.github.io/
Reposted by Abhimanyu Hans
Let's sanity-check DeepSeek's claim of training on 2048 GPUs for under 2 months at a cost of $5.6M. It sort of checks out and sort of doesn't.

The V3 model is an MoE with 37B (out of 671B) active parameters. Let's compare to the cost of a 34B dense model. 🧵
January 29, 2025 at 5:12 PM
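The post's numbers can be sanity-checked with standard back-of-envelope arithmetic. Only the GPU count, duration, cost, and 37B active parameters come from the post; the token count, H800 throughput, utilization, and hourly price below are all assumptions for the sketch.

```python
# Back-of-envelope check of: 2048 GPUs, <2 months, $5.6M, 37B active params.
# Every constant not taken from the post is an assumption.

ACTIVE_PARAMS = 37e9     # from the post (MoE active parameters)
TOKENS        = 14.8e12  # assumed training-token count
PEAK_FLOPS    = 990e12   # assumed H800 BF16 peak, FLOP/s
MFU           = 0.40     # assumed model FLOPs utilization
PRICE_PER_HR  = 2.0      # assumed $/GPU-hour rental price
N_GPUS        = 2048     # from the post

train_flops = 6 * ACTIVE_PARAMS * TOKENS  # ~6ND rule of thumb for training
gpu_hours   = train_flops / (PEAK_FLOPS * MFU) / 3600
days        = gpu_hours / N_GPUS / 24
cost        = gpu_hours * PRICE_PER_HR

print(f"{gpu_hours/1e6:.1f}M GPU-hours, {days:.0f} days, ${cost/1e6:.1f}M")
```

Under these assumptions the estimate lands at roughly 2.3M GPU-hours, about 47 days on 2048 GPUs, and under $5M in compute rental, which is the sense in which the claim "sort of checks out": the arithmetic is plausible, but the total excludes experiments, failed runs, staff, and hardware ownership.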
poster sent for print 😮‍💨

are you concerned your LLM might regurgitate exact training data to your users?

join me and my co-authors at #NeurIPS2024 at wednesday's first poster session & learn how goldfish loss can help you.

eager to meet friends from past and future!

p.s. hmu if you're hiring summer interns!
December 9, 2024 at 4:17 AM
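The core idea behind the goldfish loss mentioned above is to exclude a deterministic pseudorandom subset of tokens from the next-token loss, so the model never trains on (and so cannot regurgitate) those exact positions. The sketch below is a hedged illustration of that idea, not the paper's exact implementation: the hash scheme, context width `h`, and drop rate `1/k` are all assumptions.

```python
import hashlib

def goldfish_mask(token_ids, k=4, h=13):
    """Sketch of a goldfish-style token-drop mask (assumed scheme).

    For each position, hash a window of the h preceding tokens plus the
    current one; drop the position (mask=0) when the hash lands in a
    1/k bucket. Because the mask depends only on local token content,
    a repeated passage gets the same holes every time it is seen, so
    the dropped tokens are never learned.
    """
    mask = []
    for i in range(len(token_ids)):
        ctx = tuple(token_ids[max(0, i - h):i + 1])
        digest = hashlib.sha256(repr(ctx).encode()).digest()
        drop = int.from_bytes(digest[:4], "big") % k == 0
        mask.append(0 if drop else 1)
    return mask

def goldfish_loss(per_token_losses, token_ids, k=4):
    """Mean loss over surviving (mask=1) positions only."""
    m = goldfish_mask(token_ids, k)
    kept = [l for l, keep in zip(per_token_losses, m) if keep]
    return sum(kept) / max(len(kept), 1)
```

With `k=4`, roughly a quarter of positions are masked out; the determinism of the mask is the key property, since freshly re-randomizing the holes each epoch would let the model eventually see every token.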