Atlas Wang
@atlaswang.bsky.social
https://www.vita-group.space/ 👨🏫 UT Austin ML Professor (on leave)
https://www.xtxmarkets.com/ 🏦 XTX Markets Research Director (NYC AI Lab)
Superpower is trying everything 🪅
Newest focus: training next-generation super intelligence - Preview above 👶
(1/n) My favorite "optimizer" work of 2024:
📢 Introducing APOLLO! 🚀: SGD-like memory cost, yet AdamW-level performance (or better!).
❓ How much memory do we need for optimizer states in LLM training? 🧐
Almost zero.
📜 Paper: arxiv.org/abs/2412.05270
🔗 GitHub: github.com/zhuhanqing/A...
December 10, 2024 at 12:53 PM
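
As a rough illustration of the memory claim (not the official APOLLO code — this is my own minimal sketch; the class name ApolloLikeSGD, the rank/beta/eps hyperparameters, and the exact scaling rule are assumptions for illustration, the real algorithm is in the linked paper/repo): keep Adam-style moments only for a low-rank random projection of each gradient, and use them to derive a per-channel scaling applied to an otherwise plain SGD update.

import torch

class ApolloLikeSGD:
    # Hypothetical, simplified optimizer sketch. Instead of two full-size AdamW moment
    # tensors per weight matrix, it stores moments of shape (rows x rank) for a fixed
    # random projection of the gradient, then scales each channel of a raw-gradient update.
    def __init__(self, params, lr=1e-3, rank=4, betas=(0.9, 0.999), eps=1e-8):
        self.params = [p for p in params if p.requires_grad]
        self.lr, self.rank, self.betas, self.eps = lr, rank, betas, eps
        self.state = {}
        self.t = 0

    @torch.no_grad()
    def step(self):
        self.t += 1
        b1, b2 = self.betas
        for p in self.params:
            if p.grad is None:
                continue
            g = p.grad
            if g.ndim < 2:
                # biases / 1-D params: plain SGD, no optimizer state at all
                p.add_(g, alpha=-self.lr)
                continue
            if p not in self.state:
                self.state[p] = {
                    # fixed random projection (cols x rank), tiny vs. full AdamW moments
                    "P": torch.randn(g.shape[1], self.rank, device=g.device) / self.rank ** 0.5,
                    "m": torch.zeros(g.shape[0], self.rank, device=g.device),
                    "v": torch.zeros(g.shape[0], self.rank, device=g.device),
                }
            st = self.state[p]
            r = g @ st["P"]                         # project gradient into the low-rank space
            st["m"].mul_(b1).add_(r, alpha=1 - b1)  # Adam-style moments, but only (rows x rank)
            st["v"].mul_(b2).addcmul_(r, r, value=1 - b2)
            m_hat = st["m"] / (1 - b1 ** self.t)
            v_hat = st["v"] / (1 - b2 ** self.t)
            # per-row (channel) factor: how much an Adam-like rule would rescale this channel
            scale = (m_hat / (v_hat.sqrt() + self.eps)).norm(dim=1) / (r.norm(dim=1) + self.eps)
            p.add_(g * scale.unsqueeze(1), alpha=-self.lr)  # channel-scaled SGD-style update

# usage sketch: opt = ApolloLikeSGD(model.parameters()); loss.backward(); opt.step()

The point of the sketch: for a weight matrix of shape (rows, cols), the stored state is roughly rows x rank instead of two full rows x cols moment tensors, which is where an SGD-like optimizer-state footprint with Adam-like per-channel scaling can come from.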
The gradient oscillates rapidly during training of the next-generation superintelligence model (preview version).
Wanna call it Edge of Stability?
November 21, 2024 at 7:31 PM