Atlas Wang
atlaswang.bsky.social
https://www.vita-group.space/ 👨‍🏫 UT Austin ML Professor (on leave)

https://www.xtxmarkets.com/ 🏦 XTX Markets Research Director (NYC AI Lab)

Superpower is trying everything 🪅

Newest focus: training next-generation superintelligence - Preview above 👶
(1/n) My favorite "optimizer" work of 2024:
📢 Introducing APOLLO! 🚀: SGD-like memory cost, yet AdamW-level performance (or better!).

❓ How much memory do we need for optimizer states in LLM training? 🧐
Almost zero.

📜 Paper: arxiv.org/abs/2412.05270
🔗 GitHub: github.com/zhuhanqing/A...
December 10, 2024 at 12:53 PM
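For scale, here is a hypothetical back-of-envelope sketch (my own illustration, not from the APOLLO paper) of why optimizer-state memory matters: AdamW keeps two fp32 states (momentum and second moment) per parameter, while plain SGD keeps none — that is the gap APOLLO aims to close. The function name and the 7B model size are illustrative assumptions.

```python
# Illustrative only: optimizer-state memory, excluding weights, grads, activations.
# AdamW holds two fp32 tensors (m and v) per parameter; vanilla SGD holds zero.

def optimizer_state_bytes(n_params: int, states_per_param: int,
                          bytes_per_state: int = 4) -> int:
    """Bytes consumed by optimizer states alone (fp32 states assumed)."""
    return n_params * states_per_param * bytes_per_state

n = 7_000_000_000  # e.g. a 7B-parameter LLM (assumed size for illustration)
adamw_gb = optimizer_state_bytes(n, 2) / 1e9   # m and v -> ~56 GB extra
sgd_gb = optimizer_state_bytes(n, 0) / 1e9     # no per-parameter state -> 0 GB
print(f"AdamW states: {adamw_gb:.0f} GB, SGD states: {sgd_gb:.0f} GB")
```

An SGD-like memory footprint at AdamW-level quality would remove tens of GB of state for a model at this scale.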
The gradient oscillates rapidly during training of the next-generation superintelligence model (preview version).

Wanna call it Edge of Stability?
November 21, 2024 at 7:31 PM