Volkan Cevher
@cevherlions.bsky.social
Associate Professor of Electrical Engineering, EPFL.
Amazon Scholar (AGI Foundations). IEEE Fellow. ELLIS Fellow.
🧑‍🍳 We provide a complete cookbook for choosing the right LMO for your architecture: 📚
- Input layers (one-hot vs. image inputs)
- Hidden layers (spectral norms)
- Output layers (flexible norm choices)
All with explicit formulas and guidance on when to use each one; see the sketch below.
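A minimal PyTorch sketch of closed-form LMOs behind that cookbook (function names and the default radius are ours; the math for each norm ball is standard, and which norm goes with which layer follows the paper, not this snippet):

```python
import torch

def lmo_sign(g, radius=1.0):
    # LMO over the elementwise max-norm ball {X : max|x_ij| <= r}:
    # argmin <g, x> is attained at -r * sign(g)
    return -radius * torch.sign(g)

def lmo_spectral(g, radius=1.0):
    # LMO over the spectral-norm ball {X : ||X||_{2->2} <= r}:
    # for G = U S V^T, the minimizer is -r * U V^T
    u, _, vh = torch.linalg.svd(g, full_matrices=False)
    return -radius * (u @ vh)

def lmo_colwise(g, radius=1.0, eps=1e-12):
    # LMO over the l1->l2 operator-norm ball (max column l2 norm <= r),
    # one plausible fit for one-hot inputs, where each input selects a column:
    # normalize each column of the gradient separately
    return -radius * g / (g.norm(dim=0, keepdim=True) + eps)
```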
February 13, 2025 at 4:51 PM
🌟 It turns out many popular optimizers (SignSGD, Muon, etc.) are special cases of our framework - just with different norm choices.
Our unified analysis reveals deep connections between seemingly different approaches and provides new insights into why they work 🤔
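To make the special cases concrete, here is a rough sketch (sizes and learning rate are placeholders): SignSGD moves along the l_inf-ball LMO direction, while a Muon-style step moves along the spectral-ball LMO direction; Muon itself uses momentum and Newton-Schulz iterations rather than an exact SVD.

```python
import torch

lr = 1e-3                        # placeholder learning rate
w = torch.randn(512, 512)        # a weight matrix
grad = torch.randn(512, 512)     # its stochastic gradient

# SignSGD: a step along the l_inf-ball LMO direction, -sign(grad)
w_signsgd = w - lr * torch.sign(grad)

# Muon-style: a step along the spectral-ball LMO direction, -U V^T
# (Muon approximates U V^T with Newton-Schulz iterations, not an SVD)
u, _, vh = torch.linalg.svd(grad, full_matrices=False)
w_muon = w - lr * (u @ vh)
```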
February 13, 2025 at 4:51 PM
📝 Check out the preprint: arxiv.org/abs/2502.07529
Worst-case convergence analysis with rates, guarantees for learning-rate transfer, and theory-backed practical advice on choosing norms adapted to your network's geometry 🎯
February 13, 2025 at 4:51 PM
🕵️ It’s “just” stochastic conditional gradient. The secret sauce? Don't treat your weight matrices like they're flat vectors! SCION adapts to the geometry of matrices using LMOs with respect to the correct norm: the induced operator norm.
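A sketch of what one such step could look like on a single weight matrix (the momentum constant, radius, and constrained-update form here are illustrative assumptions, not the paper's exact recipe):

```python
import torch

def conditional_gradient_step(w, grad, buf, lr=0.05, beta=0.9, radius=1.0):
    """One sketched stochastic Frank-Wolfe step with a spectral-norm LMO.

    `buf` is a running gradient average that tames stochastic noise.
    Note that the weight matrix is never flattened: the LMO sees its
    full two-dimensional geometry.
    """
    buf.mul_(beta).add_(grad, alpha=1 - beta)            # gradient averaging
    u, _, vh = torch.linalg.svd(buf, full_matrices=False)
    s = -radius * (u @ vh)                               # spectral-ball LMO
    w.mul_(1 - lr).add_(s, alpha=lr)                     # w <- (1-lr)*w + lr*s
    return w
```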
February 13, 2025 at 4:51 PM
arxiv.org/abs/2502.07529
🚀 Key results:
- Based on the conditional gradient method
- Beats Muon+Adam on NanoGPT (tested up to 3B params)
- Zero-shot learning rate transfer across model sizes
- Uses WAY less memory (just one set of params + half-precision grads; rough math below)
- Provides explicit norm control
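Back-of-envelope math for the memory claim (the byte layout is an assumption about a typical mixed-precision setup, not a measurement from the paper):

```python
def state_gib(n_params):
    # Assumed layout, purely illustrative: fp32 params + fp32 grads
    # + two fp32 Adam moment buffers, vs. fp32 params + fp16 grads.
    adam_like = n_params * (4 + 4 + 4 + 4)
    scion_like = n_params * (4 + 2)
    return adam_like / 2**30, scion_like / 2**30

adam, scion = state_gib(3 * 10**9)   # the 3B-param scale from the thread
print(f"Adam-style: ~{adam:.0f} GiB vs SCION-style: ~{scion:.0f} GiB")
```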
February 13, 2025 at 4:51 PM
🔥 Want to train large neural networks WITHOUT Adam while using less memory and getting better results? ⚡
Check out SCION: a new optimizer that adapts to the geometry of your problem using norm-constrained linear minimization oracles (LMOs): 🧵👇
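For readers meeting the term for the first time: an LMO over a norm ball returns the point in the ball that minimizes a linear function, lmo(g) = argmin over ||x|| <= r of <g, x>. A minimal l2-ball example (the helper name is ours):

```python
import torch

def lmo_l2(g, radius=1.0):
    # lmo(g) = argmin_{||x||_2 <= r} <g, x> = -r * g / ||g||_2
    return -radius * g / g.norm().clamp_min(1e-12)
```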
February 13, 2025 at 4:51 PM