Yizhou Liu
yzliu.bsky.social
PhD student at MIT, Physics of living systems, Complex systems, Statistical physics, Homepage: https://liuyz0.github.io/
FOCUS is faster than the Adam baseline from the literature!! (8/8)
January 22, 2025 at 4:14 AM
FOCUS appears to be much more stable than Signum and Adam on our machines (7/8)
January 22, 2025 at 4:14 AM
Predictions from synthetic losses are relevant to reality! With small batch sizes (large gradient noise), Signum outperforms Adam when training an MLP for MNIST classification (6/8)
January 22, 2025 at 4:14 AM
The attraction force has the greatest advantage when the gradient noise is large and the extra regularization (weight decay) is intermediate (5/8)
January 22, 2025 at 4:14 AM
Signum outperforms Adam when the effect of gradient noise is larger than that of loss sharpness. FOCUS further improves Signum when the loss is sharp (4/8)
January 22, 2025 at 4:14 AM
Our picture of the loss landscape is a narrowing valley (3/8)
January 22, 2025 at 4:14 AM
We add an attraction force (highlighted in red) to Signum (SignGD) (2/8)
January 22, 2025 at 4:14 AM
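A rough sketch of what "Signum plus an attraction force" in (2/8) could look like. The equation itself was in an image that is not reproduced here, so the form below is an assumption: the attraction is taken to be a sign-based pull toward an exponential moving average of past parameters. The function name `focus_like_step` and the hyperparameters `beta1`, `beta2`, and `gamma` are hypothetical; see the paper for the actual update rule.

```python
import numpy as np

def focus_like_step(theta, grad, m, theta_bar,
                    lr=0.01, beta1=0.9, beta2=0.99, gamma=0.1):
    """One illustrative update: Signum plus an assumed attraction term.

    theta     : current parameters
    grad      : stochastic gradient at theta
    m         : running average of gradients (Signum momentum)
    theta_bar : running average of parameters (assumed attraction center)
    """
    # Signum part: momentum on the gradient, then take its sign.
    m = beta1 * m + (1.0 - beta1) * grad
    # Assumed attraction center: exponential moving average of parameters.
    theta_bar = beta2 * theta_bar + (1.0 - beta2) * theta
    # Assumed attraction force: sign of the displacement from the center.
    attraction = np.sign(theta - theta_bar)
    theta = theta - lr * (np.sign(m) + gamma * attraction)
    return theta, m, theta_bar
```

With `gamma=0` this reduces to plain Signum, which makes it easy to compare the two on the same noisy loss.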
🚀 What if particles in a gas could inspire better AI training? Meet FOCUS - our new optimizer that turns physics intuition into speed! By embracing noise like attractive particles, it outperforms Adam in GPT-2 training. Slower particles → Faster learning! 🧵(1/8) arxiv.org/abs/2501.12243
January 22, 2025 at 4:14 AM