Yizhou Liu
yzliu.bsky.social
PhD student at MIT, Physics of living systems, Complex systems, Statistical physics, Homepage: https://liuyz0.github.io/
The attraction force helps the most when the gradient noise is large and the extra regularization (weight decay) is intermediate (5/8)
January 22, 2025 at 4:14 AM
Signum outperforms Adam when the effect of gradient noise is larger than that of loss sharpness. FOCUS further improves Signum when the loss is sharp (4/8)
January 22, 2025 at 4:14 AM
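For readers who have not met Signum, here is a rough sketch of the kind of update the posts compare. It is not taken from the thread or the paper: `signum_step` follows the usual description of Signum (step along the sign of a momentum average), while `attracted_step` is only an assumed reading of the "attractive particles" idea, pulling each parameter toward a running average of its past values. All function names, the `gamma` and `beta_avg` parameters, and the default values are hypothetical.

```python
def sign(x):
    """Return -1, 0, or 1 depending on the sign of x."""
    return (x > 0) - (x < 0)


def signum_step(params, grads, momentum, lr=0.01, beta=0.9):
    """One Signum-style step: move each parameter by lr along -sign(momentum)."""
    # Exponential moving average of the gradients.
    new_m = [beta * m + (1 - beta) * g for m, g in zip(momentum, grads)]
    # Only the sign of the momentum is used, so every step has magnitude lr.
    new_p = [p - lr * sign(m) for p, m in zip(params, new_m)]
    return new_p, new_m


def attracted_step(params, grads, momentum, avg_params,
                   lr=0.01, beta=0.9, beta_avg=0.99, gamma=0.1):
    """Signum-style step plus an ASSUMED attraction toward a running
    average of the parameters (hypothetical form, not from the posts)."""
    new_m = [beta * m + (1 - beta) * g for m, g in zip(momentum, grads)]
    # Running average of past parameter values (the assumed "attractor").
    new_avg = [beta_avg * a + (1 - beta_avg) * p
               for a, p in zip(avg_params, params)]
    # gamma scales the assumed attraction relative to the gradient-sign term.
    new_p = [p - lr * (sign(m) + gamma * sign(p - a))
             for p, m, a in zip(params, new_m, new_avg)]
    return new_p, new_m, new_avg


if __name__ == "__main__":
    p, m = signum_step([1.0, -2.0], [0.5, -0.5], [0.0, 0.0], lr=0.1)
    print(p, m)
```

With `gamma=0` the second function reduces to the first, which makes it easy to compare the two on a toy problem before trusting either form.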
🚀 What if particles in a gas could inspire better AI training? Meet FOCUS - our new optimizer that turns physics intuition into speed! By embracing noise like attractive particles, it outperforms Adam in GPT-2 training. Slower particles → Faster learning! 🧵(1/8) arxiv.org/abs/2501.12243
January 22, 2025 at 4:14 AM