Lightnews — Scholar-powered news

Omead Pooladzandi ✈️ NeurIPS'24

@hessianfree.bsky.social

Optimization, Generative Modeling @Caltech, PhD @UCLA. Ex Research Scientist Intern @AIatMeta (opinions are my own). Why is JAX so difficult?

Newton-Schulz isn't the answer even for instantaneous whitening.

PSGD: MSE( Q.T Q H , I ) = 5.2e-3
Zero-Power NS 100 iterations: MSE( NS(G) , I ) = 8.2e-1
True Inverse: MSE( H^(-1/2) H H^(-1/2), I ) = 6.1e-3

PSGD whitens information significantly better than the Newton-Schulz iters found in Muon

December 27, 2024 at 9:24 AM
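
The whitening score quoted in the post above (MSE against the identity) can be reproduced in spirit with a few lines of NumPy. This is only a rough sketch under stated assumptions: the test matrix `H` is a made-up SPD matrix, the helper names `whitening_mse` and `inv_sqrt_spd` are hypothetical, and the factor `Q` is obtained here from a Cholesky factorization as a stand-in for a learned PSGD factor (PSGD fits `Q` iteratively, which is not shown). It does not reproduce the numbers in the post.

```python
import numpy as np

def whitening_mse(M):
    """Mean squared error between a square matrix M and the identity."""
    n = M.shape[0]
    return float(np.mean((M - np.eye(n)) ** 2))

def inv_sqrt_spd(H):
    """Inverse square root of a symmetric positive definite matrix via eigh."""
    w, V = np.linalg.eigh(H)
    # V @ diag(1 / sqrt(w)) @ V.T
    return (V / np.sqrt(w)) @ V.T

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 8))
H = A @ A.T / 8 + 0.1 * np.eye(8)  # hypothetical SPD "Hessian" for illustration

# "True Inverse" baseline: MSE(H^(-1/2) H H^(-1/2), I)
W = inv_sqrt_spd(H)
err_exact = whitening_mse(W @ H @ W)

# Factored preconditioner P = Q.T Q scored as MSE(Q.T Q H, I).
# Assumption: Q comes from a Cholesky factor of inv(H), not from PSGD.
Q = np.linalg.cholesky(np.linalg.inv(H)).T
err_factored = whitening_mse(Q.T @ Q @ H)

print(f"exact inverse sqrt: {err_exact:.2e}")
print(f"factored Q.T Q:     {err_factored:.2e}")
```

Any candidate whitening matrix can be scored the same way by passing its product with `H` to `whitening_mse`.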

Omead Pooladzandi ✈️ NeurIPS'24

@hessianfree.bsky.social

Xilin is back at it again. Results are clear: damping hurts precision, but lower precision needs it if the underlying Hessian is extremely poorly conditioned.

December 7, 2024 at 4:46 PM

Omead Pooladzandi ✈️ NeurIPS'24

@hessianfree.bsky.social

PSGD tracking Muon on modded nanoGPT

December 2, 2024 at 5:30 AM

Omead Pooladzandi ✈️ NeurIPS'24

@hessianfree.bsky.social

PSGD ❤️ MARS

MARS is an exciting new variance reduction technique from @quanquangu.bsky.social's group that can help stabilize and accelerate your deep learning pipeline. All that is needed is a gradient buffer. Here MARS speeds up the convergence of PSGD, ultimately leading to a better solution.
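
To make the "gradient buffer" remark concrete, here is a hedged sketch of a MARS-style corrected gradient. It assumes the approximate variant that reuses the previous step's stored gradient; the function name `mars_corrected_gradient` and the default values for `gamma`, `beta1`, and `max_norm` are illustrative assumptions and should be checked against the MARS paper and reference code before use.

```python
import numpy as np

def mars_corrected_gradient(grad, prev_grad, gamma=0.025, beta1=0.95, max_norm=1.0):
    """Return a variance-reduced gradient from the current and previous gradients.

    Assumed form: c = g + gamma * beta1 / (1 - beta1) * (g - g_prev),
    followed by rescaling c to norm max_norm if its norm exceeds that value.
    """
    correction = gamma * beta1 / (1.0 - beta1) * (grad - prev_grad)
    c = grad + correction
    norm = np.linalg.norm(c)
    if norm > max_norm:
        c = c * (max_norm / norm)
    return c

# Toy usage: the buffer is just the gradient kept from the previous step.
prev_grad = np.zeros(2)
grad = np.array([0.3, 0.4])
c = mars_corrected_gradient(grad, prev_grad)
prev_grad = grad.copy()  # update the buffer for the next step
```

The corrected gradient `c` would then be handed to the base optimizer (PSGD in the post) in place of the raw gradient.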