Evan Walters
@evanatyourservice.bsky.social
ML/RL enthusiast, second-order optimization, plasticity, environmentalist
Many second-order optimizers aim to whiten the gradient, which scales each direction in the gradient to unit length. But why is this useful?
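One way to picture whitening: take the SVD of a matrix-shaped gradient and set every singular value to 1, so each singular direction contributes equally to the update. A minimal sketch (in practice, optimizers like PSGD approximate this with a learned preconditioner rather than an explicit SVD):

```python
import numpy as np

rng = np.random.default_rng(0)
G = rng.normal(size=(4, 3))  # a toy gradient matrix

# Whitening via the SVD: replace every singular value with 1,
# so each singular direction of the gradient has unit length.
U, s, Vt = np.linalg.svd(G, full_matrices=False)
G_white = U @ Vt

# All singular values of the whitened gradient are (numerically) 1.
print(np.linalg.svd(G_white, compute_uv=False))
```

This removes the ill-conditioning of the raw gradient: steep directions are damped and shallow directions are boosted, which is one intuition for why whitening helps.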
December 11, 2024 at 6:47 PM
In a world of tuning I wanted to see how PSGD kron would fare without any tuning whatsoever on some Atari RL. I plugged it into CleanRL PPO with defaults and the same LR as Adam, and it did quite well, check out some graphs! W&B report: api.wandb.ai/links/evanat...
December 11, 2024 at 4:19 PM
I implemented mLSTM from the xLSTM paper by @HochreiterSepp and team in JAX; it can more or less be used in place of attention. I haven't done a lot of experiments with it yet, so if you give it a try please report back!

github.com/evanatyourse...
GitHub - evanatyourservice/xLSTM-JAX: An implementation of mLSTM from xLSTM in JAX
An implementation of mLSTM from xLSTM in JAX. Contribute to evanatyourservice/xLSTM-JAX development by creating an account on GitHub.
github.com
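For reference, the core mLSTM recurrence pairs a matrix memory with a normalizer state, queried much like attention. A rough single-step sketch follows; the variable names and gate parameterizations here are illustrative, not the repo's actual API:

```python
import numpy as np

def mlstm_step(C, n, q, k, v, i_gate, f_gate, o_gate):
    """One recurrent step of an mLSTM-style matrix-memory cell (sketch).

    C: (d, d) matrix memory, n: (d,) normalizer state,
    q, k, v: (d,) query/key/value vectors; i/f gates are scalars,
    o_gate is a (d,) output gate.
    """
    C = f_gate * C + i_gate * np.outer(v, k)   # matrix memory update
    n = f_gate * n + i_gate * k                # normalizer update
    h_tilde = C @ q / max(abs(n @ q), 1.0)     # normalized retrieval
    return C, n, o_gate * h_tilde              # gated output

d = 8
rng = np.random.default_rng(0)
C, n = np.zeros((d, d)), np.zeros(d)
for _ in range(5):  # run a few steps on random inputs
    q, k, v = rng.normal(size=(3, d))
    k = k / np.sqrt(d)  # key scaling, as in attention
    C, n, h = mlstm_step(C, n, q, k, v,
                         i_gate=1.0, f_gate=0.9,
                         o_gate=np.ones(d))
print(h.shape)  # (8,)
```

Because the state is a fixed-size matrix rather than a growing KV cache, the recurrence runs in constant memory per step, which is part of why it can stand in for attention.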
November 30, 2024 at 4:43 PM
Reposted by Evan Walters
Hi @clementpoiret.bsky.social, I am one of the co-authors of PSGD from 2022, and am actively working on PSGD Kron with Xilin and @evanatyourservice.bsky.social. Glad you are excited about PSGD Kron!
PSGD ❤️ MARS

MARS is an exciting new variance-reduction technique from the group of @quanquangu.bsky.social that can help stabilize and accelerate your deep learning pipeline. All that is needed is a gradient buffer. Here MARS speeds up the convergence of PSGD, ultimately leading to a better solution.
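The gradient-buffer idea can be sketched as follows: the corrected gradient adds a scaled difference against the previous step's buffered gradient, then gets norm-clipped before being handed to the base optimizer (Adam, PSGD, ...). Coefficient values and the clipping threshold here are illustrative, not necessarily the paper's exact settings:

```python
import numpy as np

def mars_correct(grad, grad_buffer, gamma=0.025, beta1=0.95):
    """MARS-style variance-reduced gradient (sketch).

    Combines the current gradient with a scaled difference against
    the buffered previous gradient, then clips the result to unit norm.
    """
    c = grad + gamma * (beta1 / (1.0 - beta1)) * (grad - grad_buffer)
    norm = np.linalg.norm(c)
    if norm > 1.0:  # clip the corrected gradient
        c = c / norm
    return c

prev_g = np.zeros(4)  # the gradient buffer (previous step's gradient)
g = np.array([0.5, -0.2, 0.1, 0.3])
c = mars_correct(g, prev_g)
prev_g = g.copy()     # update the buffer for the next step
print(np.linalg.norm(c) <= 1.0)
```

The only extra state is the single buffered gradient, which is why it slots into existing optimizers so easily.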
November 28, 2024 at 2:16 AM
Reposted by Evan Walters
Just put together a starter pack for Deep Learning Theory. Let me know if you'd like to be included, or suggest someone to add to the list!

go.bsky.app/2qnppia
November 22, 2024 at 9:35 PM