Evan Walters
@evanatyourservice.bsky.social
ML/RL enthusiast, second-order optimization, plasticity, environmentalist
Many second-order optimizers aim to whiten the gradient, which scales each direction in the gradient to unit length. But why is this useful?
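One way to picture whitening: take the SVD of a matrix-shaped gradient and set every singular value to 1, so each singular direction contributes equally to the update. A minimal sketch (in practice, optimizers like PSGD approximate this with a learned preconditioner rather than an explicit SVD):

```python
import numpy as np

rng = np.random.default_rng(0)
G = rng.normal(size=(4, 3))  # a toy gradient matrix

# Whitening via the SVD: replace every singular value with 1,
# so each singular direction of the gradient has unit length.
U, s, Vt = np.linalg.svd(G, full_matrices=False)
G_white = U @ Vt

# All singular values of the whitened gradient are (numerically) 1.
print(np.linalg.svd(G_white, compute_uv=False))
```

This removes the ill-conditioning of the raw gradient: steep directions are damped and shallow directions are boosted, which is one intuition for why whitening helps.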
December 11, 2024 at 6:47 PM
In a world of tuning I wanted to see how PSGD kron would fare without any tuning whatsoever on some Atari RL. I plugged it into CleanRL PPO with defaults and the same LR as Adam, and it did quite well, check out some graphs! W&B report: api.wandb.ai/links/evanat...
December 11, 2024 at 4:19 PM
I implemented mLSTM from the xLSTM paper by @HochreiterSepp and team in JAX; it can more or less be used in place of attention. I haven't done a lot of experiments with it yet, so if you give it a try please report back!

github.com/evanatyourse...
GitHub - evanatyourservice/xLSTM-JAX: An implementation of mLSTM from xLSTM in JAX
An implementation of mLSTM from xLSTM in JAX. Contribute to evanatyourservice/xLSTM-JAX development by creating an account on GitHub.
github.com
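For reference, the core mLSTM recurrence pairs a matrix memory with a normalizer state, queried much like attention. A rough single-step sketch follows; the variable names and gate parameterizations here are illustrative, not the repo's actual API:

```python
import numpy as np

def mlstm_step(C, n, q, k, v, i_gate, f_gate, o_gate):
    """One recurrent step of an mLSTM-style matrix-memory cell (sketch).

    C: (d, d) matrix memory, n: (d,) normalizer state,
    q, k, v: (d,) query/key/value vectors; i/f gates are scalars,
    o_gate is a (d,) output gate.
    """
    C = f_gate * C + i_gate * np.outer(v, k)   # matrix memory update
    n = f_gate * n + i_gate * k                # normalizer update
    h_tilde = C @ q / max(abs(n @ q), 1.0)     # normalized retrieval
    return C, n, o_gate * h_tilde              # gated output

d = 8
rng = np.random.default_rng(0)
C, n = np.zeros((d, d)), np.zeros(d)
for _ in range(5):  # run a few steps on random inputs
    q, k, v = rng.normal(size=(3, d))
    k = k / np.sqrt(d)  # key scaling, as in attention
    C, n, h = mlstm_step(C, n, q, k, v,
                         i_gate=1.0, f_gate=0.9,
                         o_gate=np.ones(d))
print(h.shape)  # (8,)
```

Because the state is a fixed-size matrix rather than a growing KV cache, the recurrence runs in constant memory per step, which is part of why it can stand in for attention.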
November 30, 2024 at 4:43 PM
Reposted by Evan Walters
Hi @clementpoiret.bsky.social, I am one of the co-authors of PSGD from 2022, and am actively working on PSGD Kron with Xilin and @evanatyourservice.bsky.social. Glad you are excited about PSGD Kron!
PSGD ❤️ MARS

MARS is an exciting new variance-reduction technique from the group of @quanquangu.bsky.social that can help stabilize and accelerate your deep learning pipeline. All that is needed is a gradient buffer. Here MARS speeds up the convergence of PSGD, ultimately leading to a better solution.
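The gradient-buffer idea can be sketched as follows: the corrected gradient adds a scaled difference against the previous step's buffered gradient, then gets norm-clipped before being handed to the base optimizer (Adam, PSGD, ...). Coefficient values and the clipping threshold here are illustrative, not necessarily the paper's exact settings:

```python
import numpy as np

def mars_correct(grad, grad_buffer, gamma=0.025, beta1=0.95):
    """MARS-style variance-reduced gradient (sketch).

    Combines the current gradient with a scaled difference against
    the buffered previous gradient, then clips the result to unit norm.
    """
    c = grad + gamma * (beta1 / (1.0 - beta1)) * (grad - grad_buffer)
    norm = np.linalg.norm(c)
    if norm > 1.0:  # clip the corrected gradient
        c = c / norm
    return c

prev_g = np.zeros(4)  # the gradient buffer (previous step's gradient)
g = np.array([0.5, -0.2, 0.1, 0.3])
c = mars_correct(g, prev_g)
prev_g = g.copy()     # update the buffer for the next step
print(np.linalg.norm(c) <= 1.0)
```

The only extra state is the single buffered gradient, which is why it slots into existing optimizers so easily.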
November 28, 2024 at 2:16 AM
Reposted by Evan Walters
Just put together a starter pack for Deep Learning Theory. Let me know if you'd like to be included, or suggest someone to add to the list!

go.bsky.app/2qnppia
November 22, 2024 at 9:35 PM