Evan Walters
@evanatyourservice.bsky.social
ML/RL enthusiast, second-order optimization, plasticity, environmentalist
If we think about it from the perspective of image classification and are trying to identify a dog, the most identifiable features are likely the face, ears, or tail, which have high variance. These high-variance features will stand out most in the gradient, so the model will mainly learn them.
December 11, 2024 at 6:47 PM
Only learning these high-variance features makes a poor model, though, because as soon as those features aren't visible it has trouble identifying the dog. We battle this explicitly by cropping pictures (data augmentation) so the model is forced to learn other parts of the dog.
December 11, 2024 at 6:47 PM
Or, we can use an optimizer that normalizes the features implicitly. Somewhere in the grads are all the dog-related features: fur, body shape, leash, context... If we whiten the gradient, we get closer to the model learning all the dog-related features equally.
December 11, 2024 at 6:47 PM
This way, even if the features with the highest variance aren't present, we can still identify the dog using the less overt features.
December 11, 2024 at 6:47 PM
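A toy numpy sketch of what whitening buys. A whitening preconditioner (which PSGD can approximate) is the inverse square root of the gradient covariance; after applying it, every feature direction contributes with roughly equal variance. The feature count, step count, and scales below are illustrative, not from any real model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "gradients": one high-variance feature direction dominates,
# the way a dog's face might dominate subtler cues like fur texture.
n_steps, n_features = 512, 8
scales = np.array([10.0, 1, 1, 1, 1, 1, 1, 1])  # feature 0 dominates
grads = rng.normal(size=(n_steps, n_features)) * scales

# Estimate the gradient covariance and its inverse square root
# via an eigendecomposition (cov is symmetric positive definite).
cov = grads.T @ grads / n_steps
eigvals, eigvecs = np.linalg.eigh(cov)
inv_sqrt = eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T

# Whitened gradients: every feature direction now has ~unit variance,
# so no single feature drowns out the rest of the update.
white = grads @ inv_sqrt
print(np.std(grads, axis=0))  # feature 0 is ~10x the others
print(np.std(white, axis=0))  # roughly all ones
```

In practice the covariance is never materialized like this for large models; structured approximations (e.g. Kronecker-factored, as in PSGD Kron) keep the cost manageable.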
Reposted by Evan Walters
Hi @clementpoiret.bsky.social, I am one of the co-authors of PSGD from 2022, and am actively working on PSGD Kron with Xilin and @evanatyourservice.bsky.social. Glad you are excited about PSGD Kron!
PSGD ❤️ MARS

MARS is an exciting new variance reduction technique from @quanquangu.bsky.social's group which can help stabilize and accelerate your deep learning pipeline. All that is needed is a gradient buffer. Here MARS speeds up the convergence of PSGD, ultimately leading to a better solution.
November 28, 2024 at 2:16 AM
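A minimal sketch of the gradient-buffer idea behind MARS-style variance reduction: the current gradient is corrected by a scaled difference against the buffered previous gradient, then clipped before being handed to the optimizer. The function name, the `gamma`/`beta1` constants, and the unit-norm clipping threshold here are illustrative assumptions, not the paper's exact tuning:

```python
import numpy as np

def mars_correct(grad, prev_grad, gamma=0.025, beta1=0.95):
    """Sketch of a MARS-style corrected gradient.

    Blends the current gradient with its change since the last step
    (scaled by gamma * beta1 / (1 - beta1)), then clips the result to
    unit norm for stability. Constants are illustrative.
    """
    c = grad + gamma * (beta1 / (1.0 - beta1)) * (grad - prev_grad)
    norm = np.linalg.norm(c)
    if norm > 1.0:  # clipping keeps a large correction from blowing up
        c = c / norm
    return c

# Usage: keep a buffer of the previous gradient across steps, and feed
# the corrected gradient to the base optimizer (PSGD, Adam, ...).
prev = np.zeros(3)
g = np.array([0.1, -0.2, 0.3])
corrected = mars_correct(g, prev)
prev = g  # update the buffer for the next step
```

The only extra state is the gradient buffer, which matches the post's point that this composes easily with an existing optimizer like PSGD.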
woah
November 26, 2024 at 4:32 AM
Thanks Zhipeng, glad to be a part!
November 26, 2024 at 4:06 AM