If we think about it from the perspective of image classification, trying to identify a dog, the most identifiable features are likely the face, ears, or tail, which have high variance. These high-variance features will stand out most in the gradient, so the model will mainly learn these.
December 11, 2024 at 6:47 PM
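To make that concrete, here is a toy sketch (the feature names and scales are illustrative, not from the thread): for a linear model with squared loss, each weight's gradient is the residual times the feature value, so a feature's gradient component scales with its variance.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 10_000
    x_face = rng.normal(0.0, 3.0, n)   # high-variance feature (e.g., the face)
    x_fur = rng.normal(0.0, 0.3, n)    # low-variance feature (e.g., fur texture)
    X = np.stack([x_face, x_fur], axis=1)
    y = X @ np.array([1.0, 1.0])       # both features are equally predictive
    w = np.zeros(2)                    # weights at initialization
    grad = X.T @ (X @ w - y) / n       # gradient of 0.5 * mean squared error
    print(grad)                        # roughly [-9.0, -0.09]: variance sets the scale

Both features carry the same signal, yet the high-variance one dominates the gradient by two orders of magnitude, so it gets learned first.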
Only learning these high-variance features makes a poor model, though, because as soon as those features aren't visible the model has trouble identifying the dog. We battle this explicitly by cropping pictures (data aug) so the model is forced to learn other parts of the dog.
December 11, 2024 at 6:47 PM
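A sketch of that cropping idea using torchvision's standard transforms; the dataset path and the crop scale range are hypothetical choices:

    import torchvision.transforms as T
    from torchvision.datasets import ImageFolder

    train_tf = T.Compose([
        T.RandomResizedCrop(224, scale=(0.3, 1.0)),  # aggressive crops often hide the face/tail
        T.RandomHorizontalFlip(),
        T.ToTensor(),
    ])
    train_set = ImageFolder("data/dogs", transform=train_tf)  # hypothetical dataset path

Because the crop sometimes removes the most discriminative region, the network cannot rely on it alone and must pick up the remaining cues.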
Or, we can use an optimizer that normalizes the features implicitly. Somewhere in the grads are all the features related to the dog: fur, body shape, leash, context... If we whiten the gradient, we get closer to the model learning all the dog-related features equally.
December 11, 2024 at 6:47 PM
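A minimal sketch of what whitening the gradient means here, assuming full-matrix whitening by the inverse square root of a running second-moment estimate. Practical preconditioners like PSGD approximate this far more cheaply; this is only the idea:

    import torch

    def whiten(g, C, beta=0.99, eps=1e-6):
        # g: flattened gradient; C: running second-moment matrix, updated in place
        C.mul_(beta).add_((1 - beta) * torch.outer(g, g))
        # inverse matrix square root via an eigendecomposition of the damped estimate
        evals, evecs = torch.linalg.eigh(C + eps * torch.eye(C.shape[0]))
        return evecs @ torch.diag(evals.rsqrt()) @ evecs.T @ g

After this transform, the gradient's second moment is approximately the identity, so the low-variance directions (the subtler dog features) receive updates on the same scale as the dominant ones.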
MARS is an exciting new variance reduction technique from @quanquangu.bsky.social's group which can help stabilize and accelerate your deep learning pipeline. All that is needed is a gradient buffer. Here MARS speeds up the convergence of PSGD, ultimately leading to a better solution.
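A hedged sketch of a MARS-style correction, assuming the buffer-based approximate form in which last step's stored gradient stands in for a re-evaluation at the previous iterate; the corrected gradient is then handed to the base optimizer (PSGD in the result the post refers to). The gamma, beta1, and unit-norm clip values are assumptions, not the paper's tuned settings:

    import torch

    @torch.no_grad()
    def mars_correct(grad, buf, gamma=0.025, beta1=0.95):
        # grad: current minibatch gradient; buf: buffer holding last step's gradient
        c = grad + gamma * (beta1 / (1 - beta1)) * (grad - buf)
        if c.norm() > 1.0:   # clip the corrected gradient for stability
            c = c / c.norm()
        buf.copy_(grad)      # refresh the gradient buffer for the next step
        return c             # feed this to the base optimizer, e.g., PSGD

The scaled difference term (grad - buf) is what reduces variance: minibatch noise shared across consecutive steps partially cancels, which is why the only extra state needed is that one gradient buffer.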