https://uuujf.github.io
ppl talking about implicit regularization, but how good is it? We show it's surprisingly effective: GD dominates ridge for linear regression, w/ more cool stuff on GD vs SGD
arxiv.org/abs/2509.17251
Classical GD analysis assumes small stepsizes for stability. However, in practice, GD is often used with large stepsizes, which lead to instability.
See my slides for more details on this topic: uuujf.github.io/postdoc/wu20...
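A toy sketch of the "small stepsizes for stability" point, assuming a 1D quadratic f(w) = 0.5 * a * w^2 (my choice for illustration, not an example from the slides). Here GD multiplies w by (1 - stepsize * a) each step, so it contracts only when stepsize < 2/a. The function name `gd_on_quadratic` and all constants are hypothetical.

```python
def gd_on_quadratic(curvature, stepsize, w0=1.0, steps=50):
    """Run GD on f(w) = 0.5 * curvature * w**2 and return all iterates."""
    w = w0
    iterates = [w]
    for _ in range(steps):
        grad = curvature * w          # f'(w) = curvature * w
        w = w - stepsize * grad       # GD update: w <- (1 - stepsize * curvature) * w
        iterates.append(w)
    return iterates


curvature = 4.0                       # assumed toy value
threshold = 2.0 / curvature           # classical stability threshold 2/a

small = gd_on_quadratic(curvature, 0.9 * threshold)  # below threshold: |w| shrinks
large = gd_on_quadratic(curvature, 1.1 * threshold)  # above threshold: |w| grows

print("below threshold, |w_50| =", abs(small[-1]))
print("above threshold, |w_50| =", abs(large[-1]))
```

This only shows the classical threshold on a quadratic, where crossing 2/a means divergence. It says nothing about what large stepsizes do on other losses, which is what the slides cover.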