Konstantin Mishchenko
@konstmish.bsky.social
Research Scientist at Meta Paris
Code generation, math, optimization
Cool new result: a random arcsine stepsize schedule accelerates gradient descent (no momentum!) on separable problems. The separable class is clearly very limited, and it remains unclear whether acceleration via stepsizes alone is possible on general convex problems.
arxiv.org/abs/2412.05790
December 10, 2024 at 1:04 PM
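For intuition, here is a minimal sketch of what such a schedule could look like in code. The form eta_t = 1/(L·u_t) with u_t drawn i.i.d. from the arcsine distribution on (0, 1) is one plausible reading of the idea, not the paper's exact construction; see the paper for the actual schedule and guarantees.

import numpy as np

rng = np.random.default_rng(0)
L = 1.0                 # smoothness constant of the objective (placeholder)
T = 12                  # number of gradient steps

# Arcsine(0, 1) samples via the inverse-CDF identity u = sin^2(pi * U / 2).
u = np.sin(0.5 * np.pi * rng.random(T)) ** 2

# Candidate random stepsizes: reciprocals of the arcsine samples, scaled by 1/L.
# Mass near u = 0 gives occasional very long steps; mass near u = 1 gives steps
# close to the classical 1/L.
stepsizes = 1.0 / (L * u)
print(np.round(stepsizes, 2))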
It's a bit hard to say because this kind of result is still quite new, but one of the most recent papers on the topic, arxiv.org/abs/2410.16249, mentions a conjecture on the optimality of its 1/n^{log₂(1+√2)} rate (though not for the last iterate).
November 27, 2024 at 11:02 PM
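For reference, the numerical value of that exponent:

import math

# Exponent of the 1/n^{log2(1 + sqrt(2))} rate mentioned above.
p = math.log2(1 + math.sqrt(2))
print(p)   # ~1.2716, i.e. strictly faster than the classical O(1/n) rate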
Gradient Descent with large stepsizes converges faster than O(1/T), but this was previously only shown for the *best* iterate. Cool to see new results showing we can also get an improvement for the last iterate:
arxiv.org/abs/2411.17668
I am still waiting to see a version with adaptive stepsizes though 👀
November 27, 2024 at 3:02 PM
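To make the best-iterate vs last-iterate distinction concrete, here is a small toy with an ad-hoc schedule of my own (mostly short steps plus an occasional step above 2/L), not the schedule from either paper: long steps make f(x_t) non-monotone, so the best value seen along the way and the final value are genuinely different quantities.

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((40, 20))
b = rng.standard_normal(40)

def f(x):
    return 0.5 * np.linalg.norm(A @ x - b) ** 2

def grad(x):
    return A.T @ (A @ x - b)

L = np.linalg.norm(A, 2) ** 2                    # smoothness constant of this least-squares f
x = np.zeros(20)
best_val = f(x)
for t in range(1, 201):
    eta = 4.0 / L if t % 10 == 0 else 0.9 / L    # occasional long step above 2/L
    x = x - eta * grad(x)
    best_val = min(best_val, f(x))               # best-iterate bookkeeping
print("best iterate f:", best_val)
print("last iterate f:", f(x))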