Cool new result: random arcsine stepsize schedule accelerates gradient descent (no momentum!) on separable problems. The separable class is clearly very limited, and it remains unclear if acceleration using stepsizes is possible on general convex problems. arxiv.org/abs/2412.05790
December 10, 2024 at 1:04 PM
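For intuition, here is a minimal sketch on the simplest separable objective, a diagonal quadratic. I'm assuming the random schedule uses stepsizes whose *inverses* are drawn i.i.d. from an arcsine distribution over the curvature range [mu, L]; the paper's exact construction and scaling may differ, and individual runs are noisy since the schedule is random, hence the median over a few trials.

```python
import numpy as np

# Toy sketch on a separable objective: a diagonal quadratic
# f(x) = 0.5 * sum_i lam_i * x_i^2 with curvatures lam_i in [mu, L].
# Assumption: the random schedule uses stepsizes whose *inverses* are i.i.d.
# arcsine-distributed on [mu, L]; the paper's exact construction may differ.

rng = np.random.default_rng(0)

d, T, trials = 30, 100, 20
mu, L = 0.05, 1.0
lam = np.logspace(np.log10(mu), np.log10(L), d)   # spectrum of the quadratic
x0 = np.ones(d)

def f(x):
    return 0.5 * np.sum(lam * x ** 2)

def run_gd(stepsizes):
    x = x0.copy()
    for eta in stepsizes:
        x = x - eta * (lam * x)                   # gradient of the quadratic
    return f(x)

def arcsine_inverse_stepsizes(n):
    # If U ~ Uniform(0,1) then sin^2(pi*U/2) ~ Arcsine(0,1); rescale to [mu, L].
    u = np.sin(0.5 * np.pi * rng.random(n)) ** 2
    return mu + (L - mu) * u

const_val = run_gd(np.full(T, 1.0 / L))
rand_vals = [run_gd(1.0 / arcsine_inverse_stepsizes(T)) for _ in range(trials)]

print(f"constant 1/L stepsizes:           f(x_T) = {const_val:.2e}")
print(f"random arcsine schedule (median): f(x_T) = {np.median(rand_vals):.2e}")
```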
The idea that one needs to know a lot of advanced math to start doing research in ML seems so wrong to me. Instead of reading books for weeks and forgetting most of it a year later, I think it's much better to try to do things, see what knowledge gaps prevent you from doing them, and only then read.
December 6, 2024 at 2:26 PM
Gradient Descent with large stepsizes converges faster than O(1/T), but this had only been shown for the *best* iterate. Cool to see new results showing we can also get an improvement for the last iterate: arxiv.org/abs/2411.17668 I am still waiting to see a version with adaptive stepsizes though 👀
November 27, 2024 at 3:02 PM
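To see why best-iterate and last-iterate guarantees come apart with large stepsizes, here is a toy run of GD on a least-squares problem with an alternating short/long schedule. The schedule is purely illustrative, not the one analyzed in the paper: the point is just that steps above 2/L make the function value non-monotone, so the running best can sit well below the current iterate.

```python
import numpy as np

# Toy illustration of why best-iterate and last-iterate guarantees differ for
# GD with large stepsizes. The alternating short/long schedule below is a crude
# stand-in, NOT the schedule analyzed in the paper: steps above 2/L make f
# non-monotone, so the running best can be well ahead of the current iterate.

rng = np.random.default_rng(1)

n, d, T = 100, 20, 40
A = rng.standard_normal((n, d)) / np.sqrt(n)
b = A @ rng.standard_normal(d)                 # consistent system, so min f = 0

def f(x):
    return 0.5 * np.sum((A @ x - b) ** 2)      # smooth convex least squares

def grad(x):
    return A.T @ (A @ x - b)

L = np.linalg.eigvalsh(A.T @ A).max()          # smoothness constant

x = np.zeros(d)
fs = []
for t in range(T):
    eta = 10.0 / L if t % 4 == 3 else 1.0 / L  # occasional long step (illustrative)
    x = x - eta * grad(x)
    fs.append(f(x))

fs = np.array(fs)
best = np.minimum.accumulate(fs)
print("iterations where the current iterate is worse than the best so far:",
      int(np.sum(fs > best)))
print(f"final last iterate f(x_T) = {fs[-1]:.2e}   best iterate = {best[-1]:.2e}")
```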