Scott Pesme
@skate-the-apple.bsky.social
Postdoc at Inria Grenoble with Julien Mairal.
scottpesme.github.io
Voilà! This was a super fun project, and I'd be happy to discuss it further with anyone interested. A huge thanks to my advisor Nicolas, who was a great help throughout the project.
And the bonus: an incremental Romain Gary!
November 19, 2024 at 4:53 PM
To sum up: we prove and characterise the saddle-to-saddle dynamics which occurs when training diagonal linear networks with vanishing initialisation. The visited saddles and jump times can be computed using a simple algorithm.
November 19, 2024 at 4:49 PM
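For readers who want a precise picture of the setting mentioned above, here is a minimal LaTeX sketch of a diagonal linear network with vanishing initialisation. The parametrisation w = u ⊙ u − v ⊙ v and the uniform initialisation at scale α are standard conventions for these models and are assumptions here, not necessarily the paper's exact choices.

```latex
% Sketch of the setting, under standard conventions for diagonal linear
% networks (the paper may use an equivalent parametrisation).
% Needs amsmath and amssymb. The regression vector is factorised as
\[
  w \;=\; u \odot u \;-\; v \odot v ,
\]
% and gradient descent is run on the resulting non-convex loss
\[
  L(u, v) \;=\; \frac{1}{2n} \sum_{i=1}^{n}
      \big( \langle x_i ,\, u \odot u - v \odot v \rangle - y_i \big)^2 ,
\]
% starting from the uniform initialisation $u_0 = v_0 = \alpha \mathbf{1}$.
% ``Vanishing initialisation'' refers to the limit $\alpha \to 0$, in which
% the trajectory spends long stretches near saddle points of the loss
% $w \mapsto \frac{1}{2n} \lVert Xw - y \rVert^2$ before jumping to the next one.
```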
Now, if we have a look at the iterates, we observe that the coordinates activate one after another (bottom). From a loss landscape point of view, the iterates jump from one saddle point of the loss to another (right). Hence the name "saddle-to-saddle" dynamics.
November 19, 2024 at 4:49 PM
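To make the phenomenon concrete, here is a minimal, hedged simulation sketch (not the paper's exact experimental setup): plain gradient descent on a diagonal linear network with a tiny initialisation, run on a synthetic sparse regression problem. The problem sizes, stepsize and parametrisation w = u ⊙ u − v ⊙ v below are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic sparse regression problem: y = X w*, with a 3-sparse ground truth.
n, d = 40, 20
X = rng.standard_normal((n, d))
w_star = np.zeros(d)
w_star[[2, 7, 13]] = [2.0, -1.0, 0.5]
y = X @ w_star

# Diagonal linear network: effective weights w = u*u - v*v
# (one common 2-layer reparametrisation; the paper may use an equivalent one).
alpha = 1e-6                      # "vanishing" initialisation scale
u = alpha * np.ones(d)
v = alpha * np.ones(d)
lr = 0.02                         # constant stepsize

losses, iterates = [], []
for _ in range(20_000):
    w = u * u - v * v
    residual = X @ w - y
    grad_w = X.T @ residual / n   # gradient of the loss w.r.t. the effective weights
    # Chain rule through w(u, v) = u*u - v*v:
    u, v = u - lr * 2 * u * grad_w, v + lr * 2 * v * grad_w
    losses.append(0.5 * np.mean(residual ** 2))
    iterates.append(w)

iterates = np.array(iterates)
# A log-scale plot of `losses` shows long plateaus separated by sharp drops,
# and `np.abs(iterates[:, i])` shows the coordinates activating one after another.
```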
Have you ever encountered similar plots? The training makes no progress for a long stretch, then a sharp transition occurs where a new “feature” is learnt. And this occurs with a constant stepsize!
Spoiler: An incremental Romain Gary is hidden at the end of this thread...
arxiv.org/abs/2304.00488
November 19, 2024 at 4:49 PM
A (not at all) short presentation of the paper "(S)GD over Diagonal Linear Networks: Implicit Regularisation, Large Stepsizes and Edge of Stability", which can be found on arXiv: arxiv.org/abs/2302.08982
November 19, 2024 at 4:23 PM