Scott Pesme
@skate-the-apple.bsky.social
Postdoc at Inria Grenoble with Julien Mairal.
scottpesme.github.io
Voilà! This was a super fun project, and I'd be happy to discuss it further with anyone interested. A huge thanks to my advisor Nicolas, who was a great help throughout the project.
And the bonus: an incremental Romain Gary!
November 19, 2024 at 4:53 PM
To sum up: we prove and characterise the saddle-to-saddle dynamics which occurs when training diagonal linear networks with vanishing initialisation. The visited saddles and jump times can be computed using a simple algorithm.
November 19, 2024 at 4:49 PM
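For readers who want a precise picture of the setting mentioned above, here is a minimal LaTeX sketch of a diagonal linear network with vanishing initialisation. The parametrisation w = u ⊙ u − v ⊙ v and the uniform initialisation at scale α are standard conventions for these models and are assumptions here, not necessarily the paper's exact choices.

```latex
% Sketch of the setting, under standard conventions for diagonal linear
% networks (the paper may use an equivalent parametrisation).
% Needs amsmath and amssymb. The regression vector is factorised as
\[
  w \;=\; u \odot u \;-\; v \odot v ,
\]
% and gradient descent is run on the resulting non-convex loss
\[
  L(u, v) \;=\; \frac{1}{2n} \sum_{i=1}^{n}
      \big( \langle x_i ,\, u \odot u - v \odot v \rangle - y_i \big)^2 ,
\]
% starting from the uniform initialisation $u_0 = v_0 = \alpha \mathbf{1}$.
% ``Vanishing initialisation'' refers to the limit $\alpha \to 0$, in which
% the trajectory spends long stretches near saddle points of the loss
% $w \mapsto \frac{1}{2n} \lVert Xw - y \rVert^2$ before jumping to the next one.
```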
Now, if we have a look at the iterates, we observe that the coordinates activate one after another (bottom). From a loss landscape point of view, the iterates jump from one saddle point of the loss to another (right). Hence the name "saddle-to-saddle" dynamics.
November 19, 2024 at 4:49 PM
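To make the phenomenon concrete, here is a minimal, hedged simulation sketch (not the paper's exact experimental setup): plain gradient descent on a diagonal linear network with a tiny initialisation, run on a synthetic sparse regression problem. The problem sizes, stepsize and parametrisation w = u ⊙ u − v ⊙ v below are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic sparse regression problem: y = X w*, with a 3-sparse ground truth.
n, d = 40, 20
X = rng.standard_normal((n, d))
w_star = np.zeros(d)
w_star[[2, 7, 13]] = [2.0, -1.0, 0.5]
y = X @ w_star

# Diagonal linear network: effective weights w = u*u - v*v
# (one common 2-layer reparametrisation; the paper may use an equivalent one).
alpha = 1e-6                      # "vanishing" initialisation scale
u = alpha * np.ones(d)
v = alpha * np.ones(d)
lr = 0.02                         # constant stepsize

losses, iterates = [], []
for _ in range(20_000):
    w = u * u - v * v
    residual = X @ w - y
    grad_w = X.T @ residual / n   # gradient of the loss w.r.t. the effective weights
    # Chain rule through w(u, v) = u*u - v*v:
    u, v = u - lr * 2 * u * grad_w, v + lr * 2 * v * grad_w
    losses.append(0.5 * np.mean(residual ** 2))
    iterates.append(w)

iterates = np.array(iterates)
# A log-scale plot of `losses` shows long plateaus separated by sharp drops,
# and `np.abs(iterates[:, i])` shows the coordinates activating one after another.
```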
Have you ever encountered similar plots? The training makes no progress for a long stretch, then a sharp transition occurs where a new “feature” is learnt. And this occurs with a constant stepsize!
Spoiler: An incremental Romain Gary is hidden at the end of this thread...
arxiv.org/abs/2304.00488
November 19, 2024 at 4:49 PM
A (not at all) short presentation of the paper "(S)GD over Diagonal Linear Networks: Implicit Regularisation, Large Stepsizes and Edge of Stability", which can be found on arXiv: arxiv.org/abs/2302.08982
November 19, 2024 at 4:23 PM