Tony S.F.
tonysf.bsky.social
Assistant Professor of AI at CentraleSupélec in the Centre pour la Vision Numérique.
Our results are for any algorithm that fits the stochastic conditional gradient framework, which includes Muon notably but also normalized SGD, sign SGD, and others (e.g., greedy coordinate descent, low-rank stuff).
October 31, 2025 at 5:01 PM
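To make the framework in the post above concrete, here is a minimal sketch of how these methods can be read as one update rule with different linear minimization oracles (LMOs). It assumes the unconstrained form of the update with a simple momentum average. The function names (`lmo_sign`, `lmo_normalized`, `lmo_spectral`, `scg_step`) and the default hyperparameters are hypothetical and chosen for illustration; they are not taken from the paper.

```python
import numpy as np


def lmo_sign(g, radius=1.0):
    # LMO over the l_inf ball: argmin_{||d||_inf <= r} <g, d> = -r * sign(g).
    # Plugging this in gives a sign-SGD-style update.
    return -radius * np.sign(g)


def lmo_normalized(g, radius=1.0, eps=1e-12):
    # LMO over the l2 (Frobenius) ball: -r * g / ||g||.
    # Plugging this in gives a normalized-SGD-style update.
    return -radius * g / (np.linalg.norm(g) + eps)


def lmo_spectral(g, radius=1.0):
    # LMO over the spectral-norm ball: -r * U V^T, from the reduced SVD of g.
    # This is the update direction that Muon-style methods approximate.
    u, _, vt = np.linalg.svd(g, full_matrices=False)
    return -radius * (u @ vt)


def scg_step(x, grad_est, momentum_buf, lmo, stepsize=0.1, beta=0.9):
    # One step of the (assumed) unconstrained stochastic conditional gradient
    # update: average the stochastic gradient, then move along the LMO output.
    d = beta * momentum_buf + (1.0 - beta) * grad_est
    x_new = x + stepsize * lmo(d)
    return x_new, d
```

Swapping the `lmo` argument is the only thing that changes between the variants, which is the sense in which one analysis can cover all of them.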
Yep, none of this is affecting the loss - these regularizers are being added to the computation of the update to your parameters to better model the loss geometry, but they do not affect the loss you want to minimize (ignoring weight decay, which *does* transform unconstrained->constrained).
October 31, 2025 at 5:00 PM
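The remark about weight decay can be checked numerically. The sketch below assumes a sign-based LMO over the unit l_inf ball and an update of the form x <- (1 - γλ) x + γ·lmo(d). Rewriting it as (1 - γλ) x + γλ·(lmo(d)/λ) shows it is a convex combination of x and a point in the ball of radius 1/λ (when γλ <= 1), so the iterates never leave that ball. The names `step_with_weight_decay` and `run`, and the random directions standing in for gradients, are hypothetical.

```python
import numpy as np


def step_with_weight_decay(x, d, stepsize, weight_decay):
    # x <- (1 - stepsize * weight_decay) * x + stepsize * lmo(d),
    # with lmo(d) = -sign(d), the LMO of the unit l_inf ball.
    return (1.0 - stepsize * weight_decay) * x - stepsize * np.sign(d)


def run(num_steps=200, stepsize=0.1, weight_decay=0.5, seed=0):
    # Random directions stand in for stochastic gradients (an assumption);
    # the bound below holds for any sequence of directions.
    rng = np.random.default_rng(seed)
    x = np.zeros(8)
    max_abs = 0.0
    for _ in range(num_steps):
        d = rng.standard_normal(8)
        x = step_with_weight_decay(x, d, stepsize, weight_decay)
        max_abs = max(max_abs, float(np.abs(x).max()))
    # Largest coordinate magnitude seen; should not exceed 1 / weight_decay.
    return max_abs
```

Without weight decay (λ = 0) the same loop has no such bound, which is the unconstrained case.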
If we ignore the fact that Muon is doing Adam on some parameters and just focus on the spectral update (that's what you compute with Newton-Schulz), then it's a special case of Scion (which means you constrain the update to be in the spectral ball, blue in the picture).
October 30, 2025 at 11:15 PM
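For readers who have not seen it, the spectral update is (up to sign and scaling) the polar factor U Vᵀ of the gradient matrix, and Newton-Schulz computes it without an SVD. The sketch below uses the classical cubic Newton-Schulz iteration, not the tuned quintic polynomial used in Muon implementations, so it illustrates the idea and does not reproduce Muon's code. The function name `newton_schulz_polar` and the default `steps` are hypothetical.

```python
import numpy as np


def newton_schulz_polar(g, steps=30, eps=1e-12):
    # Scale by the Frobenius norm so every singular value lies in (0, 1],
    # which is inside the convergence region of the cubic iteration.
    x = g / (np.linalg.norm(g) + eps)
    # Work with the wide orientation so that x @ x.T is the smaller product.
    transposed = x.shape[0] > x.shape[1]
    if transposed:
        x = x.T
    for _ in range(steps):
        # Each singular value s is mapped to 1.5 * s - 0.5 * s**3,
        # which pushes it toward 1 while leaving U and V unchanged.
        x = 1.5 * x - 0.5 * (x @ x.T) @ x
    return x.T if transposed else x
```

The spectral-ball update direction would then be `-radius * newton_schulz_polar(g)`. Very small singular values converge slowly under the cubic iteration, which is one reason practical implementations prefer a tuned polynomial and accept an approximate result.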
Not all DC algorithms, I should say, but CCCP is equivalent to Frank-Wolfe: proceedings.neurips.cc/paper_files/...
October 21, 2025 at 11:55 AM
Yeah, in this case it does change the stepsize (and therefore the dynamics) even if one assumption implies the other (this was what my collaborators told me when we were first writing our paper). I look forward to learning more about what these guys have done and how much of a difference it makes.
October 21, 2025 at 10:19 AM
In our (L0, L1)-smooth work I kept lamenting that (L0, L1)-smoothness on a compact set (like in Frank-Wolfe) implies L-smoothness, so it's kind of a pointless assumption. But if you did the math to derive the short step under it, you would get a new, slightly tweaked step size. These guys did exactly that.
October 21, 2025 at 9:54 AM
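For reference, the standard objects behind the post above can be written out as follows. This uses the usual Hessian-based definition of (L0, L1)-smoothness and the classical short step for an L-smooth objective; the symbols (C for the compact constraint set, G for a gradient bound on C, s_t for the Frank-Wolfe vertex) are notation assumed here. The tweaked step size from the work being discussed is not reproduced.

```latex
% (L_0, L_1)-smoothness (twice-differentiable form):
\|\nabla^2 f(x)\| \le L_0 + L_1 \|\nabla f(x)\| \quad \text{for all } x .

% On a compact set C with \|\nabla f(x)\| \le G for all x \in C,
% this implies ordinary smoothness on C with constant
L \le L_0 + L_1 G .

% Classical Frank-Wolfe short step for an L-smooth objective:
s_t \in \operatorname*{arg\,min}_{s \in C} \langle \nabla f(x_t), s \rangle ,
\qquad
\gamma_t = \min\left\{ 1,\;
  \frac{\langle \nabla f(x_t),\, x_t - s_t \rangle}{L \,\|x_t - s_t\|^2} \right\},
\qquad
x_{t+1} = x_t + \gamma_t (s_t - x_t) .
```

Plugging the worst-case constant L0 + L1·G into the short step is valid but ignores the local gradient norm, which is where deriving the step directly from the (L0, L1) inequality can give something different.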
Frankenstein by Shelley
October 20, 2025 at 8:53 AM
reminds me of "everyone steals, but i have taste!"
October 17, 2025 at 9:25 AM
Don’t most people use the word increasing in everyday life to mean strictly increasing? If your boss said your salary was increasing next year and then it stayed the same, wouldn’t you object to the use of increasing?
September 12, 2025 at 6:43 AM
Zed
July 22, 2025 at 7:59 PM