@robertmgower.bsky.social
We've just finished some work on making Muon less sensitive to the learning rate, and on exploring a lot of design choices. If you want to see how we did this, follow me... 1/x
arxiv.org/pdf/2510.09827
October 28, 2025 at 2:00 PM
Question on online learning theory: I needed a regret bound for online proximal point when the loss is smooth, and I've proven the results below. I'm sure something like (1) (and maybe (2)) exists in the literature, but where? Who can I cite for these results? @neu-rips.bsky.social
April 16, 2025 at 2:38 PM
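For readers new to the setting of the question above, here is a minimal sketch of online proximal point and the regret it is measured by. It does not reproduce results (1) or (2), which are not included here.

The sketch rests on several illustrative assumptions:

- The losses are one-dimensional quadratics f_t(x) = ½ a_t (x − b_t)², which are smooth with constant max_t a_t.
- The proximal step therefore has a closed form.
- The step size η = 1/√T is just an example choice.
- The function names `online_proximal_point` and `regret` are hypothetical.

```python
import random


def online_proximal_point(a, b, eta, x0=0.0):
    """Online proximal point on 1-D quadratics f_t(x) = 0.5 * a[t] * (x - b[t])**2.

    Returns the iterates x_1, ..., x_T that were played, each chosen
    before the corresponding loss f_t was revealed.
    """
    xs = []
    x = x0
    for a_t, b_t in zip(a, b):
        xs.append(x)  # play x_t, then observe f_t
        # Implicit (proximal) step, in closed form for a quadratic:
        #   x_{t+1} = argmin_x f_t(x) + (x - x_t)**2 / (2 * eta)
        x = (x + eta * a_t * b_t) / (1.0 + eta * a_t)
    return xs


def regret(a, b, xs):
    """Cumulative loss of the played iterates minus that of the best fixed point."""
    def loss(t, x):
        return 0.5 * a[t] * (x - b[t]) ** 2

    # The minimizer of sum_t f_t is the a-weighted mean of the b_t.
    x_star = sum(a_t * b_t for a_t, b_t in zip(a, b)) / sum(a)
    T = len(a)
    return sum(loss(t, xs[t]) for t in range(T)) - sum(loss(t, x_star) for t in range(T))


if __name__ == "__main__":
    random.seed(0)
    T = 1000
    a = [random.uniform(0.5, 2.0) for _ in range(T)]  # a_t <= 2, so each f_t is 2-smooth
    b = [random.uniform(-1.0, 1.0) for _ in range(T)]
    xs = online_proximal_point(a, b, eta=1.0 / T ** 0.5)
    print(f"regret after {T} rounds: {regret(a, b, xs):.4f}")
```

Swapping in other smooth losses only changes the proximal step. In general that step needs an inner solver rather than a closed form.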