https://uuujf.github.io
Ppl talk about implicit regularization, but how good is it? We show it's surprisingly effective: GD dominates ridge for linear regression, w/ more cool stuff on GD vs SGD
arxiv.org/abs/2509.17251
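The GD-vs-ridge comparison can be sketched in a few lines. This is my own toy illustration on random data (dimensions, noise level, and stepsize are all my assumptions, not the paper's setup): GD on the unregularized least-squares loss, stopped after t steps, behaves like ridge with the iteration count t playing the role of 1/lambda.

```python
import numpy as np

# Toy sketch (my assumptions, not the paper's code): compare gradient-descent
# iterates on plain least squares against an explicit ridge solution.
rng = np.random.default_rng(0)
n, d = 50, 10
X = rng.standard_normal((n, d))
w_star = rng.standard_normal(d)
y = X @ w_star + 0.1 * rng.standard_normal(n)

def ridge(lam):
    # closed-form ridge solution: (X'X + lam I)^{-1} X'y
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# GD on the UNregularized least-squares loss; early stopping acts as
# implicit regularization (few steps ~ large lambda, many steps ~ small).
eta = 1.0 / np.linalg.norm(X, 2) ** 2  # stepsize below 1/L for stability
w = np.zeros(d)
for _ in range(200):
    w -= eta * X.T @ (X @ w - y)

# After many steps, GD lands near a lightly regularized ridge solution.
print(np.linalg.norm(w - ridge(1e-3)))
```

This only illustrates the implicit-regularization analogy on a well-conditioned problem; the paper's actual claim (GD dominating ridge) is a much finer statistical statement.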
www.simonsfoundation.org/2025/07/11/m...
📅When: Mon, June 30 | 16:00 CET
What: Fireside chat w/ Peter Bartlett & Vitaly Feldman on communicating a research agenda, followed by a mentorship roundtable to practice elevator pitches & mingle w/ the COLT community!
let-all.com/colt25.html
Classical GD analysis assumes small stepsizes to guarantee stability. In practice, however, GD is often run with large stepsizes that violate this condition and cause instability.
See my slides for more details on this topic: uuujf.github.io/postdoc/wu20...
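A tiny experiment makes the point concrete. This is my own toy setup (data, dimensions, and the factor 10 are my assumptions, not the slides' or paper's experiments): run GD on separable logistic regression with a stepsize far above the classical 2/L stability threshold; the loss can spike early, yet GD still drives it down, as the large-stepsize analyses predict.

```python
import numpy as np

# Toy sketch (my assumptions): GD on separable logistic regression with a
# stepsize well beyond the classical stability limit.
rng = np.random.default_rng(1)
n, d = 100, 5
X = rng.standard_normal((n, d))
y = np.sign(X @ rng.standard_normal(d))  # noiseless labels -> separable data

def loss(w):
    # numerically stable mean of log(1 + exp(-y * <x, w>))
    return np.logaddexp(0.0, -y * (X @ w)).mean()

def grad(w):
    m = y * (X @ w)
    p = 0.5 * (1.0 - np.tanh(0.5 * m))  # sigmoid(-m), computed stably
    return -(X.T @ (y * p)) / n

L = np.linalg.norm(X, 2) ** 2 / (4 * n)  # smoothness constant of the loss
eta = 10.0 / L                           # "large" stepsize, far beyond 2/L
w = np.zeros(d)
losses = [loss(w)]
for _ in range(500):
    w = w - eta * grad(w)
    losses.append(loss(w))

print(losses[0], losses[-1])
```

The early iterates can be unstable (the loss need not decrease monotonically), which is exactly the regime the classical small-stepsize analysis excludes.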
Large Stepsizes Accelerate Gradient Descent for Regularized Logistic Regression
https://arxiv.org/abs/2506.02336
📝 Soliciting abstracts/posters exploring theoretical & practical aspects of post-training and RL with language models!
🗓️ Deadline: May 19, 2025
(Recording will be available eventually)
I'm late for #ICLR2025 #NAACL2025, but in time for #AISTATS2025 #ICML2025! 1/3
kamathematics.wordpress.com/2025/05/01/t...
Minimax Optimal Convergence of Gradient Descent in Logistic Regression via Large and Adaptive Stepsizes
https://arxiv.org/abs/2504.04105
simons.berkeley.edu/workshops/future-language-models-transformers