Jörg Franke
@jfranke.bsky.social
PhD student in the Machine Learning Lab at the University of Freiburg - Core Deep Learning Research with some applications in bio.
🧵5/5 - Come discuss with us at NeurIPS:
📍 East Exhibit Hall A-C
🎯 Poster #2803
⏰ Thu Dec 12, 11am-2pm PST
Or check out our paper: arxiv.org/abs/2311.09058
Improving Deep Learning Optimization through Constrained Parameter Regularization
December 9, 2024 at 3:28 PM
🧵4/5 - For example, when pretraining GPT2 models, AdamCPR outperforms AdamW with the same budget, or needs only 2/3 of the budget to reach the same score.
December 9, 2024 at 3:28 PM
🧵3/5 - CPR can be used with any gradient-based optimization algorithm, e.g. Adam. You can find our AdamCPR implementation at github.com/automl/CPR or via pip install pytorch-cpr
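A rough sketch of how the swap would look in an existing training setup; the AdamCPR import and constructor in the comment are assumptions, not the documented API (see the README at github.com/automl/CPR for the actual entry point):

```python
# pip install pytorch-cpr
import torch

model = torch.nn.Linear(16, 4)

# Baseline: AdamW with a fixed, global weight-decay coefficient.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=0.1)

# With CPR, the fixed weight_decay is replaced by per-matrix constraints.
# Hypothetical drop-in (exact import/signature per github.com/automl/CPR):
# from pytorch_cpr import AdamCPR
# optimizer = AdamCPR(model.parameters(), lr=1e-3)
```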
December 9, 2024 at 3:28 PM
🧵2/5 - We reformulate regularization as an inequality-constrained optimization problem, which brings several benefits (a simplified sketch of the idea follows the list):
✅ Individual and dynamic weight regularization
✅ Outperforms weight decay
✅ No additional hyperparameters (or even fewer)
✅ Minor or no runtime overhead
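Roughly, instead of a fixed weight-decay coefficient, each weight matrix W gets a constraint R(W) ≤ κ (e.g., on its squared L2 norm) that is enforced with a Lagrange-multiplier-style update. Below is a simplified sketch of that mechanism in plain PyTorch; it is not the pytorch-cpr implementation, and the class name and hyperparameters (kappa, mu) are illustrative assumptions:

```python
import torch

# Simplified sketch of the CPR idea (illustrative, not the official pytorch-cpr code):
# each weight matrix W gets an inequality constraint R(W) = ||W||_2^2 <= kappa,
# enforced by a Lagrange multiplier that takes the place of a fixed weight-decay
# coefficient. kappa and mu are assumed hyperparameter names for this sketch.

class SimpleCPR:
    def __init__(self, params, kappa=1.0, mu=1.0):
        # Regularize only weight matrices (dim > 1), as is common practice.
        self.params = [p for p in params if p.requires_grad and p.dim() > 1]
        self.kappa = kappa   # upper bound on the squared L2 norm per matrix
        self.mu = mu         # step size for the multiplier update
        self.lambdas = [torch.zeros((), device=p.device) for p in self.params]

    @torch.no_grad()
    def apply(self):
        # Call after loss.backward() and before optimizer.step().
        for p, lam in zip(self.params, self.lambdas):
            c = p.pow(2).sum() - self.kappa        # constraint value c(W) = R(W) - kappa
            lam.add_(self.mu * c).clamp_(min=0.0)  # lambda <- max(0, lambda + mu * c)
            if p.grad is not None:
                p.grad.add_(2.0 * lam * p)         # add lambda * grad R(W), with grad R(W) = 2W

# Usage with plain Adam (no weight_decay needed):
model = torch.nn.Linear(16, 4)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
cpr = SimpleCPR(model.parameters(), kappa=1.0, mu=1.0)

x, y = torch.randn(8, 16), torch.randn(8, 4)
for _ in range(5):
    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(model(x), y)
    loss.backward()
    cpr.apply()
    optimizer.step()
```

Because the multiplier rises only while a matrix violates its bound and falls back to zero otherwise, each weight matrix gets its own dynamic regularization strength rather than one constant coefficient shared by all parameters.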
December 9, 2024 at 3:28 PM