Leena C Vankadara
leenacvankadara.bsky.social
Lecturer @GatsbyUCL; Previously Applied Scientist @AmazonResearch; PhD @MPI-IS @UniTuebingen
Reposted by Leena C Vankadara
Stable model scaling with width-independent dynamics?

Thrilled to present 2 papers at #NeurIPS 🎉 that study width-scaling in Sharpness Aware Minimization (SAM) (Th 16:30, #2104) and in Mamba (Fr 11, #7110). Our scaling rules stabilize training and transfer optimal hyperparams across scales.

🧵 1/10
December 10, 2024 at 7:08 AM