Volkan Cevher
@cevherlions.bsky.social
Associate Professor of Electrical Engineering, EPFL.
Amazon Scholar (AGI Foundations). IEEE Fellow. ELLIS Fellow.
Reposted by Volkan Cevher
We also provide what is, to my knowledge, the first convergence-rate analysis for stochastic unconstrained Frank-Wolfe (i.e., without weight decay), which directly covers the Muon optimizer (and much more)!
🔥 Want to train large neural networks WITHOUT Adam while using less memory and getting better results? ⚡
Check out SCION: a new optimizer that adapts to the geometry of your problem using norm-constrained linear minimization oracles (LMOs): 🧵👇
February 13, 2025 at 4:59 PM
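As a concrete (and purely illustrative) picture of the mechanism the thread describes, here is a minimal sketch of an LMO-based, Frank-Wolfe-style update: the gradient is handed to a linear minimization oracle over a norm ball, and the iterate moves along the returned direction. The helper names (`lmo_spectral`, `lmo_sign`, `fw_step`), the choice of norms, the step-size schedule, and the toy objective are my own assumptions for illustration, not the SCION implementation; the spectral-norm oracle is the SVD-based, orthogonalized direction used by Muon-style updates.

```python
import numpy as np

def lmo_spectral(grad, radius=1.0):
    # LMO over a spectral-norm ball: argmin_{||D||_2 <= radius} <grad, D> = -radius * U V^T,
    # where U S V^T is the thin SVD of the gradient (the "orthogonalized" direction).
    U, _, Vt = np.linalg.svd(grad, full_matrices=False)
    return -radius * (U @ Vt)

def lmo_sign(grad, radius=1.0):
    # LMO over an infinity-norm ball: the sign-descent direction.
    return -radius * np.sign(grad)

def fw_step(w, grad, lr, lmo, radius=1.0):
    # One unconstrained Frank-Wolfe-style step: move along the LMO direction
    # (no contraction toward zero, i.e. no weight-decay-like term).
    return w + lr * lmo(grad, radius)

# Toy usage: minimize 0.5 * ||W - target||_F^2 with a decaying step size.
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 4))
target = rng.standard_normal((8, 4))
for t in range(100):
    grad = W - target                      # gradient of the toy objective
    W = fw_step(W, grad, lr=0.5 / (t + 1), lmo=lmo_spectral)
print(np.linalg.norm(W - target))          # residual shrinks as the step size decays
```

In this picture, swapping the norm (and hence the oracle) per layer is what lets the update adapt to the geometry of the problem, which is the property the post highlights.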
It was a fun panel. Quite informative.
A thought-provoking panel with Scarlet of the EPFL AI Center, @cevherlions.bsky.social and Thomas Schneider from OFCOM - looking at the state of regulations, the business case for GenAI & the opportunities for Swiss research & innovation... a fine balance between talent, data and hardware. #AMLD
February 13, 2025 at 3:24 PM
Timeo professores machinae discendi et dona ferentes. ("I fear machine learning professors, even bearing gifts.")
January 5, 2025 at 7:09 PM
Reposted by Volkan Cevher
An illustrated guide to never learning anything
December 25, 2024 at 12:26 AM
Reposted by Volkan Cevher
We'll present "SAMPa: Sharpness-Aware Minimization Parallelized" at #NeurIPS24 on Thursday! This is joint work with Thomas Pethick and Volkan Cevher.
📍 Find us at Poster #5904 from 16:30 in the West Ballroom.
December 11, 2024 at 4:23 PM
Reposted by Volkan Cevher
Stable model scaling with width-independent dynamics?

Thrilled to present 2 papers at #NeurIPS 🎉 that study width-scaling in Sharpness Aware Minimization (SAM) (Th 16:30, #2104) and in Mamba (Fr 11, #7110). Our scaling rules stabilize training and transfer optimal hyperparams across scales.

🧵 1/10
December 10, 2024 at 7:08 AM
Reposted by Volkan Cevher
This is joint work with wonderful collaborators @leenacvankadara.bsky.social, @cevherlions.bsky.social and Jin Xu during our time at Amazon.

🧵 10/10
December 10, 2024 at 7:08 AM
@iclr-conf.bsky.social: Please incorporate this ACL style of feedback for reviewers:

aclrollingreview.org/authors#step...
November 29, 2024 at 5:45 PM
Reposted by Volkan Cevher
Reviewers take note:
57% of people rejected their own argument when they thought it was someone else's. So take it easy with the criticism.
November 15, 2024 at 10:17 PM