
Volkan Cevher

@cevherlions

Associate Professor of Electrical Engineering, EPFL. Amazon Scholar (AGI Foundations). IEEE Fellow. ELLIS Fellow.

984 Followers
104 Following
12 Posts
Joined 27.07.2023

Latest posts by Volkan Cevher @cevherlions

It turns out that the algorithm is closely related to the continuous greedy algorithm used in submodular optimization.

13.02.2025 17:04 👍 0 🔁 0 💬 0 📌 0
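To see the connection concretely, here is a minimal sketch (my own illustration, not code from the paper) of the two update rules side by side. Both query the same linear minimization oracle (LMO), here over an l1 ball, and differ only in whether the current iterate is subtracted; continuous greedy conventionally maximizes over a polytope, while the sketch uses the minimization convention.

```python
import numpy as np

def lmo_l1_ball(grad, radius=1.0):
    """LMO for an l1 ball: argmin_{||s||_1 <= radius} <grad, s>.
    All mass goes on the single largest-magnitude coordinate."""
    s = np.zeros_like(grad)
    i = np.argmax(np.abs(grad))
    s[i] = -radius * np.sign(grad[i])
    return s

def frank_wolfe_step(x, grad, gamma):
    # Convex combination with the LMO output: the iterate stays in the ball.
    return x + gamma * (lmo_l1_ball(grad) - x)

def continuous_greedy_step(x, grad, gamma):
    # Purely additive step, as in continuous greedy for submodular maximization.
    return x + gamma * lmo_l1_ball(grad)
```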

We also provide the first convergence rate analysis that I'm aware of for stochastic unconstrained Frank-Wolfe (i.e., without weight decay), which directly covers the Muon optimizer (and much more)!

13.02.2025 16:59 👍 10 🔁 1 💬 1 📌 0
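Concretely, "unconstrained" here means the LMO direction is simply added at every step, with nothing (constraint or weight decay) pulling the iterate back toward the norm ball. A minimal sketch, assuming a momentum-averaged gradient estimator; the names and the averaging scheme are illustrative, not the paper's exact algorithm:

```python
import numpy as np

def stochastic_unconstrained_fw(x0, stoch_grad, lmo, lr=0.02,
                                momentum=0.9, steps=1000):
    """Sketch: each step queries the LMO at a momentum-averaged stochastic
    gradient and adds the result, without projecting onto any ball."""
    x = x0.copy()
    d = np.zeros_like(x0)
    for _ in range(steps):
        d = momentum * d + (1.0 - momentum) * stoch_grad(x)  # gradient averaging
        x = x + lr * lmo(d)                                  # unconstrained FW step
    return x
```

Plugging in a spectral-norm LMO for each weight matrix makes a step of this form Muon-like, which is presumably the sense in which the analysis covers Muon.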

This is joint work that I am very grateful to have done with the exceptionally talented team of Thomas Pethick, @wanyunxie.bsky.social, Kimon Antonakopoulos, and Zhenyu Zhu at LIONS@EPFL, and @tonysf.bsky.social from CentraleSupélec.

13.02.2025 16:51 👍 3 🔁 0 💬 2 📌 0
Post image

🧑‍🍳 We provide a complete cookbook for choosing the right LMO for your architecture: 📚
- Input layers (1-hot vs image)
- Hidden layers (spectral norms)
- Output layers (flexible norm choices)
All with explicit formulas and guidance for when to use each one.

13.02.2025 16:51 👍 3 🔁 0 💬 1 📌 0
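As a toy rendering of the cookbook idea (illustrative only: the paper's actual recipes include per-layer radii, scalings, and input-layer distinctions that this sketch omits), each layer kind gets its own norm-ball LMO:

```python
import numpy as np

def lmo_sign(g, radius=1.0):
    # l_inf-ball LMO: the elementwise sign direction (SignSGD-like).
    return -radius * np.sign(g)

def lmo_spectral(g, radius=1.0):
    # Spectral-norm-ball LMO for weight matrices:
    # argmin_{||S||_op <= radius} <g, S> = -radius * U @ V^T, via the thin SVD.
    u, _, vt = np.linalg.svd(g, full_matrices=False)
    return -radius * u @ vt

# Hypothetical layer-kind -> LMO mapping, loosely following the post above;
# the real cookbook distinguishes 1-hot vs image inputs and more.
LMO_COOKBOOK = {
    "input": lmo_sign,
    "hidden": lmo_spectral,   # spectral norms for hidden weight matrices
    "output": lmo_sign,       # flexible norm choices; sign is one option
}
```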
Post image

🌟 It turns out many popular optimizers (SignSGD, Muon, etc.) are special cases of our framework - just with different norm choices.
Our unified analysis reveals deep connections between seemingly different approaches and provides new insights into why they work 🤔

13.02.2025 16:51 👍 2 🔁 0 💬 1 📌 0
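Spelled out in formulas (my notation, a paraphrase rather than the paper's statement), the two best-known special cases correspond to LMOs over different norm balls of radius r:

```latex
% Sketch of two norm-ball LMOs (radii and per-layer scalings omitted).
\[
\operatorname{lmo}_{\|s\|_\infty \le r}(G) = -\,r\,\operatorname{sign}(G)
\qquad \text{(elementwise sign: a SignSGD-style step)}
\]
\[
\operatorname{lmo}_{\|S\|_{2 \to 2} \le r}(G) = -\,r\,U V^\top,
\quad G = U \Sigma V^\top \ \text{(thin SVD)}
\qquad \text{(orthogonalized: a Muon-style step)}
\]
```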
Post image

📝 Check out the preprint: arxiv.org/abs/2502.07529
Worst-case convergence analysis with rates, guarantees for learning rate transfer, and practical advice on how to properly choose norms adapted to network geometry, backed by theory 🎯

13.02.2025 16:51 👍 3 🔁 0 💬 1 📌 0
Post image

🕵️ It’s “just” stochastic conditional gradient. The secret sauce? Don't treat your weight matrices like they're flat vectors! SCION adapts to the geometry of matrices using LMOs with respect to the correct norm: the induced operator norm.

13.02.2025 16:51 👍 2 🔁 0 💬 1 📌 0
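A quick toy demonstration of the "not flat vectors" point (mine, not from the paper): the same gradient matrix produces very different LMO directions under the elementwise l_inf norm and under the induced operator norm, where the LMO orthogonalizes via the SVD.

```python
import numpy as np

rng = np.random.default_rng(0)
G = rng.standard_normal((4, 3))          # a stand-in gradient matrix

flat_dir = -np.sign(G)                   # "flat vector" view: l_inf-ball LMO
u, _, vt = np.linalg.svd(G, full_matrices=False)
op_dir = -u @ vt                         # operator-norm-ball LMO: orthogonalized

# The operator-norm direction is semi-orthogonal (orthonormal columns),
# while the sign direction ignores the matrix structure entirely.
print(np.allclose(op_dir.T @ op_dir, np.eye(3)))  # True
```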
Hyper-parameter transfer on NanoGPT.

arxiv.org/abs/2502.07529
🚀 Key results:
- Based on conditional gradient method
- Beats Muon+Adam on NanoGPT (tested up to 3B params)
- Zero-shot learning rate transfer across model size
- Uses WAY less memory (just one set of params + half-precision grads)
- Provides explicit norm control

13.02.2025 16:51 👍 4 🔁 1 💬 1 📌 0
Post image

🔥 Want to train large neural networks WITHOUT Adam while using less memory and getting better results? ⚡
Check out SCION: a new optimizer that adapts to the geometry of your problem using norm-constrained linear minimization oracles (LMOs): 🧵👇

13.02.2025 16:51 👍 18 🔁 6 💬 3 📌 1

It was a fun panel. Quite informative.

13.02.2025 15:24 👍 1 🔁 0 💬 1 📌 0

Timeo professores machinae discendi et dona ferentes. (I fear machine learning professors, even bearing gifts.)

05.01.2025 19:09 👍 8 🔁 0 💬 0 📌 0

Timeo professores machinae discendi et dona ferentes. (I fear machine learning professors, even bearing gifts.)

05.01.2025 19:08 👍 1 🔁 0 💬 0 📌 0
Post image

An illustrated guide to never learning anything

25.12.2024 00:26 👍 145 🔁 19 💬 6 📌 3

We'll present "SAMPa: Sharpness-Aware Minimization Parallelized" at #NeurIPS24 on Thursday! This is joint work with Thomas Pethick and Volkan Cevher.
📍 Find us at Poster #5904 from 16:30 in the West Ballroom.

11.12.2024 16:23 👍 1 🔁 1 💬 1 📌 0
Post image Post image

Stable model scaling with width-independent dynamics?

Thrilled to present 2 papers at #NeurIPS 🎉 that study width-scaling in Sharpness Aware Minimization (SAM) (Th 16:30, #2104) and in Mamba (Fr 11, #7110). Our scaling rules stabilize training and transfer optimal hyperparams across scales.

🧡 1/10

10.12.2024 07:08 👍 21 🔁 5 💬 1 📌 0

This is joint work with wonderful collaborators @leenacvankadara.bsky.social, @cevherlions.bsky.social, and Jin Xu during our time at Amazon.

🧡 10/10

10.12.2024 07:08 👍 3 🔁 1 💬 0 📌 0

@iclr-conf.bsky.social: Please incorporate this ACL style of feedback for reviewers:

aclrollingreview.org/authors#step...

29.11.2024 17:45 👍 3 🔁 0 💬 0 📌 0
Post image

Reviewers take note:
57% of people rejected their own argument when they thought it was someone else's. So take it easy with the criticism.

15.11.2024 22:17 👍 31 🔁 9 💬 0 📌 1