Events

National Center for Supercomputing Applications master calendar

NCSA staff who would like to submit an item for the calendar can email newsdesk@ncsa.illinois.edu.

DAIS Seminar: Dr. Quanquan Gu, "Unleashing the power of variance reduction for training large models."

Event Type
Seminar/Symposium
Sponsor
Prof. Jiawei Han and Prof. Chengxiang Zhai
Location
4124 Siebel Center
Virtual
Date
Sep 17, 2025, 11:00 am - 12:30 pm
Speaker
Dr. Quanquan Gu
Contact
Allison Mette
E-Mail
agk@illinois.edu
Phone
217-300-0256
Originating Calendar
Siebel School Speakers Calendar

Abstract: Training deep neural networks and large language models demands efficient and scalable optimizers. Adaptive gradient algorithms like Adam, AdamW, and their variants have been central to this task. Despite the development of numerous variance reduction algorithms in the past decade aimed at accelerating stochastic optimization in both convex and nonconvex settings, variance reduction has not found widespread success in training deep neural networks or large language models. Consequently, it has remained a less favored approach in modern AI. In this talk, I will introduce a unified optimization framework, MARS (Make vAriance Reduction Shine), which reconciles preconditioned gradient methods with variance reduction via a scaled stochastic recursive momentum technique. Within this framework, I will introduce three instances of MARS that leverage preconditioned gradient updates based on AdamW, Lion, and Shampoo, respectively. In addition, I will draw a connection between our algorithms and existing optimizers. Experimental results on training GPT-2 models indicate that MARS consistently outperforms AdamW by a large margin.
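
For readers unfamiliar with the idea, the following is a minimal sketch of how a scaled recursive-momentum correction might be combined with an Adam-style preconditioned update. It is not taken from the talk or from any reference implementation: the function names, the toy noisy quadratic objective, the hyperparameter values, the unit-norm clipping step, and the omission of weight decay are all assumptions made for illustration. The key point it tries to show is that the gradient at the previous iterate is re-evaluated on the same random sample as the current gradient, and the scaled difference is added before the moment estimates are updated.

```python
import math
import random


def noisy_quadratic_grad(x, noise, curvature=(1.0, 4.0), target=(1.0, 2.0)):
    """Stochastic gradient of 0.5 * sum(a_i * (x_i - target_i - noise_i)^2).

    Hypothetical toy objective: `noise` plays the role of the random sample.
    """
    return [a * (xi - ti - ni)
            for a, xi, ti, ni in zip(curvature, x, target, noise)]


def recursive_momentum_adam(x0, steps=1000, lr=0.05, beta1=0.9, beta2=0.99,
                            gamma=0.025, eps=1e-8, noise_std=0.5, seed=0):
    """Adam-style update driven by a variance-reduced gradient estimate.

    All defaults are illustrative assumptions, not values from the talk.
    """
    rng = random.Random(seed)
    x = list(x0)
    x_prev = list(x0)           # previous iterate, needed for the correction
    m = [0.0] * len(x)          # first-moment estimate
    v = [0.0] * len(x)          # second-moment estimate (the preconditioner)
    scale = gamma * beta1 / (1.0 - beta1)

    for t in range(1, steps + 1):
        # Draw one sample and evaluate the gradient at BOTH iterates with it.
        noise = [rng.gauss(0.0, noise_std) for _ in x]
        g = noisy_quadratic_grad(x, noise)
        g_prev = noisy_quadratic_grad(x_prev, noise)

        # Scaled recursive-momentum correction added to the raw gradient.
        c = [gi + scale * (gi - gpi) for gi, gpi in zip(g, g_prev)]

        # Clip the corrected gradient to unit norm (an assumed safeguard).
        norm = math.sqrt(sum(ci * ci for ci in c))
        if norm > 1.0:
            c = [ci / norm for ci in c]

        # Standard exponential moving averages with bias correction.
        m = [beta1 * mi + (1.0 - beta1) * ci for mi, ci in zip(m, c)]
        v = [beta2 * vi + (1.0 - beta2) * ci * ci for vi, ci in zip(v, c)]
        m_hat = [mi / (1.0 - beta1 ** t) for mi in m]
        v_hat = [vi / (1.0 - beta2 ** t) for vi in v]

        # Preconditioned step; remember the old iterate for the next round.
        x_prev = x
        x = [xi - lr * mh / (math.sqrt(vh) + eps)
             for xi, mh, vh in zip(x, m_hat, v_hat)]
    return x
```

Calling `recursive_momentum_adam([5.0, -3.0])` runs the sketch on the toy problem, whose minimizer in expectation is `(1.0, 2.0)`. Setting `gamma=0.0` removes the correction term and reduces the loop to an Adam-style update on clipped gradients, which makes it easy to compare the two variants on the same seed.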

Bio: Quanquan Gu is an Associate Professor of Computer Science at UCLA. His research is in artificial intelligence and machine learning, with a focus on nonconvex optimization, deep learning, reinforcement learning, large language models, and deep generative models. He received his Ph.D. in Computer Science from the University of Illinois at Urbana-Champaign in 2014. He is a recipient of the Sloan Research Fellowship, the NSF CAREER Award, and the Simons-Berkeley Research Fellowship, as well as industrial research awards.