I am a tenure-track assistant professor at the Toyota Technological Institute at Chicago (TTIC). My research focuses on machine learning and optimization, especially deep learning theory. I received my PhD from the Computer Science Department at Princeton University in 2022, where I was advised by Prof. Sanjeev Arora. I was a postdoctoral fellow in the Computer Science Department at Stanford University, working with Prof. Tengyu Ma. I did my undergraduate studies in the Yao Class (2013-2017) at Tsinghua University.
(α-β) indicates alphabetical author order; * indicates equal contribution.
Do You Grok? A Theoretical Analysis on Grokking Modular Addition
Mohamad Amin Mohamadi, Zhiyuan Li, Lei Wu, Danica J. Sutherland
ICML 2024 (To appear)
Implicit Bias of AdamW: ℓ∞-Norm Constrained Optimization
Shuo Xie, Zhiyuan Li
ICML 2024 (To appear)
Simplicity Bias via Global Convergence of Sharpness Minimization
Khashayar Gatmiry, Zhiyuan Li, Sashank J. Reddi, Stefanie Jegelka
ICML 2024 (To appear)
Chain of Thought Empowers Transformers to Solve Inherently Serial Problems
Zhiyuan Li, Hong Liu, Denny Zhou, Tengyu Ma
ICLR 2024
Dichotomy of Early and Late Phase Implicit Biases Can Provably Induce Grokking
Kaifeng Lyu, Jikai Jin, Zhiyuan Li, Simon Shaolei Du, Jason D. Lee, Wei Hu
ICLR 2024
The Marginal Value of Momentum for Small Learning Rate SGD
Runzhe Wang, Sadhika Malladi, Tianhao Wang, Kaifeng Lyu, Zhiyuan Li
ICLR 2024
Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training
Hong Liu, Zhiyuan Li, David Leo Wright Hall, Percy Liang, Tengyu Ma
ICLR 2024
Fast Equilibrium of SGD in Generic Situations
(α-β) Zhiyuan Li, Yi Wang, Zhiren Wang
ICLR 2024
Sharpness Minimization Algorithms Do Not Only Minimize Sharpness To Achieve Better Generalization
Kaiyue Wen, Zhiyuan Li, Tengyu Ma
NeurIPS 2023 (Oral)
The Inductive Bias of Flatness Regularization for Deep Matrix Factorization
Khashayar Gatmiry, Zhiyuan Li, Ching-Yao Chuang, Sashank Reddi, Tengyu Ma, Stefanie Jegelka
NeurIPS 2023
Same Pre-training Loss, Better Downstream: Implicit Bias Matters for Language Models
Hong Liu, Sang Michael Xie, Zhiyuan Li, Tengyu Ma
ICML 2023
Understanding Incremental Learning of Gradient Descent: A Fine-grained Analysis of Matrix Sensing
Jikai Jin, Zhiyuan Li, Kaifeng Lyu, Simon S. Du, Jason D. Lee
ICML 2023
How Does Sharpness-Aware Minimization Minimize Sharpness?
Kaiyue Wen, Tengyu Ma, Zhiyuan Li
ICLR 2023
Implicit Bias of Gradient Descent on Reparametrized Models: On Equivalence to Mirror Descent
Zhiyuan Li*, Tianhao Wang*, Jason D. Lee, Sanjeev Arora
NeurIPS 2022
Understanding the Generalization Benefit of Normalization Layers: Sharpness Reduction
Kaifeng Lyu, Zhiyuan Li, Sanjeev Arora
NeurIPS 2022
Fast Mixing of Stochastic Gradient Descent with Normalization and Weight Decay
Zhiyuan Li, Tianhao Wang, Dingli Yu
NeurIPS 2022
Robust Training of Neural Networks using Scale Invariant Architectures
Zhiyuan Li, Srinadh Bhojanapalli, Manzil Zaheer, Sashank J. Reddi, Sanjiv Kumar
ICML 2022 (Long Presentation)
Understanding Gradient Descent on Edge of Stability in Deep Learning
(α-β) Sanjeev Arora, Zhiyuan Li, Abhishek Panigrahi
ICML 2022
What Happens after SGD Reaches Zero Loss? – A Mathematical Framework
Zhiyuan Li, Tianhao Wang, Sanjeev Arora
ICLR 2022 (Spotlight)
Gradient Descent on Two-layer Nets: Margin Maximization and Simplicity Bias
Kaifeng Lyu*, Zhiyuan Li*, Runzhe Wang*, Sanjeev Arora
NeurIPS 2021
On the Validity of Modeling SGD with Stochastic Differential Equations (SDEs)
Zhiyuan Li, Sadhika Malladi, Sanjeev Arora
NeurIPS 2021
Risk Bounds and Rademacher Complexity in Batch Reinforcement Learning
(α-β) Yaqi Duan, Chi Jin, Zhiyuan Li
ICML 2021
Why Are Convolutional Nets More Sample-Efficient than Fully-Connected Nets? (slides)
Zhiyuan Li, Yi Zhang, Sanjeev Arora
ICLR 2021 (Oral)
Towards Resolving the Implicit Bias of Gradient Descent for Matrix Factorization: Greedy Low-Rank Learning
(α-β) Zhiyuan Li, Yuping Luo, Kaifeng Lyu
ICLR 2021
Reconciling Modern Deep Learning with Traditional Optimization Analyses: The Intrinsic Learning Rate (slides)
Zhiyuan Li*, Kaifeng Lyu*, Sanjeev Arora
NeurIPS 2020
Implicit Regularization and Convergence for Weight Normalization
Xiaoxia Wu, Edgar Dobriban, Tongzheng Ren, Shanshan Wu, Zhiyuan Li, Suriya Gunasekar, Rachel Ward, Qiang Liu
NeurIPS 2020
An Exponential Learning Rate Schedule For Deep Learning (5 min talk, slides)
Zhiyuan Li, Sanjeev Arora
ICLR 2020 (Spotlight)
Harnessing the Power of Infinitely Wide Deep Nets on Small-data Tasks (5 min talk)
(α-β) Sanjeev Arora, Simon S. Du, Zhiyuan Li, Ruslan Salakhutdinov, Ruosong Wang, Dingli Yu
ICLR 2020 (Spotlight)
Simple and Effective Regularization Methods for Training on Noisily Labeled Data with Generalization Guarantee (5 min talk)
(α-β) Wei Hu, Zhiyuan Li, Dingli Yu
ICLR 2020
On Exact Computation with an Infinitely Wide Neural Net
(α-β) Sanjeev Arora, Simon S. Du, Wei Hu, Zhiyuan Li, Ruslan Salakhutdinov, Ruosong Wang
NeurIPS 2019 (Spotlight)
Explaining Landscape Connectivity of Low-cost Solutions for Multilayer Nets
Rohith Kuditipudi, Xiang Wang, Holden Lee, Yi Zhang, Zhiyuan Li, Wei Hu, Sanjeev Arora, Rong Ge
NeurIPS 2019
Fine-Grained Analysis of Optimization and Generalization for Overparameterized Two-Layer Neural Networks (slides)
(α-β) Sanjeev Arora, Simon S. Du, Wei Hu, Zhiyuan Li, Ruosong Wang
ICML 2019
Theoretical Analysis of Auto Rate-Tuning by Batch Normalization
Sanjeev Arora, Zhiyuan Li, Kaifeng Lyu
ICLR 2019
The Role of Over-parametrization in Generalization of Neural Networks
Behnam Neyshabur, Zhiyuan Li, Srinadh Bhojanapalli, Yann LeCun, Nathan Srebro
ICLR 2019
Online Improper Learning with an Approximation Oracle
(α-β) Elad Hazan, Wei Hu, Yuanzhi Li, Zhiyuan Li
NeurIPS 2018
Stability of Generalized Two-sided Markets with Transaction Thresholds
(α-β) Zhiyuan Li, Yicheng Liu, Pingzhong Tang, Tingting Xu, Wei Zhan
AAMAS 2017 (Best Paper Award Nomination)
Solving Marginal MAP Problems with NP Oracles and Parity Constraints
Yexiang Xue, Zhiyuan Li, Stefano Ermon, Carla Gomes, Bart Selman
NIPS 2016
Learning in Games: Robustness of Fast Convergence
(α-β) Dylan Foster, Zhiyuan Li, Thodoris Lykouris, Karthik Sridharan, Eva Tardos
NIPS 2016
Enhanced Convolutional Neural Tangent Kernels
Zhiyuan Li*, Ruosong Wang*, Dingli Yu*, Simon S. Du, Wei Hu, Ruslan Salakhutdinov, Sanjeev Arora
Reviewer for JMLR, Machine Learning, NeurIPS, COLT, ALT, AISTATS, ICML, ICLR
Area Chair for NeurIPS 2023
Outstanding NeurIPS Reviewer Award, 2021
Microsoft Research PhD Fellowship, 2020
The William G. Bowen Merit Fellowship, Princeton University, 2017