Publications

* denotes equal contribution

(α-β) denotes alphabetical order

2025

  1. A Theory of Learning with Autoregressive Chain of Thought
    Nirmit Joshi, Gal Vardi, Adam Block, Surbhi Goel, Zhiyuan Li, Theodor Misiakiewicz, and Nathan Srebro
    In Proceedings of the 38th Conference on Learning Theory, COLT 2025
  2. Structured Preconditioners in Adaptive Optimization: A Unified Analysis
    Shuo Xie, Tianhao Wang, Sashank Reddi, Sanjiv Kumar, and Zhiyuan Li
    In Proceedings of the 42nd International Conference on Machine Learning, ICML 2025
  3. PENCIL: Long Thoughts with Short Memory
    Chenxiao Yang, Nathan Srebro, David McAllester, and Zhiyuan Li
    In Proceedings of the 42nd International Conference on Machine Learning, ICML 2025
  4. Weak-to-Strong Generalization Even in Random Feature Networks, Provably
    Marko Medvedev, Kaifeng Lyu, Dingli Yu, Sanjeev Arora, Zhiyuan Li, and Nathan Srebro
    In Proceedings of the 42nd International Conference on Machine Learning, ICML 2025
  5. Non-Asymptotic Length Generalization
    Thomas Chen, Tengyu Ma, and Zhiyuan Li
    In Proceedings of the 42nd International Conference on Machine Learning, ICML 2025
  6. Chain-of-Thought Provably Enables Learning the (Otherwise) Unlearnable
    Chenxiao Yang, Zhiyuan Li, and David Wipf
    In The Thirteenth International Conference on Learning Representations, ICLR 2025
  7. A Coefficient Makes SVRG Effective
    Yida Yin, Zhiqiu Xu, Zhiyuan Li, Trevor Darrell, and Zhuang Liu
    In The Thirteenth International Conference on Learning Representations, ICLR 2025
  8. Reasoning with Latent Thoughts: On the Power of Looped Transformers
    Nikunj Saunshi, Nishanth Dikkala, Zhiyuan Li, Sanjiv Kumar, and Sashank J Reddi
    In The Thirteenth International Conference on Learning Representations, ICLR 2025
  9. Understanding Warmup-Stable-Decay Learning Rates: A River Valley Loss Landscape Perspective
    Kaiyue Wen, Zhiyuan Li, Jason Wang, David Hall, Percy Liang, and Tengyu Ma
    In The Thirteenth International Conference on Learning Representations, ICLR 2025
  10. Adam Exploits $\ell_\infty$-geometry of Loss Landscape via Coordinate-wise Adaptivity
    Shuo Xie, Mohamad Amin Mohamadi, and Zhiyuan Li
    In The Thirteenth International Conference on Learning Representations, ICLR 2025

2024

  1. Implicit Bias of AdamW: $\ell_\infty$-Norm Constrained Optimization
    Shuo Xie and Zhiyuan Li
    In Proceedings of the 41st International Conference on Machine Learning, ICML 2024
  2. Why Do You Grok? A Theoretical Analysis of Grokking Modular Addition
    Mohamad Amin Mohamadi, Zhiyuan Li, Lei Wu, and Danica J Sutherland
    In Proceedings of the 41st International Conference on Machine Learning, ICML 2024
  3. Simplicity Bias via Global Convergence of Sharpness Minimization
    Khashayar Gatmiry, Zhiyuan Li, Sashank J Reddi, and Stefanie Jegelka
    In Proceedings of the 41st International Conference on Machine Learning, ICML 2024
  4. Chain of Thought Empowers Transformers to Solve Inherently Serial Problems
    Zhiyuan Li, Hong Liu, Denny Zhou, and Tengyu Ma
    In The Twelfth International Conference on Learning Representations, ICLR 2024
  5. Fast Equilibrium of SGD in Generic Situations
    Zhiyuan Li, Yi Wang, and Zhiren Wang (α-β)
    In The Twelfth International Conference on Learning Representations, ICLR 2024
  6. Dichotomy of Early and Late Phase Implicit Biases Can Provably Induce Grokking
    Kaifeng Lyu, Jikai Jin, Zhiyuan Li, Simon S Du, Jason D Lee, and Wei Hu
    In The Twelfth International Conference on Learning Representations, ICLR 2024
  7. Sophia: A Scalable Stochastic Second-Order Optimizer for Language Model Pre-Training
    Hong Liu, Zhiyuan Li, David Hall, Percy Liang, and Tengyu Ma
    In The Twelfth International Conference on Learning Representations, ICLR 2024
  8. The Marginal Value of Momentum for Small Learning Rate SGD
    Runzhe Wang, Sadhika Malladi, Tianhao Wang, Kaifeng Lyu, and Zhiyuan Li
    In The Twelfth International Conference on Learning Representations, ICLR 2024

2023

  1. What is the Inductive Bias of Flatness Regularization? A Study of Deep Matrix Factorization Models
    Khashayar Gatmiry, Zhiyuan Li, Tengyu Ma, Sashank Reddi, Stefanie Jegelka, and Ching-Yao Chuang
    In Advances in Neural Information Processing Systems 36, NeurIPS 2023
  2. Sharpness Minimization Algorithms Do Not Only Minimize Sharpness to Achieve Better Generalization
    Kaiyue Wen, Zhiyuan Li, and Tengyu Ma
    In Advances in Neural Information Processing Systems 36, NeurIPS 2023
  3. Understanding Incremental Learning of Gradient Descent: A Fine-Grained Analysis of Matrix Sensing
    Jikai Jin, Zhiyuan Li, Kaifeng Lyu, Simon Shaolei Du, and Jason D Lee
    In Proceedings of the 40th International Conference on Machine Learning, ICML 2023
  4. Same Pre-Training Loss, Better Downstream: Implicit Bias Matters for Language Models
    Hong Liu, Sang Michael Xie, Zhiyuan Li, and Tengyu Ma
    In Proceedings of the 40th International Conference on Machine Learning, ICML 2023
  5. How Does Sharpness-Aware Minimization Minimize Sharpness?
    Kaiyue Wen, Tengyu Ma, and Zhiyuan Li
    In The Eleventh International Conference on Learning Representations, ICLR 2023

2022

  1. Fast Mixing of Stochastic Gradient Descent with Normalization and Weight Decay
    Zhiyuan Li, Tianhao Wang, and Dingli Yu
    In Advances in Neural Information Processing Systems 35, NeurIPS 2022
  2. Implicit Bias of Gradient Descent on Reparametrized Models: On Equivalence to Mirror Descent
    Zhiyuan Li*, Tianhao Wang*, Jason D Lee, and Sanjeev Arora
    In Advances in Neural Information Processing Systems 35, NeurIPS 2022
  3. Understanding the Generalization Benefit of Normalization Layers: Sharpness Reduction
    Kaifeng Lyu, Zhiyuan Li, and Sanjeev Arora
    In Advances in Neural Information Processing Systems 35, NeurIPS 2022
  4. Bridging Theory and Practice in Deep Learning: Optimization and Generalization
    Zhiyuan Li
    Princeton University, PhD Thesis 2022
  5. Understanding Gradient Descent on the Edge of Stability in Deep Learning
    Sanjeev Arora, Zhiyuan Li, and Abhishek Panigrahi (α-β)
    In Proceedings of the 39th International Conference on Machine Learning, ICML 2022
  6. Robust Training of Neural Networks Using Scale Invariant Architectures
    Zhiyuan Li, Srinadh Bhojanapalli, Manzil Zaheer, Sashank Reddi, and Sanjiv Kumar
    In Proceedings of the 39th International Conference on Machine Learning, ICML 2022
  7. What Happens after SGD Reaches Zero Loss?--A Mathematical Framework
    Zhiyuan Li, Tianhao Wang, and Sanjeev Arora
    In The Tenth International Conference on Learning Representations, ICLR 2022

2021

  1. Gradient Descent on Two-Layer Nets: Margin Maximization and Simplicity Bias
    Kaifeng Lyu*, Zhiyuan Li*, Runzhe Wang*, and Sanjeev Arora
    In Advances in Neural Information Processing Systems 34, NeurIPS 2021
  2. When Is Particle Filtering Efficient for Planning in Partially Observed Linear Dynamical Systems?
    Simon S Du, Wei Hu, Zhiyuan Li, Ruoqi Shen, Zhao Song, and Jiajun Wu
    In Proceedings of the 37th Conference on Uncertainty in Artificial Intelligence, UAI 2021
  3. Risk Bounds and Rademacher Complexity in Batch Reinforcement Learning
    Yaqi Duan, Chi Jin, and Zhiyuan Li (α-β)
    In Proceedings of the 38th International Conference on Machine Learning, ICML 2021
  4. On the Validity of Modeling SGD with Stochastic Differential Equations (SDEs)
    Zhiyuan Li, Sadhika Malladi, and Sanjeev Arora
    In Advances in Neural Information Processing Systems 34, NeurIPS 2021
  5. Towards Resolving the Implicit Bias of Gradient Descent for Matrix Factorization: Greedy Low-Rank Learning
    Zhiyuan Li, Yuping Luo, and Kaifeng Lyu (α-β)
    In The Ninth International Conference on Learning Representations, ICLR 2021
  6. Why Are Convolutional Nets More Sample-Efficient Than Fully-Connected Nets?
    Zhiyuan Li, Yi Zhang, and Sanjeev Arora
    In The Ninth International Conference on Learning Representations, ICLR 2021

2020

  1. Implicit Regularization and Convergence for Weight Normalization
    Xiaoxia Wu, Edgar Dobriban, Tongzheng Ren, Shanshan Wu, Zhiyuan Li, Suriya Gunasekar, Rachel Ward, and Qiang Liu
    In Advances in Neural Information Processing Systems 33, NeurIPS 2020
  2. Reconciling Modern Deep Learning with Traditional Optimization Analyses: The Intrinsic Learning Rate
    Zhiyuan Li*, Kaifeng Lyu*, and Sanjeev Arora
    In Advances in Neural Information Processing Systems 33, NeurIPS 2020
  3. Simple and Effective Regularization Methods for Training on Noisily Labeled Data with Generalization Guarantee
    Wei Hu, Zhiyuan Li, and Dingli Yu
    In The Eighth International Conference on Learning Representations, ICLR 2020
  4. Harnessing the Power of Infinitely Wide Deep Nets on Small-Data Tasks
    Sanjeev Arora, Simon S Du, Zhiyuan Li, Ruslan Salakhutdinov, Ruosong Wang, and Dingli Yu (α-β)
    In The Eighth International Conference on Learning Representations, ICLR 2020
  5. An Exponential Learning Rate Schedule for Deep Learning
    Zhiyuan Li and Sanjeev Arora
    In The Eighth International Conference on Learning Representations, ICLR 2020

2019

  1. Enhanced Convolutional Neural Tangent Kernels
    Zhiyuan Li*, Ruosong Wang*, Dingli Yu*, Simon S Du, Wei Hu, Ruslan Salakhutdinov, and Sanjeev Arora
    arXiv preprint arXiv:1911.00809, 2019
  2. Explaining Landscape Connectivity of Low-Cost Solutions for Multilayer Nets
    Rohith Kuditipudi, Xiang Wang, Holden Lee, Yi Zhang, Zhiyuan Li, Wei Hu, Rong Ge, and Sanjeev Arora
    In Advances in Neural Information Processing Systems 32, NeurIPS 2019
  3. On Exact Computation with an Infinitely Wide Neural Net
    Sanjeev Arora, Simon S Du, Wei Hu, Zhiyuan Li, Ruslan Salakhutdinov, and Ruosong Wang (α-β)
    In Advances in Neural Information Processing Systems 32, NeurIPS 2019
  4. Fine-Grained Analysis of Optimization and Generalization for Overparameterized Two-Layer Neural Networks
    Sanjeev Arora, Simon S Du, Wei Hu, Zhiyuan Li, and Ruosong Wang (α-β)
    In Proceedings of the 36th International Conference on Machine Learning, ICML 2019
  5. Theoretical Analysis of Auto Rate-Tuning by Batch Normalization
    Sanjeev Arora, Zhiyuan Li, and Kaifeng Lyu (α-β)
    In The Seventh International Conference on Learning Representations, ICLR 2019
  6. Towards Understanding the Role of Over-Parametrization in Generalization of Neural Networks
    Behnam Neyshabur, Zhiyuan Li, Srinadh Bhojanapalli, Yann LeCun, and Nathan Srebro
    In The Seventh International Conference on Learning Representations, ICLR 2019

2018

  1. Online Improper Learning with an Approximation Oracle
    Elad Hazan, Wei Hu, Yuanzhi Li, and Zhiyuan Li (α-β)
    In Advances in Neural Information Processing Systems 31, NeurIPS 2018

2017

  1. Stability of Generalized Two-Sided Markets with Transaction Thresholds
    Zhiyuan Li, Yicheng Liu, Pingzhong Tang, Tingting Xu, and Wei Zhan (α-β)
    In Proceedings of the 16th International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2017

2016

  1. Learning in Games: Robustness of Fast Convergence
    Dylan J Foster, Zhiyuan Li, Thodoris Lykouris, Karthik Sridharan, and Eva Tardos (α-β)
    In Advances in Neural Information Processing Systems 29, NeurIPS 2016
  2. Solving Marginal MAP Problems with NP Oracles and Parity Constraints
    Yexiang Xue, Zhiyuan Li, Stefano Ermon, Carla P Gomes, and Bart Selman
    In Advances in Neural Information Processing Systems 29, NeurIPS 2016