Zhiyuan Li (李志远)

Office: TTIC 508

I am a tenure-track assistant professor at the Toyota Technological Institute at Chicago (TTIC) and an affiliated faculty member in Computer Science at the University of Chicago. I am also a visiting faculty member at Google Research. Before joining TTIC, I was a postdoctoral fellow in the Computer Science Department at Stanford University, working with Tengyu Ma. I received my PhD from the Computer Science Department at Princeton University in 2022, where I was advised by Sanjeev Arora. I completed my undergraduate studies in the Yao Class at Tsinghua University.

I am broadly interested in machine learning theory, including optimization in deep learning, the reasoning capabilities of Large Language Models (LLMs), and the modern paradigm of generalization in machine learning (overparametrization, out-of-domain generalization) and its connection to the implicit bias of optimization algorithms.

News

Jun 23, 2025	Excited to co-organize and participate in the Midwest Machine Learning Symposium at UChicago!
Jun 09, 2025	Presenting PENCIL at the 2025 Annual Meeting of the IDEAL Institute.
May 05, 2025	Serving as Area Chair for NeurIPS 2025.
May 01, 2025	4 papers accepted by ICML 2025!
Jan 31, 2025	Serving as Area Chair for ICML 2025.

Selected and Recent Publications

Structured Preconditioners in Adaptive Optimization: A Unified Analysis

Shuo Xie, Tianhao Wang, Sashank Reddi, Sanjiv Kumar, and Zhiyuan Li

In Proceedings of the 42nd International Conference on Machine Learning, ICML 2025

@inproceedings{xie2025structured,
  author = {Xie, Shuo and Wang, Tianhao and Reddi, Sashank and Kumar, Sanjiv and Li, Zhiyuan},
  title = {Structured Preconditioners in Adaptive Optimization: A Unified Analysis},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  year = {2025},
}

PENCIL: Long Thoughts with Short Memory

Chenxiao Yang, Nathan Srebro, David McAllester, and Zhiyuan Li

In Proceedings of the 42nd International Conference on Machine Learning, ICML 2025

@inproceedings{yang2025pencil,
  author = {Yang, Chenxiao and Srebro, Nathan and McAllester, David and Li, Zhiyuan},
  title = {{PENCIL}: Long Thoughts with Short Memory},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  year = {2025},
}

Weak-to-Strong Generalization Even in Random Feature Networks, Provably

Marko Medvedev, Kaifeng Lyu, Dingli Yu, Sanjeev Arora, Zhiyuan Li, and Nathan Srebro

In Proceedings of the 42nd International Conference on Machine Learning, ICML 2025

@inproceedings{medvedev2025weak,
  author = {Medvedev, Marko and Lyu, Kaifeng and Yu, Dingli and Arora, Sanjeev and Li, Zhiyuan and Srebro, Nathan},
  title = {Weak-to-Strong Generalization Even in Random Feature Networks, Provably},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  year = {2025},
}

Non-Asymptotic Length Generalization

Thomas Chen, Tengyu Ma, and Zhiyuan Li

In Proceedings of the 42nd International Conference on Machine Learning, ICML 2025

@inproceedings{chen2025non,
  author = {Chen, Thomas and Ma, Tengyu and Li, Zhiyuan},
  title = {Non-Asymptotic Length Generalization},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  year = {2025},
}

Adam Exploits $\ell_\infty$-geometry of Loss Landscape via Coordinate-wise Adaptivity

Shuo Xie, Mohamad Amin Mohamadi, and Zhiyuan Li

In The Thirteenth International Conference on Learning Representations, ICLR 2025

2025 ICLR Spotlight

@inproceedings{xie2024adam,
  author = {Xie, Shuo and Mohamadi, Mohamad Amin and Li, Zhiyuan},
  title = {{Adam} Exploits $\ell_\infty$-geometry of Loss Landscape via Coordinate-wise Adaptivity},
  booktitle = {The Thirteenth International Conference on Learning Representations},
  year = {2025},
}

Chain of Thought Empowers Transformers to Solve Inherently Serial Problems

Zhiyuan Li, Hong Liu, Denny Zhou, and Tengyu Ma

In The Twelfth International Conference on Learning Representations, ICLR 2024

@inproceedings{li2024chain,
  author = {Li, Zhiyuan and Liu, Hong and Zhou, Denny and Ma, Tengyu},
  title = {Chain of Thought Empowers Transformers to Solve Inherently Serial Problems},
  booktitle = {The Twelfth International Conference on Learning Representations},
  year = {2024},
}

How Does Sharpness-Aware Minimization Minimize Sharpness?

Kaiyue Wen, Tengyu Ma, and Zhiyuan Li

In The Eleventh International Conference on Learning Representations, ICLR 2023

@inproceedings{wen2023does,
  author = {Wen, Kaiyue and Ma, Tengyu and Li, Zhiyuan},
  title = {How Does Sharpness-Aware Minimization Minimize Sharpness?},
  booktitle = {The Eleventh International Conference on Learning Representations},
  year = {2023},
}

What Happens after SGD Reaches Zero Loss?--A Mathematical Framework

Zhiyuan Li, Tianhao Wang, and Sanjeev Arora

In The Tenth International Conference on Learning Representations, ICLR 2022

2022 ICLR Spotlight

@inproceedings{li2021happens,
  author = {Li, Zhiyuan and Wang, Tianhao and Arora, Sanjeev},
  title = {What Happens after {SGD} Reaches Zero Loss?--A Mathematical Framework},
  booktitle = {The Tenth International Conference on Learning Representations},
  year = {2022},
}

Why Are Convolutional Nets More Sample-Efficient Than Fully-Connected Nets?

Zhiyuan Li, Yi Zhang, and Sanjeev Arora

In The Ninth International Conference on Learning Representations, ICLR 2021

2021 ICLR Oral

@inproceedings{li2020convolutional,
  author = {Li, Zhiyuan and Zhang, Yi and Arora, Sanjeev},
  title = {Why Are Convolutional Nets More Sample-Efficient Than Fully-Connected Nets?},
  booktitle = {The Ninth International Conference on Learning Representations},
  year = {2021},
}