Gradient Descent on Infinitely Wide Neural Networks: Global Convergence and Generalization

Event

Weekly OSI Lab Seminar

Short summary

In this seminar, I will talk about the paper “Gradient Descent on Infinitely Wide Neural Networks: Global Convergence and Generalization” (Bach and Chizat, arXiv 2021) and the references therein.

Abstract

Many supervised machine learning methods are naturally cast as optimization problems. For prediction models which are linear in their parameters, this often leads to convex problems for which many mathematical guarantees exist. Models which are non-linear in their parameters such as neural networks lead to non-convex optimization problems for which guarantees are harder to obtain. In this review paper, we consider two-layer neural networks with homogeneous activation functions where the number of hidden neurons tends to infinity, and show how qualitative convergence guarantees may be derived.
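To make the setting of the abstract concrete, below is a minimal sketch (not taken from the paper) of a two-layer network with a positively homogeneous activation (ReLU) in a mean-field parametrization, trained by plain gradient descent on a toy regression problem. All names, the data, and the hyperparameters are illustrative choices, not the authors' setup.

```python
# Minimal sketch of the setting: a two-layer ReLU network
# f(x) = (1/m) * sum_j b_j * relu(a_j . x) with a large number m of hidden
# neurons, trained by gradient descent on the squared loss.
# Everything here (data, step size, iteration count) is illustrative.
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D regression data.
n, d, m = 64, 1, 2000                       # samples, input dim, hidden neurons
X = rng.uniform(-1.0, 1.0, size=(n, d))
y = np.sin(3.0 * X[:, 0])

# Parameters: input weights a_j (rows of A) and output weights b_j.
A = rng.normal(size=(m, d))
b = rng.normal(size=m)

def predict(A, b, X):
    """Mean-field two-layer ReLU network: average of m neuron contributions."""
    hidden = np.maximum(X @ A.T, 0.0)       # (n, m) ReLU activations
    return hidden @ b / m                   # (n,) predictions

lr = 0.5
for step in range(3000):
    hidden = np.maximum(X @ A.T, 0.0)       # (n, m)
    resid = hidden @ b / m - y              # gradient of 0.5 * mean squared error
    grad_b = hidden.T @ resid / (n * m)     # (m,)
    # d relu(z)/dz = 1[z > 0]; chain rule through the hidden layer.
    grad_A = ((hidden > 0) * (resid[:, None] * b[None, :])).T @ X / (n * m)
    # With the 1/m output scaling, per-neuron gradients are O(1/m); rescaling
    # the step by m keeps each neuron's update O(1), as in mean-field training.
    b -= lr * m * grad_b
    A -= lr * m * grad_A

print("final training MSE:", np.mean((predict(A, b, X) - y) ** 2))
```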

Papers

Papers discussed in the seminar:

  • Main: Bach, Francis and Chizat, Lénaïc. Gradient Descent on Infinitely Wide Neural Networks: Global Convergence and Generalization. arXiv preprint, 2021.
  • Chizat, Lénaïc and Bach, Francis. On the Global Convergence of Gradient Descent for Over-parameterized Models using Optimal Transport. In NeurIPS 2018.
  • Chizat, Lénaïc and Bach, Francis. Implicit Bias of Gradient Descent for Wide Two-Layer Neural Networks Trained with the Logistic Loss. In COLT 2020.
  • Chizat, Lénaïc and Bach, Francis. On Lazy Training in Differentiable Programming. In NeurIPS 2019.