Event
Weekly OSI Lab Seminar
Short summary
This seminar continues from Part 1, focusing on the implications of heavy-tailed theories of SGD for the generalization capability of neural networks, and on the origin of the heavy-tailedness.
Abstract
Part 1 introduced the reasons why a heavy-tailed model should be used to describe SGD dynamics. This seminar (Part 2) starts by building the necessary mathematical background from the ground up, then moves on to discussing where the heavy-tailedness comes from. If time allows, we shall also look at how the Hausdorff dimension of the SGD trajectory is directly related to generalization capability. Along the way, several papers (ICML 2019, ICML 2020, NeurIPS 2020, arXiv) will be briefly introduced.
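To give a concrete feel for the phenomenon, here is a minimal toy sketch (not from the seminar materials or the papers): plain SGD on a synthetic linear regression problem with Gaussian data, with a Hill estimator used to gauge the tail index of the stationary iterates for two step sizes. All parameter values, the problem setup, and the use of the Hill estimator are illustrative assumptions, intended only as a qualitative hint at the heavy-tail phenomenon studied in the main paper.

```python
# Toy illustration (not from the papers): SGD on synthetic linear regression
# with Gaussian data, plus a Hill estimator of the tail index of the
# stationary iterates. All parameter values are illustrative.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic least-squares problem: y = X w* + small Gaussian noise.
n, d = 10_000, 5
X = rng.standard_normal((n, d))
w_star = rng.standard_normal(d)
y = X @ w_star + 0.1 * rng.standard_normal(n)

def sgd_distances(eta, batch=1, steps=60_000, burn_in=10_000):
    """Run plain SGD and collect ||w_t - w*|| after a burn-in period."""
    w = np.zeros(d)
    dists = []
    for t in range(steps):
        idx = rng.integers(0, n, size=batch)
        grad = X[idx].T @ (X[idx] @ w - y[idx]) / batch
        w = w - eta * grad
        if t >= burn_in:
            dists.append(np.linalg.norm(w - w_star))
    return np.array(dists)

def hill_tail_index(samples, k=500):
    """Hill estimator of the tail index based on the k largest samples."""
    s = np.sort(samples)
    threshold = s[-(k + 1)]
    return 1.0 / np.mean(np.log(s[-k:] / threshold))

# A larger step-size-to-batch ratio should yield heavier tails (a smaller
# estimated tail index); the estimates are noisy and purely qualitative.
for eta in (0.01, 0.15):
    print(f"step size {eta}: Hill tail index ~ {hill_tail_index(sgd_distances(eta)):.2f}")
```

The smaller step size should report a much larger (lighter-tailed) index than the larger one, mirroring the step-size/batch-size dependence of the tail index discussed in the main paper.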
Papers
Papers discussed in the seminar:
- Main: Mert Gürbüzbalaban, Umut Şimşekli, and Lingjiong Zhu. The Heavy-Tail Phenomenon in SGD. arXiv preprint, 2020.
- Main: Umut Şimşekli, Ozan Sener, George Deligiannidis, and Murat A. Erdogdu. Hausdorff Dimension, Stochastic Differential Equations, and Generalization in Neural Networks. NeurIPS 2020. (Spotlight paper!)
- Umut Şimşekli, Mert Gürbüzbalaban, Thanh Huy Nguyen, Gaël Richard, and Levent Sagun. On the Heavy-Tailed Theory of Stochastic Gradient Descent for Deep Neural Networks. arXiv preprint, 2019.