Event
Weekly OSI Lab Seminar
Short summary
In this seminar, I will talk about a recent line of work that analyzes SGD under heavy-tailed noise assumptions.
Abstract
One popular way of analyzing the behavior of SGD and SGDm (SGD with momentum) is to view them as discretizations of a Langevin-type SDE. Until 2019, it was widely assumed that the stochastic gradient noise (SGN) has finite variance, leading to analyses of SDEs driven by Brownian motion. Over the last two years, this finite-variance assumption has been challenged (primarily by Prof. Umut Şimşekli and Prof. Mert Gürbüzbalaban) with the claim that the SGN is in fact heavy-tailed; as a consequence, the SDE of interest is driven by a Lévy motion. This talk gives a detailed overview of this new way of thinking about SGD/SGDm by going through some of the key papers (ICML 2019, ICML 2020, arXiv) and related papers.
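To make the Brownian vs. Lévy contrast concrete, below is a minimal sketch (my own illustration, not code from the papers) that simulates Euler-Maruyama-style iterates of a Langevin-type SDE on a toy 1-D quadratic loss, once with Gaussian (Brownian) increments and once with heavy-tailed α-stable (Lévy) increments. The function `run_sde`, the quadratic loss, and the parameter choices (`eta`, `sigma`, `alpha = 1.7`) are all arbitrary assumptions made for illustration.

```python
# Sketch: iterates of a Langevin-type SDE on f(w) = w^2 / 2, contrasting
# Brownian-driven noise (alpha = 2, finite variance) with heavy-tailed
# alpha-stable Levy noise (alpha < 2, infinite variance).
import numpy as np
from scipy.stats import levy_stable


def run_sde(alpha, eta=0.01, sigma=0.5, n_steps=10_000, seed=0):
    """Simulate w_{k+1} = w_k - eta * grad f(w_k) + sigma * eta^(1/alpha) * xi_k,
    where xi_k is symmetric alpha-stable noise (alpha = 2 recovers the Gaussian case).
    The eta^(1/alpha) factor is the self-similarity scaling of alpha-stable increments."""
    w = np.zeros(n_steps)
    # beta = 0 gives a symmetric alpha-stable law.
    xi = levy_stable.rvs(alpha, 0.0, size=n_steps, random_state=seed)
    for k in range(1, n_steps):
        grad = w[k - 1]  # gradient of f(w) = w^2 / 2
        w[k] = w[k - 1] - eta * grad + sigma * eta ** (1.0 / alpha) * xi[k]
    return w


brownian = run_sde(alpha=2.0)  # Brownian case: small, frequent fluctuations
levy = run_sde(alpha=1.7)      # heavy-tailed case: occasional large jumps
print("max |w| (Brownian):   ", np.abs(brownian).max())
print("max |w| (Levy, a=1.7):", np.abs(levy).max())
```

For `alpha < 2` the increments have infinite variance, so the trajectory shows rare large jumps rather than the diffusive wandering of the Brownian case; this qualitative difference is exactly the behavior the heavy-tailed analyses focus on.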
Papers
Papers discussed in the talk:
- Main: Umut Şimşekli, Levent Sagun, and Mert Gürbüzbalaban. A Tail-Index Analysis of Stochastic Gradient Noise in Deep Neural Networks. In ICML 2019.
- Umut Şimşekli. Fractional Langevin Monte Carlo: Exploring Lévy Driven Stochastic Differential Equations for Markov Chain Monte Carlo. In ICML 2017.
- Umut Şimşekli, Lingjiong Zhu, Yee Whye Teh, and Mert Gürbüzbalaban. Fractional Underdamped Langevin Dynamics: Retargeting SGD with Momentum under Heavy-Tailed Gradient Noise. In ICML 2020.