Landscape and training regimes in deep learning

Event

Weekly OSI Lab Seminar

Short summary

In this seminar, I will talk about the paper “Landscape and training regimes in deep learning” (Geiger et al., 2021), accepted for publication in the Physics Report.

Abstract

(taken directly from the paper)

Deep learning algorithms are responsible for a technological revolution in a variety of tasks including image recognition or Go playing. Yet, why they work is not understood. Ultimately, they manage to classify data lying in high dimension – a feat generically impossible due to the geometry of high dimensional space and the associated curse of dimensionality. Understanding what kind of structure, symmetry or invariance makes data such as images learnable is a fundamental challenge. Other puzzles include that (i) learning corresponds to minimizing a loss in high dimension, which is in general not convex and could well get stuck bad minima. (ii) Deep learning predicting power increases with the number of fitting parameters, even in a regime where data are perfectly fitted. In this manuscript, we review recent results elucidating (i,ii) and the perspective they offer on the (still unexplained) curse of dimensionality paradox. We base our theoretical discussion on the (h,α) plane where h controls the number of parameters and α the scale of the output of the network at initialization, and provide new systematic measures of performance in that plane for two common image classification datasets. We argue that different learning regimes can be organized into a phase diagram. A line of critical points sharply delimits an under-parametrised phase from an over-parametrized one. In over-parametrized nets, learning can operate in two regimes separated by a smooth cross-over. At large initialization, it corresponds to a kernel method, whereas for small initializations features can be learnt, together with invariants in the data. We review the properties of these different phases, of the transition separating them and some open questions. Our treatment emphasizes analogies with physical systems, scaling arguments and the development of numerical observables to quantitatively test these results empirically. Practical implications are also discussed, including the benefit of averaging nets with distinct initial weights, or the choice of parameters (h,α) optimizing performance.

Papers

Papers discussed in the seminar:

Main: Mario Geiger, Leonardo Petrini, and Matthieu Wyart. Landscape and training regimes in deep learning. In Physics Report, 2021.