Gradient Descent with Polyak's Momentum Finds Flatter Minima via Large Catapults
Although gradient descent with Polyak's momentum is widely used in modern machine and deep learning, a concrete understanding of its effects on the training trajectory remains …
prin-phunyaphibarn
