Improving SGD-based Optimizers for Deep Learning

Rathinaraja Jeyaraj, Kyungpook National University

Over the last decade, various techniques (momentum, RMSprop, Adam, etc.) have been proposed to accelerate the convergence of models trained with stochastic gradient descent (SGD). In these methods, oscillation (jitter) in the parameter updates at every iteration is a key problem that can slow down convergence, yet training deep learning models on huge datasets demands fast convergence. With this in mind, we propose a technique that uses the harmonic mean to minimize oscillation and improve the convergence speed of classical SGD and other SGD-based optimizers.
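To make the idea concrete, here is a minimal sketch of one way a harmonic mean could smooth SGD updates. This is an illustrative assumption, not the paper's exact formulation: each coordinate's step magnitude is the element-wise harmonic mean of the current and previous gradient magnitudes, which is pulled toward the smaller of the two and so damps sudden spikes that cause jitter.

```python
import numpy as np

def harmonic_mean_sgd(grad_fn, w, lr=0.1, steps=200, eps=1e-12):
    """SGD where each raw gradient is smoothed using the element-wise
    harmonic mean of the current and previous gradient magnitudes.
    Illustrative sketch only; the proposed method may differ in detail."""
    prev = None
    for _ in range(steps):
        g = grad_fn(w)
        if prev is None:
            smoothed = g  # no history yet on the first step
        else:
            # harmonic mean of |g_t| and |g_{t-1}|, keeping the sign of g_t;
            # the harmonic mean leans toward the smaller magnitude, damping
            # the large alternating steps that produce oscillation
            mag = (2 * np.abs(g) * np.abs(prev)
                   / (np.abs(g) + np.abs(prev) + eps))
            smoothed = np.sign(g) * mag
        w = w - lr * smoothed
        prev = g
    return w

# demo on the 1-D quadratic f(w) = (w - 3)^2, whose gradient is 2(w - 3)
w_final = harmonic_mean_sgd(lambda w: 2 * (w - 3.0), np.array([0.0]))
```

On this toy quadratic the smoothed updates still contract toward the minimizer at w = 3, while any transient spike in the gradient magnitude would be attenuated by the harmonic mean rather than passed through directly.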