Abstract
This paper studies asymptotic properties of adaptive algorithms widely used in optimization and machine learning, among them Adagrad and RMSProp, which underlie most black-box deep learning algorithms. We work in the non-convex landscape setting, consider a one-time-scale parametrization, and cover the cases where these algorithms are used with or without mini-batches. Adopting the point of view of stochastic algorithms, we establish the almost sure convergence of these methods, when run with a decreasing step size, towards the set of critical points of the target function. Under a mild extra assumption on the noise, we also obtain convergence towards the set of minimizers of the function. Along the way, we also obtain a “convergence rate” for the methods, in the vein of the works of [GL13].
Keywords
Stochastic optimization; Stochastic adaptive algorithm; Convergence of random variables
Replaced by
Sébastien Gadat, and Ioana Gavra, “Asymptotic study of stochastic adaptive algorithm in non-convex landscape”, Journal of Machine Learning Research, n. 228, August 2022, pp. 1–54.
Reference
Sébastien Gadat, and Ioana Gavra, “Asymptotic study of stochastic adaptive algorithm in non-convex landscape”, TSE Working Paper, n. 21-1175, January 2021.
Published in
TSE Working Paper, n. 21-1175, January 2021