Clipping Improves Adam-Norm and AdaGrad-Norm when the Noise Is Heavy-Tailed
Chezhegov, Savelii A.; Klyukin, Yaroslav; Semenov, Andrei; Beznosikov, Aleksandr N.; Gasnikov, Alexander V.; Horváth, Samuel; Takáč, Martin; Gorbunov, Eduard A.
Department
Machine Learning
Type
Conference proceeding
Date
2025
Language
English
Abstract
Methods with adaptive stepsizes, such as AdaGrad and Adam, are essential for training modern Deep Learning models, especially Large Language Models. For the latter, the noise in the stochastic gradients is typically heavy-tailed. Gradient clipping provably helps to achieve good high-probability convergence under such noise. However, despite the similarity between AdaGrad/Adam and Clip-SGD, the current understanding of the high-probability convergence of AdaGrad/Adam-type methods is limited in this case. In this work, we prove that AdaGrad/Adam (and their delayed versions) can have provably bad high-probability convergence if the noise is heavy-tailed. We also show that gradient clipping fixes this issue, i.e., we derive new high-probability convergence bounds with polylogarithmic dependence on the confidence level for AdaGrad-Norm and Adam-Norm with clipping and with/without delay for smooth convex/nonconvex stochastic optimization with heavy-tailed noise. We extend our results to the case of Clip-AdaGrad/Clip-Adam with delayed stepsizes. Our empirical evaluations highlight the superiority of clipped versions of AdaGrad/Adam in handling heavy-tailed noise.
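For intuition, the following is a minimal sketch of AdaGrad-Norm with gradient clipping, the kind of method the abstract refers to. It is not the paper's exact algorithm or analysis setting: the function names, hyperparameter values, and the Student-t noise model in the usage example are illustrative assumptions.

```python
import numpy as np

def clip(g, lam):
    """Standard gradient clipping: rescale g so its norm is at most lam."""
    norm = np.linalg.norm(g)
    return g if norm <= lam else (lam / norm) * g

def clip_adagrad_norm(grad_fn, x0, gamma=1.0, lam=1.0, b0=1.0, steps=1000):
    """Sketch of AdaGrad-Norm with clipped stochastic gradients.

    grad_fn(x) returns a stochastic gradient (possibly with heavy-tailed noise).
    A 'delayed' variant would compute the step with the accumulator from the
    previous iteration, i.e., update b_sq only after taking the step.
    """
    x, b_sq = x0.copy(), b0 ** 2
    for _ in range(steps):
        g = clip(grad_fn(x), lam)           # clipped stochastic gradient
        b_sq += np.linalg.norm(g) ** 2      # accumulate squared gradient norms
        x = x - gamma / np.sqrt(b_sq) * g   # adaptive (norm-based) stepsize
    return x

# Usage example (assumed setup): quadratic objective with heavy-tailed
# Student-t gradient noise; for df <= 2 the noise variance is infinite.
rng = np.random.default_rng(0)
grad = lambda x: 2 * x + rng.standard_t(df=1.5, size=x.shape)
x_final = clip_adagrad_norm(grad, x0=5.0 * np.ones(10), steps=5000)
```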
Citation
S. Chezhegov et al., “Clipping Improves Adam-Norm and AdaGrad-Norm when the Noise Is Heavy-Tailed,” Oct. 06, 2025, PMLR. [Online]. Available: https://proceedings.mlr.press/v267/chezhegov25a.html
Source
Proceedings of Machine Learning Research
Conference
42nd International Conference on Machine Learning, ICML 2025
Publisher
ML Research Press
