Learning Overspecified Gaussian Mixtures Exponentially Fast with the EM Algorithm

Assylbekov, Zhenisbek A.
Legg, Alan R.
Pak, Artur
Department
Natural Language Processing
Type
Conference proceeding
Date
2026
Language
English
Abstract
We investigate the convergence properties of the EM algorithm when applied to overspecified Gaussian mixture models—that is, when the number of components in the fitted model exceeds that of the true underlying distribution. Focusing on a structured configuration where the component means are positioned at the vertices of a regular simplex and the mixture weights satisfy a non-degeneracy condition, we demonstrate that the population EM algorithm converges exponentially fast in terms of the Kullback-Leibler (KL) distance. Our analysis leverages the strong convexity of the negative log-likelihood function in a neighborhood around the optimum and utilizes the Polyak-Łojasiewicz inequality to establish that an ϵ-accurate approximation is achievable in O(log(1/ϵ)) iterations. Furthermore, we extend these results to a finite-sample setting by deriving explicit statistical convergence guarantees. Numerical experiments on synthetic datasets corroborate our theoretical findings, highlighting the dramatic acceleration in convergence compared to conventional sublinear rates. This work not only deepens the understanding of EM’s behavior in overspecified settings but also offers practical insights into initialization strategies and model design for high-dimensional clustering and density estimation tasks.
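The iterative scheme described in the abstract can be illustrated with a minimal sample-EM sketch for a unit-covariance Gaussian mixture, fitted with more components than the data-generating distribution actually has. This is not the authors' code; the function name `em_gmm`, the random initialization, and the fixed iteration count are illustrative assumptions.

```python
import numpy as np

def em_gmm(X, K, n_iter=200, seed=0):
    """Sample EM for a K-component Gaussian mixture with identity covariance.

    Illustrative sketch of the overspecified setting: K may exceed the
    number of components in the true distribution underlying X.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    mu = X[rng.choice(n, K, replace=False)]  # initialize means at random data points
    pi = np.full(K, 1.0 / K)                 # uniform initial mixture weights
    for _ in range(n_iter):
        # E-step: responsibilities under N(mu_k, I); the Gaussian normalizing
        # constant is identical across components and cancels, so it is omitted.
        sq = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(-1)
        logp = np.log(pi)[None, :] - 0.5 * sq
        logp -= logp.max(axis=1, keepdims=True)  # stabilize before exponentiating
        r = np.exp(logp)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights and means from the responsibilities
        nk = r.sum(axis=0)
        pi = nk / n
        mu = (r.T @ X) / nk[:, None]
    return pi, mu
```

One useful sanity check: after every M-step the fitted mixture mean equals the sample mean exactly, since `pi @ mu = (1/n) * sum_i x_i * sum_k r_ik` and responsibilities sum to one per point. Running `em_gmm` with `K = 3` on data drawn from a single standard Gaussian reproduces the overspecified regime the paper analyzes.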
Citation
Z. Assylbekov, A. Legg, and A. Pak, “Learning Overspecified Gaussian Mixtures Exponentially Fast with the EM Algorithm,” pp. 327–343, 2026, doi: 10.1007/978-3-032-06078-5_19
Source
Lecture Notes in Computer Science
Conference
European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML PKDD 2025
Keywords
Expectation-Maximization, Gaussian Mixtures, Overspecification
Publisher
Springer Nature