
Global-QSGD: Allreduce-Compatible Quantization for Distributed Learning with Theoretical Guarantees

Xin, Jihao
Canini, Marco
Richtarik, Peter
Horvath, Samuel
Department
Machine Learning
Type
Conference proceeding
Date
2025
Language
English
Abstract
Distributed training enables large-scale deep learning, but suffers from high communication overhead, especially as models and datasets grow. Gradient compression, particularly quantization, is a promising approach to mitigate this bottleneck. However, existing quantization schemes are often incompatible with Allreduce, the dominant communication primitive in distributed deep learning, and many prior solutions rely on heuristics without theoretical guarantees. We introduce Global-QSGD, an Allreduce-compatible gradient quantization method that leverages global norm scaling to reduce communication overhead while preserving accuracy. Global-QSGD is backed by rigorous theoretical analysis, extending standard unbiased compressor frameworks to establish formal convergence guarantees. Additionally, we develop a performance model to evaluate its impact across different hardware configurations. Extensive experiments on NVLink, PCIe, and large-scale cloud environments show that Global-QSGD accelerates distributed training by up to 3.51× over baseline quantization methods, making it a practical and efficient solution for large-scale deep learning workloads.
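The central idea described in the abstract, quantizing each worker's gradient against a single global norm so that all workers share one quantization grid and their integer codes can be summed directly by Allreduce, can be illustrated with a minimal sketch. The function names, the use of uniform levels, and stochastic rounding below are illustrative assumptions, not the paper's exact scheme.

```python
import numpy as np

def global_qsgd_quantize(grad, global_norm, num_levels=256, rng=None):
    """Illustrative sketch: unbiased stochastic quantization of a local
    gradient against a global norm shared by all workers (assumption:
    uniform levels; the paper's compressor may differ)."""
    rng = rng or np.random.default_rng()
    # Scaling by the global norm puts every worker on the same grid.
    scaled = grad / global_norm
    levels = scaled * num_levels
    lower = np.floor(levels)
    # Stochastic rounding keeps the quantizer unbiased in expectation.
    prob_up = levels - lower
    quantized = lower + (rng.random(grad.shape) < prob_up)
    return quantized.astype(np.int32)

def global_qsgd_dequantize(q_sum, global_norm, num_workers, num_levels=256):
    """Recover the averaged gradient from the Allreduce-summed integer codes."""
    return q_sum * global_norm / (num_levels * num_workers)
```

In a real distributed run, the global norm would first be agreed on with a small Allreduce over the workers' local norms; the integer codes are then summed with a second Allreduce and dequantized once, which is what makes the scheme compatible with sum-based collectives.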
Citation
J. Xin, M. Canini, P. Richtarik, and S. Horvath, “Global-QSGD: Allreduce-Compatible Quantization for Distributed Learning with Theoretical Guarantees,” EuroMLSys 2025 - Proceedings of the 2025 5th Workshop on Machine Learning and Systems, pp. 216–229, Apr. 2025, doi: 10.1145/3721146.3721932
Source
Proceedings of the 5th Workshop on Machine Learning and Systems
Conference
EuroMLSys '25: Proceedings of the 5th Workshop on Machine Learning and Systems
Keywords
Distributed Training, Gradient Compression, Collective Communication
Publisher
Association for Computing Machinery