DHeLlam: General-Purpose, Automatic Micro-Batch Co-Execution for Distributed LLM Training
Wang, Haiquan; Ruan, Chaoyi; He, Jia; Ruan, Jiaqi; Tang, Chengjie; Ma, Xiaosong; Li, Cheng
Department
Computer Science
Type
Conference proceeding
Abstract
The growth of Large Language Models (LLMs) has necessitated large-scale distributed training. Even highly optimized frameworks, however, suffer significant losses in Model FLOPS Utilization (MFU) due to communication. This paper introduces DHeLlam, a novel micro-batch co-execution structure, inspired by the two-stranded structure of DNA, that significantly improves the efficiency of LLM training. Central to DHeLlam is Strand Interleaving (SI), which treats the continuous stream of training micro-batches on a GPU as two interleaved strands. DHeLlam co-schedules their forward and backward passes using operator-level overlap profiling and a dynamic programming-based schedule search. The two strands share model states and activation memory, requiring less than 3% additional HBM space under common model configurations. To the best of our knowledge, DHeLlam is the first system to co-execute two micro-batches without requiring model replication. With its model folding design, DHeLlam integrates seamlessly with all forms of data and model parallelism, including the challenging pipeline parallelism (via a W-shaped pipeline). We evaluated DHeLlam on the popular Llama and GPT dense models, plus the Phi Mixture-of-Experts (MoE) model, across two GPU clusters. DHeLlam improves throughput by 12-40% (up to 58% MFU) on a 64-card A40 cluster and by 5-24% (up to 64% MFU) on a 64-card A800 cluster, significantly outperforming state-of-the-art methods.
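To illustrate the core idea of Strand Interleaving, the following is a minimal toy sketch (not DHeLlam's actual scheduler, which uses operator-level profiling and dynamic programming): the stream of micro-batches is split into two strands, and their forward (F) and backward (B) phases are interleaved so that one strand's communication can overlap with the other strand's compute. The `strand_interleave` function and the coarse per-micro-batch F/B granularity are hypothetical simplifications.

```python
def strand_interleave(micro_batches):
    """Toy sketch of Strand Interleaving: split a stream of micro-batch
    indices into two strands and emit an interleaved (phase, index)
    schedule. In a real system, interleaving lets one strand compute
    while the other strand communicates; here we only model the order.
    Phases are simplified to whole-micro-batch "F" (forward) and
    "B" (backward) steps, not individual operators."""
    strand_a = micro_batches[0::2]  # even-positioned micro-batches
    strand_b = micro_batches[1::2]  # odd-positioned micro-batches
    schedule = []
    for a, b in zip(strand_a, strand_b):
        # Interleave the two strands' forward and backward phases.
        schedule += [("F", a), ("F", b), ("B", a), ("B", b)]
    if len(strand_a) > len(strand_b):
        # An odd leftover micro-batch on strand A runs alone.
        m = strand_a[-1]
        schedule += [("F", m), ("B", m)]
    return schedule

print(strand_interleave([0, 1, 2, 3]))
```

For four micro-batches this yields the order F0, F1, B0, B1, F2, F3, B2, B3: at every point, two micro-batches are in flight on the same GPU, which is what allows their compute and communication phases to be co-scheduled without replicating the model.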
Conference
2025 IEEE 43rd International Conference on Computer Design (ICCD)
Keywords
46 Information and Computing Sciences, 4605 Data Management and Data Science
Publisher
IEEE
