Rabbitail: A Tail Latency-Aware Scheduler for Deep Learning Recommendation Systems with Hierarchical Embedding Storage

Wan, Hu
Huang, Yun
Bai, Shuhan
Sun, Xuan
Kuo, Tei-Wei
Xue, Chun Jason
Abstract
Deep learning-based recommendation systems are critical for many online platforms, but face challenges in managing large embedding tables while meeting strict latency requirements. This paper presents Rabbitail, a novel inference scheduler designed for recommendation systems utilizing hierarchical DRAM-SSD embedding storage. Rabbitail employs a cache-aware approach, classifying inferences into hit and miss categories based on embedding cache lookup results. This allows hit inferences to proceed immediately to top MLPs without waiting for slower SSD retrievals. For miss inferences, Rabbitail implements an on-demand embedding lookup strategy and a reordering mechanism to optimize SSD retrieval. Additionally, it uses dedicated resource allocation for prompt processing of miss queue MLP tasks and employs batch splitting to manage maximum execution times. Evaluations using real-world datasets demonstrate that Rabbitail significantly reduces end-to-end model inference tail latency, achieving a 53.7% lower p99 tail latency compared to the baseline while maintaining throughput.
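The scheduling idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the `Inference` type, the function names, the set-based cache, and in particular the reordering rule (fewest missing embeddings first) are assumptions made for illustration, since the abstract does not specify them. The sketch shows hit/miss classification from cache lookups, on-demand lookup of only the missing embeddings, a possible reordering of the miss queue, and batch splitting to bound the size of each executed batch.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass
class Inference:
    """One inference request and the embedding row IDs it needs (hypothetical shape)."""
    request_id: int
    embedding_ids: List[int]


def classify(batch: List[Inference], cache: Set[int]) -> Tuple[List[Inference], List[Inference]]:
    """Split a batch into hit inferences (all embeddings cached in DRAM)
    and miss inferences (at least one embedding must come from SSD)."""
    hits, misses = [], []
    for inf in batch:
        if all(e in cache for e in inf.embedding_ids):
            hits.append(inf)    # can go straight to the top MLP
        else:
            misses.append(inf)  # must wait for SSD retrieval
    return hits, misses


def missing_ids(inf: Inference, cache: Set[int]) -> List[int]:
    """On-demand lookup: list only the embedding IDs that are not cached."""
    return sorted({e for e in inf.embedding_ids if e not in cache})


def reorder_misses(misses: List[Inference], cache: Set[int]) -> List[Inference]:
    """Assumed policy, not taken from the paper: serve the miss inferences
    that need the fewest SSD reads first. sorted() is stable, so ties keep
    their arrival order."""
    return sorted(misses, key=lambda inf: len(missing_ids(inf, cache)))


def split_batch(batch: List[Inference], max_size: int) -> List[List[Inference]]:
    """Cap each executed batch at max_size items so that no single batch
    runs longer than a chosen bound."""
    if max_size <= 0:
        raise ValueError("max_size must be positive")
    return [batch[i:i + max_size] for i in range(0, len(batch), max_size)]
```

With a cache holding IDs 1 to 4, a request that needs IDs 1 and 2 is classified as a hit and can proceed immediately, while a request that needs IDs 3 and 7 is a miss for which only ID 7 has to be fetched from SSD.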
Citation
H. Wan, Y. Huang, S. Bai, X. Sun, T.-W. Kuo, and C. J. Xue, “Rabbitail: A Tail Latency-Aware Scheduler for Deep Learning Recommendation Systems with Hierarchical Embedding Storage,” Proceedings of the 40th ACM/SIGAPP Symposium on Applied Computing, pp. 279–287, Mar. 2025, doi: 10.1145/3672608.3707945.
Source
Proceedings of the 40th ACM/SIGAPP Symposium on Applied Computing
Conference
SAC '25: Proceedings of the 40th ACM/SIGAPP Symposium on Applied Computing
Keywords
Recommender systems, Embedding, Solid-state drive (SSD)
Publisher
Association for Computing Machinery