
Towards Low-Latency GPU-Aware Pub/Sub Communication for Real-Time Edge Computing

Kuan, Hao-En
Yang, Yung-Hsiang
Jiang, Zen-Mou
Shih, Chi-Sheng
Hung, Shih-Hao
Department
Computer Science
Type
Conference proceeding
Date
2025
Language
English
Abstract
Real-time Edge AI applications often require efficient GPU-based data processing and communication. Since these applications are typically highly modularized, the publish-subscribe (pub/sub) pattern is widely used to deliver data among components. However, existing pub/sub middleware introduces significant latency due to redundant memory copies between GPU and host memory. To address this, we propose GPU-Aware Pub/Sub communication (GAPS), a universal solution that integrates shared CUDA memory with existing pub/sub middleware such as Zenoh-pico and Iceoryx. GAPS minimizes data transfer latency by enabling GPU memory sharing between publishers and subscribers, eliminating unnecessary memory copies. In our work, we propose an independent shared CUDA memory manager that creates a shared CUDA memory pool for each topic during the topic's initialization. For fine-grained allocation from the pool, we modify Two-Level Segregated Fit (TLSF), a real-time dynamic memory allocator, making it process-safe and capable of managing GPU memory. Additionally, we develop PyGAPS, an extension that accelerates publication of PyTorch tensors, eliminating serialization overhead in AI-driven applications. Our evaluation demonstrates that GAPS significantly reduces end-to-end latency and improves the throughput of simplified computer vision pipelines—by up to 1.5× in the segmentation task and 3.8× in the classification task—making it a robust solution for real-time Edge AI.
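The abstract mentions adapting TLSF for fine-grained allocation from a per-topic CUDA memory pool. The paper's process-safe GPU allocator is not reproduced here; as a rough illustration of the TLSF idea, the sketch below shows only the two-level size-to-list mapping that gives TLSF its constant-time behavior. All names (`tlsf_mapping`, `SL_LOG2`) and the tiny-size simplification are this sketch's assumptions, not code from GAPS.

```python
# Illustrative sketch of TLSF's two-level index mapping (not the GAPS
# implementation). TLSF keeps free blocks in segregated lists indexed by a
# first level (the power-of-two range of the size) and a second level (a
# linear subdivision of that range), so finding a fitting list is O(1).

SL_LOG2 = 4  # 2**4 = 16 second-level lists per first-level range (common choice)

def tlsf_mapping(size: int) -> tuple[int, int]:
    """Map a request size to (first_level, second_level) free-list indices."""
    fl = size.bit_length() - 1                       # index of the highest set bit
    if fl < SL_LOG2:                                 # tiny sizes: one linear range
        return 0, size
    sl = (size >> (fl - SL_LOG2)) - (1 << SL_LOG2)   # drop the top bit, keep the
    return fl, sl                                    # next SL_LOG2 bits as offset

# e.g. a 160-byte request falls in range [128, 256) (fl = 7), and 160 = 128 + 32
# lands in the 4th of 16 subranges of width 8, so tlsf_mapping(160) == (7, 4).
```

In TLSF proper, two bitmaps over these indices let the allocator locate the smallest non-empty list with a couple of bit operations; GAPS additionally had to make such bookkeeping safe across processes and point it at device rather than host memory.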
Citation
H. -E. Kuan, Y. -H. Yang, Z. -M. Jiang, C. -S. Shih and S. -H. Hung, "Towards Low-Latency GPU-Aware Pub/Sub Communication for Real-Time Edge Computing," 2025 IEEE 31st International Conference on Embedded and Real-Time Computing Systems and Applications (RTCSA), Singapore, Singapore, 2025, pp. 79-89, doi: 10.1109/RTCSA66114.2025.00018
Source
2025 IEEE 31st International Conference on Embedded and Real-Time Computing Systems and Applications (RTCSA)
Conference
31st International Conference on Embedded and Real-Time Computing Systems and Applications (RTCSA)
Keywords
Pub/Sub Middleware, GPU, CUDA, Shared Memory, Real-Time Dynamic Memory Allocator
Subjects
Publisher
IEEE