Item

Semantic-guided Cross-Modal Prompt Learning for Skeleton-based Zero-shot Action Recognition

Zhu, Anqi
Zhu, Jingmin
Bailey, James
Gong, Mingming
Ke, Qiuhong
Department
Machine Learning
Type
Conference proceeding
Date
2025
Language
English
Abstract
Skeleton-based human action recognition is promising due to its privacy preservation, robustness to visual challenges, and computational efficiency. In particular, the practical need to recognize unseen actions has led to increased interest in zero-shot skeleton-based action recognition (ZSSAR). Existing ZSSAR approaches often rely on manually crafted action descriptions or visual assumptions to enhance knowledge transfer, which limits flexibility and makes them prone to inaccuracies and noise. To overcome this, we introduce Semantic-guided Cross-Modal Prompt Learning (SCoPLe), a novel framework that replaces manual guidance with data-driven prompt learning for refinement and alignment of skeletal and textual features. Specifically, we introduce a dual-stream language prompting module that preserves the original semantic context from the pre-trained text encoder while still effectively tuning its output for ZSSAR task adaptation. We also introduce a joint-shaped prompting module that learns tuning for skeleton features, and we incorporate an adaptive visual representation sampler that leverages text semantics to strengthen the cross-modal prompting interactions during skeleton-to-text embedding projection. Experimental results on the NTU-RGB+D and PKU-MMD datasets demonstrate the state-of-the-art performance of our method in both ZSSAR and generalized ZSSAR scenarios.
Citation
A. Zhu, J. Zhu, J. Bailey, M. Gong, and Q. Ke, “Semantic-guided Cross-Modal Prompt Learning for Skeleton-based Zero-shot Action Recognition,” in Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR), 2025.
Source
Proceedings of the Computer Vision and Pattern Recognition Conference
Conference
Computer Vision and Pattern Recognition Conference (CVPR), 2025
Publisher
Computer Vision Foundation