TrojanWave: Exploiting Prompt Learning for Stealthy Backdoor Attacks on Large Audio-Language Models
Hanif, Asif ; Agro, Maha Tufail ; Shamshad, Fahad ; Nandakumar, Karthik
Department
Computer Vision
Type
Conference proceeding
Date
2025
Language
English
Abstract
Prompt learning has emerged as an efficient alternative to full fine-tuning for adapting large audio-language models (ALMs) to downstream tasks. While this paradigm enables scalable deployment via Prompt-as-a-Service frameworks, it also introduces a critical yet underexplored security risk: backdoor attacks. In this work, we present TrojanWave, the first backdoor attack tailored to the prompt-learning setting in frozen ALMs. Unlike prior audio backdoor methods that require training from scratch on full datasets, TrojanWave injects backdoors solely through learnable prompts, making it highly scalable and effective in few-shot settings. TrojanWave injects imperceptible audio triggers in both the time and spectral domains to effectively induce targeted misclassification during inference. To mitigate this threat, we further propose TrojanWave-Defense, a lightweight prompt purification method that neutralizes malicious prompts without hampering clean performance. Extensive experiments across 11 diverse audio classification benchmarks demonstrate the robustness and practicality of both the attack and defense. Our code is publicly available at https://asif-hanif.github.io/trojanwave/.
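The abstract describes imperceptible audio triggers applied in both the time and spectral domains. As a rough illustration only (the paper's actual trigger construction is learned jointly with the prompts and is not reproduced here), the sketch below shows the general idea of low-amplitude additive triggers on a waveform; the function names, amplitudes, and the fixed-tone spectral trigger are all illustrative assumptions, not the authors' method.

```python
import numpy as np

def add_time_domain_trigger(wave, trigger, epsilon=0.005):
    # Illustrative: additively blend a short trigger pattern into the
    # waveform at low amplitude so it stays perceptually negligible.
    out = wave.copy()
    n = min(len(trigger), len(out))
    out[:n] += epsilon * trigger[:n]
    return np.clip(out, -1.0, 1.0)

def add_spectral_trigger(wave, freq_hz, sr=16000, epsilon=0.002):
    # Illustrative: superimpose a faint narrowband tone, i.e. a
    # localized perturbation in the spectral domain.
    t = np.arange(len(wave)) / sr
    return np.clip(wave + epsilon * np.sin(2 * np.pi * freq_hz * t), -1.0, 1.0)

# Toy usage on synthetic audio (hypothetical data, 1 s at 16 kHz).
rng = np.random.default_rng(0)
clean = rng.uniform(-0.5, 0.5, 16000)
trigger = rng.standard_normal(1600)
poisoned = add_spectral_trigger(add_time_domain_trigger(clean, trigger), freq_hz=7500)
```

The perturbation magnitude is bounded by the two epsilon values, which is what keeps such triggers imperceptible while still being detectable by a model conditioned on a backdoored prompt.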
Citation
A. Hanif, M. T. Agro, F. Shamshad, and K. Nandakumar, “TrojanWave: Exploiting Prompt Learning for Stealthy Backdoor Attacks on Large Audio-Language Models,” Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pp. 18628–18644, 2025, doi: 10.18653/v1/2025.emnlp-main.940.
Source
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Conference
2025 Conference on Empirical Methods in Natural Language Processing
Keywords
Audio-Language Models, Prompt Learning, Backdoor Attacks, Audio Triggers, Prompt Purification, Few-Shot Settings, Model Security
Publisher
Association for Computational Linguistics
