Stronger Separability, Stronger Defense: Influence-Based Backdoor Detection
Liu, Buhua ; Yang, Shuo ; Xu, Zhiqiang ; Xiong, Haoyi ; Cheung, Yiu-ming ; Xie, Zeke
Department
Machine Learning
Type
Conference proceeding
Date
2025
Language
English
Abstract
Deep Neural Networks (DNNs) are susceptible to backdoor attacks, in which an attacker inserts hidden functionality into a DNN by manipulating only a small amount of training data, without compromising the victim DNN's normal functionality. To defend against such attacks, one line of work detects suspicious samples before training, relying on the latent separability assumption that clean and poison samples can be separated in the representation space learned by a trained DNN. However, recent strong backdoor attacks can easily break representation separability, rendering these detection methods ineffective. To this end, we propose to detect poison samples in influence space, by tracing each sample's influence on model parameters rather than on conventional model outputs. We show that influence separability is significantly stronger than conventional representation separability in terms of four common statistics (e.g., Silhouette Score increases by 122% on average). With such strong separability in influence space, we can obtain stronger backdoor detection and defense by applying existing methods, or even simple statistics, in influence space. Extensive experiments show that our influence-based methods significantly outperform conventional representation-based baselines against eight representative backdoor attacks. In particular, influence space reduces the average attack success rate across three benchmark datasets by 43.4 points (47.2%→3.8%) compared with representation space. © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025.
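The abstract's core claim, that clean and poison samples separate more cleanly in influence space than in representation space, and that this can be measured with a Silhouette Score, can be illustrated with a toy numpy sketch. All data below are synthetic Gaussian clusters standing in for the two feature spaces, and `silhouette` is a minimal reimplementation of the standard Silhouette Score, not the paper's code:

```python
import numpy as np

def silhouette(X, labels):
    """Mean silhouette coefficient: (b - a) / max(a, b) per sample, where
    a = mean intra-cluster distance and b = mean distance to the nearest
    other cluster. Values near 1 mean well-separated clusters."""
    D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
    scores = []
    for i in range(len(X)):
        same = labels == labels[i]
        a = D[i, same].sum() / (same.sum() - 1)  # exclude self-distance
        b = min(D[i, labels == c].mean() for c in np.unique(labels) if c != labels[i])
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))

rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 100)  # 0 = clean samples, 1 = poison samples

# Stand-in for representation space: the two groups overlap heavily.
rep = np.vstack([rng.normal(0.0, 1, (100, 8)), rng.normal(0.5, 1, (100, 8))])
# Stand-in for influence space: the two groups are far apart.
inf = np.vstack([rng.normal(0.0, 1, (100, 8)), rng.normal(4.0, 1, (100, 8))])

print(silhouette(rep, labels))  # near 0: clean/poison hard to separate
print(silhouette(inf, labels))  # much closer to 1: easy to separate
```

With separability this strong, even a simple threshold on per-sample statistics in the better space would flag most poison samples, which is the intuition behind reusing existing detectors in influence space.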
Citation
B. Liu, S. Yang, Z. Xu, H. Xiong, Y. Cheung, and Z. Xie, “Stronger Separability, Stronger Defense: Influence-Based Backdoor Detection,” pp. 108–120, 2025, doi: 10.1007/978-981-96-8170-9_9
Source
Lecture Notes in Computer Science
Conference
29th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2025
Keywords
Backdoor attack, Backdoor defense, Influence function
Publisher
Springer Nature
