Improving Adversarial Training from the Perspective of Class-Flipping Distribution

Zhou, Dawei
Wang, Nannan
Liu, Tongliang
Gao, Xinbo
Department
Machine Learning
Type
Journal article
Date
2025
Language
English
Abstract
Adversarial training has been proposed and widely recognized as a very effective method to defend against adversarial noise. However, the label-flipping patterns across different classes still need deeper exploration to identify potential problems and to help further enhance robustness. In this work, we model the class-flipping distribution via statistical investigations and find that this distribution reveals two shortcomings: a highly misleading category is present in the model's predictions for the data in each class, and the trends in class flipping differ significantly across classes. Based on these observations, we propose a Class-Flipping-aware Adversarial Training (CFAT) method. On the one hand, we obtain the most misleading category for the data in each class by counting the samples flipped to each wrong category, and use it as the target to construct the corresponding targeted adversarial samples. On the other hand, we take the proportions of samples flipped to the most misleading category as factors to scale the perturbation budgets of the adversarial training samples for the data in the corresponding classes. Experimental results on datasets with different numbers of classes validate the effectiveness of the proposed method.
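The two statistics described in the abstract (the most misleading category per class, and the proportion of samples flipped to it) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `class_flipping_stats` and `scaled_budgets`, the choice to normalize by the number of samples in each class, and the linear scaling rule `base_eps * (1 + p)` are all assumptions made for illustration, since the abstract does not give the exact formulas.

```python
import numpy as np


def class_flipping_stats(y_true, y_adv_pred, num_classes):
    """Summarize how adversarial predictions flip away from each true class.

    Returns (most_misleading, proportion), both of length num_classes:
    most_misleading[c] is the wrong category that class-c samples are
    flipped to most often, and proportion[c] is the fraction of class-c
    samples flipped to that category.
    ASSUMPTION: the proportion is normalized by the class size.
    """
    # flip_counts[i, j] = number of class-i samples predicted as class j
    flip_counts = np.zeros((num_classes, num_classes), dtype=np.int64)
    np.add.at(flip_counts, (y_true, y_adv_pred), 1)
    class_sizes = flip_counts.sum(axis=1)

    # Ignore correct predictions so that only flips to wrong classes count
    wrong = flip_counts.copy()
    np.fill_diagonal(wrong, 0)

    most_misleading = wrong.argmax(axis=1)
    flipped_to_top = wrong[np.arange(num_classes), most_misleading]
    proportion = flipped_to_top / np.maximum(class_sizes, 1)
    return most_misleading, proportion


def scaled_budgets(proportion, base_eps=8 / 255):
    """Per-class perturbation budgets.

    ASSUMPTION: a hypothetical linear rule, eps_c = base_eps * (1 + p_c);
    the actual scaling used by CFAT may differ.
    """
    return base_eps * (1.0 + np.asarray(proportion))


if __name__ == "__main__":
    # Toy labels: 3 classes, 4 samples each, with made-up adversarial predictions
    y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2])
    y_adv = np.array([0, 1, 1, 2, 1, 1, 2, 2, 0, 0, 0, 2])
    targets, p = class_flipping_stats(y_true, y_adv, num_classes=3)
    print("target class per true class:", targets)
    print("per-class budgets:", scaled_budgets(p))
```

In such a sketch, `targets[c]` would serve as the attack target when generating targeted adversarial samples for class `c`, and the per-class budgets would replace a single global perturbation budget during adversarial training.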
Citation
D. Zhou, N. Wang, T. Liu and X. Gao, "Improving Adversarial Training From the Perspective of Class-Flipping Distribution," in IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 47, no. 6, pp. 4330-4342, June 2025, doi: 10.1109/TPAMI.2025.3540200
Source
IEEE Transactions on Pattern Analysis and Machine Intelligence
Keywords
Training, Noise, Robustness, Accuracy, Training data, Predictive models, Data models, Airplanes, Perturbation methods, Hands
Publisher
IEEE