Semantic-Aligned Adversarial Evolution Triangle for High-Transferability Vision-Language Attack
Jia, Xiaojun; Gao, Sensen; Guo, Qing; Qin, Simeng; Ma, Ke; Huang, Yihao; Liu, Yang; Tsang, Ivor; Cao, Xiaochun
Department
Computer Vision
Type
Journal article
Date
2025
Language
English
Abstract
Vision-language pre-training (VLP) models excel at interpreting both images and text but remain vulnerable to multimodal adversarial examples (AEs). Advancing the generation of transferable AEs, which succeed across unseen models, is key to developing more robust and practical VLP models. Previous approaches augment image-text pairs to enhance diversity within the adversarial example generation process, aiming to improve transferability by expanding the contrast space of image-text features. However, these methods focus solely on diversity around the current AEs, yielding limited gains in transferability. To address this issue, we propose to increase the diversity of AEs by leveraging the intersection regions along the adversarial trajectory during optimization. Specifically, we propose sampling from adversarial evolution triangles composed of clean, historical, and current adversarial examples to enhance adversarial diversity. We provide a theoretical analysis to demonstrate the effectiveness of the proposed adversarial evolution triangle. Moreover, we find that redundant inactive dimensions can dominate similarity calculations, distorting feature matching and making AEs model-dependent with reduced transferability. Hence, we propose to generate AEs in the semantic image-text feature contrast space, which projects the original feature space into a semantic corpus subspace. The proposed semantic-aligned subspace can reduce the image feature redundancy, thereby improving adversarial transferability. Extensive experiments across different datasets and models demonstrate that the proposed method can effectively improve adversarial transferability and outperform state-of-the-art adversarial attack methods. © 1979-2012 IEEE.
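The triangle sampling described in the abstract can be illustrated with a minimal sketch. The abstract does not give the authors' sampling scheme, so everything here is an assumption made for illustration: the function names `sample_barycentric` and `sample_from_triangle` are hypothetical, images are flattened to plain lists of floats, and the weights are drawn uniformly over the triangle, which may differ from what the paper actually does.

```python
import math
import random


def sample_barycentric(rng):
    """Draw non-negative weights (a, b, c) with a + b + c = 1, uniform over the triangle."""
    r1, r2 = rng.random(), rng.random()
    s = math.sqrt(r1)
    return 1.0 - s, s * (1.0 - r2), s * r2


def sample_from_triangle(clean, historical, current, rng):
    """Return a convex combination of three equally sized flat pixel lists.

    clean:      the unperturbed input
    historical: an adversarial example from an earlier optimization step
    current:    the adversarial example at the current step
    """
    a, b, c = sample_barycentric(rng)
    return [a * x + b * h + c * z for x, h, z in zip(clean, historical, current)]


# Toy three-pixel "images" (illustrative values only, not data from the paper).
rng = random.Random(0)
clean = [0.0, 0.0, 0.0]
historical = [0.1, 0.0, 0.2]
current = [0.2, 0.1, 0.2]
sample = sample_from_triangle(clean, historical, current, rng)
```

Because the weights are non-negative and sum to one, each sampled point stays inside the triangle spanned by the three inputs, so every coordinate lies between the smallest and largest of the three corresponding input values.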
Citation
X. Jia et al., "Semantic-Aligned Adversarial Evolution Triangle for High-Transferability Vision-Language Attack," in IEEE Transactions on Pattern Analysis and Machine Intelligence, doi: 10.1109/TPAMI.2025.3581476.
Source
IEEE Transactions on Pattern Analysis and Machine Intelligence
Keywords
adversarial evolution triangle, adversarial transferability, semantic-aligned, vision-language pretraining
Publisher
IEEE
