Jailbreaking the Non-Transferable Barrier via Test-Time Data Disguising

Xiang, Yongli
Hong, Ziming
Yao, Lina
Wang, Dadong
Liu, Tongliang
Department
Machine Learning
Type
Poster
Date
2025
Language
English
Abstract
Non-transferable learning (NTL) has been proposed to protect model intellectual property (IP) by creating a "non-transferable barrier" that restricts generalization from authorized to unauthorized domains. Recently, a well-designed attack, which restores unauthorized-domain performance by fine-tuning NTL models on a few authorized samples, has highlighted the security risks of NTL-based applications. However, such an attack requires modifying model weights and is therefore invalid in the black-box scenario. This raises a critical question: can we trust the security of NTL models deployed as black-box systems? In this work, we reveal the first loophole of black-box NTL models by proposing a novel attack method (dubbed JailNTL) that jailbreaks the non-transferable barrier through test-time data disguising. The main idea of JailNTL is to disguise unauthorized data so that it is identified as authorized by the NTL model, thereby bypassing the non-transferable barrier without modifying the NTL model's weights. Specifically, JailNTL encourages unauthorized-domain disguising at two levels: (i) data-intrinsic disguising (DID), which eliminates domain discrepancy and preserves class-related content at the input level, and (ii) model-guided disguising (MGD), which mitigates output-level statistical differences of the NTL model. Empirically, when attacking state-of-the-art (SOTA) NTL models in the black-box scenario, JailNTL achieves an accuracy increase of up to 54.3% in the unauthorized domain using only 1% of authorized samples, largely exceeding existing SOTA white-box attacks.
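To illustrate the core intuition of test-time data disguising, the following is a minimal, hypothetical sketch (not the paper's implementation): the black-box NTL model, the statistics-based input disguising, and all names (`black_box_ntl`, `disguise`) are toy placeholders. Input-level disguising is approximated here by re-normalizing unauthorized data toward the first and second moments of a small budget of authorized samples, loosely analogous to DID; the real method also uses model-guided (output-level) disguising and learned generators.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a black-box NTL model (hypothetical placeholder):
# it returns high confidence only when an input's feature statistics
# match the authorized domain, mimicking a non-transferable barrier.
def black_box_ntl(x, auth_mean=0.0, auth_std=1.0):
    z = (x.mean(axis=1) - auth_mean) / auth_std
    return np.exp(-z ** 2)  # near 1 only for authorized-like inputs

# Input-level disguising (loose analogue of DID): shift the unauthorized
# data's mean/std toward those of a few authorized samples, removing the
# gross domain discrepancy while leaving per-sample content intact.
def disguise(x_unauth, x_auth_few):
    mu_a, sd_a = x_auth_few.mean(), x_auth_few.std()
    mu_u, sd_u = x_unauth.mean(), x_unauth.std()
    return (x_unauth - mu_u) / (sd_u + 1e-8) * sd_a + mu_a

# Small authorized budget vs. a statistically shifted unauthorized domain.
x_auth_few = rng.normal(0.0, 1.0, size=(10, 64))
x_unauth = rng.normal(3.0, 2.0, size=(100, 64))

conf_before = black_box_ntl(x_unauth).mean()
conf_after = black_box_ntl(disguise(x_unauth, x_auth_few)).mean()
print(f"mean confidence before: {conf_before:.3f}, after: {conf_after:.3f}")
```

Note that the model is only queried, never modified, which is what makes the attack viable in the black-box setting described in the abstract.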
Citation
“CVPR Poster Jailbreaking the Non-Transferable Barrier via Test-Time Data Disguising.” [Online]. Available: https://cvpr.thecvf.com/virtual/2025/poster/35261
Source
CVPR
Conference
Publisher
CVPR