ZeroDiff: Solidified Visual-Semantic Correlation in Zero-Shot Learning

Ye, Zihan
Gowda, Shreyank
Chen, Shiming
Huang, Xiaowei
Xu, Haotian
Khan, Fahad
Jin, Yaochu
Huang, Kaizhu
Jin, Xiaobo
Department
Computer Vision
Type
Conference proceeding
Date
2025
Language
English
Abstract
Zero-shot Learning (ZSL) aims to enable classifiers to identify unseen classes. This is typically achieved by generating visual features for unseen classes based on learned visual-semantic correlations from seen classes. However, most current generative approaches heavily rely on having a sufficient number of samples from seen classes. Our study reveals that a scarcity of seen class samples results in a marked decrease in performance across many generative ZSL techniques. We argue, quantify, and empirically demonstrate that this decline is largely attributable to spurious visual-semantic correlations. To address this issue, we introduce ZeroDiff, an innovative generative framework for ZSL that incorporates diffusion mechanisms and contrastive representations to enhance visual-semantic correlations. ZeroDiff comprises three key components: (1) Diffusion augmentation, which naturally transforms limited data into an expanded set of noised data to mitigate generative model overfitting; (2) Supervised-contrastive (SC)-based representations that dynamically characterize each limited sample to support visual feature generation; and (3) Multiple feature discriminators employing a Wasserstein-distance-based mutual learning approach, evaluating generated features from various perspectives, including pre-defined semantics, SC-based representations, and the diffusion process. Extensive experiments on three popular ZSL benchmarks demonstrate that ZeroDiff not only achieves significant improvements over existing ZSL methods but also maintains robust performance even with scarce training data. Our code is available at https://github.com/FouriYe/ZeroDiff_ICLR25. © 2025 13th International Conference on Learning Representations, ICLR 2025. All rights reserved.
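The "diffusion augmentation" component described above expands scarce seen-class data into many noised variants via a forward diffusion (noising) process. The following is a minimal generic sketch of that idea using the standard DDPM-style forward process, not the authors' implementation; the function name, schedule parameters, and shapes are illustrative assumptions.

```python
import numpy as np

def diffusion_augment(x0, num_steps=10, beta_start=1e-4, beta_end=0.02, seed=0):
    """Expand a small batch of feature vectors x0 into noised variants.

    Uses the standard forward-diffusion closed form:
        x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps,  eps ~ N(0, I)
    where alpha_bar_t is the cumulative product of (1 - beta_t).
    Returns a list of (t, x_t) pairs, one per noise level.
    """
    rng = np.random.default_rng(seed)
    betas = np.linspace(beta_start, beta_end, num_steps)   # linear noise schedule
    alpha_bars = np.cumprod(1.0 - betas)                   # cumulative signal retention
    noised = []
    for t in range(num_steps):
        eps = rng.standard_normal(x0.shape)                # fresh Gaussian noise per level
        x_t = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps
        noised.append((t, x_t))
    return noised

# Four hypothetical seen-class feature vectors of dimension 8
feats = np.ones((4, 8))
aug = diffusion_augment(feats)
print(len(aug))  # one noised copy of the batch per diffusion step
```

Each noise level supplies a progressively corrupted view of the same limited samples, which is how a diffusion process turns a small dataset into a much larger effective training set for the generator.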
Citation
Z. Ye et al., "ZeroDiff: Solidified Visual-Semantic Correlation in Zero-Shot Learning," International Conference on Representation Learning, vol. 2025, pp. 60471–60491, May 2025. Accessed: Jul. 22, 2025.
Source
13th International Conference on Learning Representations, ICLR 2025
Conference
13th International Conference on Learning Representations, ICLR 2025
Publisher
International Conference on Learning Representations, ICLR