Causality-aligned Prompt Learning via Diffusion-based Counterfactual Generation
Li, Xinshu ; Wang, Ruoyu ; Gao, Erdun ; Gong, Mingming ; Yao, Lina
Department
Machine Learning
Type
Conference proceeding
Date
2025
Language
English
Abstract
Prompt learning has garnered attention for its efficiency over traditional model training and fine-tuning. However, existing methods, constrained by inadequate theoretical foundations, encounter difficulties in achieving causally invariant prompts, ultimately falling short of capturing robust features that generalize effectively across categories. To address these challenges, we introduce the DiCap model, a theoretically grounded Diffusion-based Counterfactual prompt learning framework, which leverages a diffusion process to iteratively sample gradients from the marginal and conditional distributions of the causal model, guiding the generation of counterfactuals that satisfy the minimal sufficiency criterion. Grounded in rigorous theoretical derivations, this approach guarantees the identifiability of counterfactual outcomes while imposing strict bounds on estimation errors. We further employ a contrastive learning framework that leverages the generated counterfactuals, thereby enabling the refined extraction of prompts that are precisely aligned with the causal features of the data. Extensive experimental results demonstrate that our method performs excellently across tasks such as image classification, image-text retrieval, and visual question answering, with particularly strong advantages in unseen categories.
Citation
X. Li, R. Wang, E. Gao, M. Gong, and L. Yao, “Causality-aligned Prompt Learning via Diffusion-based Counterfactual Generation,” Proceedings of the 33rd ACM International Conference on Multimedia, pp. 5208–5217, Oct. 2025, doi: 10.1145/3746027.3755820
Source
MM '25: Proceedings of the 33rd ACM International Conference on Multimedia
Conference
The 33rd ACM International Conference on Multimedia
Keywords
Vision Language Models, Prompt Learning, Diffusion Process, Counterfactual Generation
Publisher
Association for Computing Machinery
