
Prompting fairness: integrating causality to debias large language models

Li, Jingling
Tang, Zeyu
Liu, Xiaoyu
Spirtes, Peter
Zhang, Kun
Liu, Leqi
Liu, Yang
Department
Machine Learning
Type
Conference proceeding
Date
2025
Language
English
Abstract
Large language models (LLMs), despite their remarkable capabilities, are susceptible to generating biased and discriminatory responses. As LLMs increasingly influence high-stakes decision-making (e.g., hiring and healthcare), mitigating these biases becomes critical. In this work, we propose a causality-guided debiasing framework to tackle social biases, aiming to reduce the objectionable dependence between LLMs' decisions and the social information in the input. Our framework introduces a novel perspective to identify how social information can affect an LLM's decision through different causal pathways. Leveraging these causal insights, we outline principled prompting strategies that regulate these pathways through selection mechanisms. This framework not only unifies existing prompting-based debiasing techniques, but also opens up new directions for reducing bias by encouraging the model to prioritize fact-based reasoning over reliance on biased social cues. We validate our framework through extensive experiments on real-world datasets across multiple domains, demonstrating its effectiveness in debiasing LLM decisions, even with only black-box access to the model. © 2025 13th International Conference on Learning Representations, ICLR 2025. All rights reserved.
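The abstract frames bias as a dependence between an LLM's decision and the social information in its input, measured with only black-box access to the model. A minimal sketch of one generic way to probe that dependence, assuming the model is wrapped as a plain text-in/text-out function: hold every non-social field of a prompt fixed, swap only the social attribute, and count how often the decision changes. This is a standard counterfactual consistency check, not the causal framework or prompting strategies proposed in the paper; the function names, prompt template, and toy model are hypothetical illustrations.

```python
from typing import Callable, Dict, List


def counterfactual_flip_rate(
    decide: Callable[[str], str],
    template: str,
    groups: List[str],
    profiles: List[Dict[str, str]],
) -> float:
    """Fraction of profiles whose decision changes when only the social attribute changes.

    Hypothetical helper: `decide` stands in for black-box access to an LLM,
    `template` must contain a `{group}` placeholder plus one placeholder per profile field.
    """
    if not profiles:
        return 0.0
    flips = 0
    for profile in profiles:
        # Same factual content, different social attribute in each prompt.
        decisions = {decide(template.format(group=g, **profile)) for g in groups}
        if len(decisions) > 1:  # decision depends on the social attribute
            flips += 1
    return flips / len(profiles)


def toy_model(prompt: str) -> str:
    """Hypothetical biased stand-in for an LLM: rejects group B unless experience is high."""
    if "group B" in prompt and "5 years" not in prompt:
        return "reject"
    return "accept"


TEMPLATE = "Applicant from {group} with {experience} of experience. Hire?"
GROUPS = ["group A", "group B"]
PROFILES = [{"experience": "5 years"}, {"experience": "1 year"}]

if __name__ == "__main__":
    print(counterfactual_flip_rate(toy_model, TEMPLATE, GROUPS, PROFILES))
```

Here the toy model flips its decision for the "1 year" profile but not the "5 years" one, so the flip rate is 0.5; a model whose decisions ignore the group field would score 0.0. A prompting-based debiasing strategy could be compared by wrapping `decide` so that an instruction is added to each prompt and re-measuring the rate.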
Citation
J. Li, Z. Tang, X. Liu, P. Spirtes, K. Zhang, L. Liu, and Y. Liu, “Prompting Fairness: Integrating Causality to Debias Large Language Models,” 13th International Conference on Learning Representations (ICLR 2025), 2025
Source
13th International Conference on Learning Representations, ICLR 2025
Conference
13th International Conference on Learning Representations, ICLR 2025
Publisher
International Conference on Learning Representations, ICLR