Inference-Time Selective Debiasing to Enhance Fairness in Text Classification Models
Kuzmin, Gleb ; Yadav, Neemesh ; Smirnov, Ivan ; Baldwin, Timothy ; Shelmanov, Artem
Department
Natural Language Processing
Type
Conference proceeding
Date
2025
Language
English
Abstract
We propose selective debiasing – an inference-time safety mechanism designed to enhance the overall model quality in terms of prediction performance and fairness, especially in scenarios where retraining the model is impractical. The method draws inspiration from selective classification, where at inference time, predictions with low quality, as indicated by their uncertainty scores, are discarded. In our approach, we identify the potentially biased model predictions and, instead of discarding them, we remove bias from these predictions using LEACE – a post-processing debiasing method. To select problematic predictions, we propose a bias quantification approach based on KL divergence, which achieves better results than standard uncertainty quantification methods. Experiments on text classification datasets with encoder-based classification models demonstrate that selective debiasing helps to reduce the performance gap between post-processing methods and debiasing techniques from the at-training and pre-processing categories.
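The abstract's core procedure — score each prediction's potential bias via KL divergence, then replace only the high-scoring predictions with their debiased counterparts — can be illustrated with a minimal sketch. This is an assumption-laden illustration, not the paper's implementation: it presumes the original and LEACE-debiased class-probability matrices are already available, and uses the KL divergence between the two distributions as the per-example bias score; the function names, the thresholding rule, and the `threshold` parameter are hypothetical.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """Row-wise KL(p || q) between two matrices of class probabilities."""
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return np.sum(p * np.log(p / q), axis=-1)

def selective_debias(orig_probs, debiased_probs, threshold):
    """Sketch of inference-time selective debiasing (hypothetical API).

    Predictions whose distribution shifts strongly under debiasing
    (high KL score) are treated as potentially biased and replaced by
    the LEACE-debiased distribution; all others are kept unchanged.
    """
    scores = kl_divergence(orig_probs, debiased_probs)
    selected = scores > threshold          # potentially biased predictions
    out = orig_probs.copy()
    out[selected] = debiased_probs[selected]
    return out, selected
```

Keeping low-score predictions untouched is what preserves overall prediction performance, while the replaced subset is where the fairness gain comes from.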
Citation
G. Kuzmin, N. Yadav, I. Smirnov, T. Baldwin, and A. Shelmanov, “Inference-Time Selective Debiasing to Enhance Fairness in Text Classification Models,” 2025. Accessed: May 05, 2025. [Online]. Available: https://aclanthology.org/2025.naacl-short.9/
Source
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies
Conference
NAACL 2025
Publisher
Association for Computational Linguistics
