Unmasking Style Sensitivity: A Causal Analysis of Bias Evaluation Instability in Large Language Models
Zhao, Jiaxu ; Fang, Meng ; Zhang, Kun ; Pechenizkiy, Mykola
Department
Machine Learning
Type
Conference proceeding
Date
2025
Language
English
Abstract
Natural language processing applications are increasingly prevalent, but social biases in their outputs remain a critical challenge. While various bias evaluation methods have been proposed, these assessments exhibit unexpected instability when input texts undergo minor stylistic changes. This paper presents a comprehensive causal-inference analysis of how different style transformations affect bias evaluation results across multiple language models and bias types. Our findings reveal that formality transformations significantly affect bias scores, with informal style yielding substantial bias reductions (up to 8.33% in LLaMA-2-13B). We identify appearance bias, sexual orientation bias, and religious bias as the most susceptible to style changes, with variations exceeding 20%. Larger models demonstrate greater sensitivity to stylistic variation, with bias measurements fluctuating up to 3.1% more than in smaller models. These results highlight critical limitations of current bias evaluation methods and underscore the need for reliable and fair assessments of language models.
Citation
J. Zhao, M. Fang, K. Zhang, and M. Pechenizkiy, “Unmasking Style Sensitivity: A Causal Analysis of Bias Evaluation Instability in Large Language Models,” 2025. [Online]. Available: https://aclanthology.org/2025.acl-long.796/
Source
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics
Conference
63rd Annual Meeting of the Association for Computational Linguistics, 2025
Publisher
Association for Computational Linguistics
