Demographics and Democracy: Benchmarking LLMs’ Gender Bias and Political Leaning in European Parliament

Yang, Jinrui
Han, Xudong
Baldwin, Timothy
Department
Natural Language Processing
Type
Conference proceeding
Date
2025
Language
English
Abstract
We introduce EuroParlVote, a novel benchmark for evaluating large language models (LLMs) in politically sensitive contexts. It links European Parliament debate speeches to roll-call vote outcomes and includes rich demographic metadata for each Member of the European Parliament (MEP), such as gender, age, country, and political group. Using EuroParlVote, we evaluate state-of-the-art LLMs on two tasks, gender classification and vote prediction, revealing consistent patterns of bias. We find that LLMs frequently misclassify female MEPs as male and demonstrate reduced accuracy when simulating votes for female speakers. Politically, LLMs tend to favor centrist groups while underperforming on both far-left and far-right ones. Proprietary models like GPT-4o outperform open-weight alternatives in terms of both robustness and fairness. We release the EuroParlVote dataset, code, and demo to support future research on fairness and accountability in NLP within political contexts.
Citation
J. Yang, X. Han, and T. Baldwin, “Demographics and Democracy: Benchmarking LLMs’ Gender Bias and Political Leaning in European Parliament,” 2025. Accessed: Oct. 15, 2025. [Online]. Available: https://aclanthology.org/2025.icnlsp-1.41/
Source
Proceedings of the 8th International Conference on Natural Language and Speech Processing (ICNLSP-2025)
Conference
8th International Conference on Natural Language and Speech Processing
Publisher
Association for Computational Linguistics