SHADES: Towards a Multilingual Assessment of Stereotypes in Large Language Models

Mitchell, Margaret
Al-Ali, Hamdan
Attanasio, Giuseppe
Baldini, Ioana
Clinciu, Miruna
Clive, Jordan
Delobelle, Pieter
Dey, Manan
Dholem, Kaustubh
Dill, Timm
et al.
Abstract
Large Language Models (LLMs), the bedrock of many artificial intelligence (AI) applications, are known to reproduce social biases present in their training data. Yet resources to measure and control this issue are limited. Research identifying and mitigating stereotype biases has primarily been concentrated around English, lagging behind the rapid advancement of LLMs in multilingual settings. To help further advance the ability to address stereotype bias in AI systems, we introduce a new multilingual dataset: SHADES. Designed for examining culturally specific stereotypes that may be learned by LLMs, SHADES includes over 300 stereotypes from 37 regions, translated across 16 languages and annotated with multiple features to aid multilingual stereotype analysis. All statements in all languages are paired with templates, to serve as a resource for unlimited generation of new evaluation data. We demonstrate the utility of the dataset in a series of exploratory evaluations that reveal significant differences in how stereotypes are recognized and reflected across models and languages.
Citation
M. Mitchell et al., “SHADES: Towards a Multilingual Assessment of Stereotypes in Large Language Models,” vol. 1, pp. 11995–12041, Jun. 2025, doi: 10.18653/V1/2025.NAACL-LONG.600
Source
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Conference
2025 Conference of the North American Chapter of the Association for Computational Linguistics-NAACL
Keywords
Graph Contrastive Learning, Social Network Representation, Influencer Identification, Community Detection, Node Embedding, Pairwise Ranking, Deep Graph Models, Information Diffusion
Publisher
Association for Computational Linguistics