SHADES: Towards a Multilingual Assessment of Stereotypes in Large Language Models
Author
Mitchell, Margaret
Al-Ali, Hamdan
Attanasio, Giuseppe
Baldini, Ioana
Clinciu, Miruna
Clive, Jordan
Delobelle, Pieter
Dey, Manan
Dholem, Kaustubh
Dill, Timm
Djanibekov, Amirbek
Djanibekov, Tair
Doughman, Jad
Dutt, Ritam
Forde, Jessica Zosa
Gala, Jay
Ghosh, Avijit
Hamilton, Sil
Holtermann, Carolin
Huang, Jerry
Kaffee, Lucie-Aimee
Kasera, Janavi
Laud, Tanmay
Lauscher, Anne
Lopez, Roberto Luis
Mansurov, Jonibek
Masoud, Maraim
Mukherjee, Sagnik
Mukhituly, Nurdaulet
Nangia, Nikita
Nie, Shangrui
Ovalle, Anaelia
Pistilli, Giada
Ploeger, Esther
Qin, Jeremy
Radev, Dragomir
Raheja, Vipul
Savoldi, Beatrice
Sharma, Shanya
Shen, Xudong
Stanczak, Karolina
Subramonian, Arjun
Sun, Kaiser
Szczechla, Eliza
Torrent, Tiago Timponi
Tunuguntla, Deepak
Villa-Cueva, Emilio
Viridiano, Marcelo
van Der War, Oskar
Yakefu, Adina
Yin, Kayo
Zhang, Mike
Zink, Sydney
Neveol, Aurelie
Talat, Zeerak
Department
Natural Language Processing
Type
Conference proceeding
Date
2025
Language
English
Abstract
Large Language Models (LLMs), the bedrock of many artificial intelligence (AI) applications, are known to reproduce social biases present in their training data. Yet resources to measure and control this issue are limited. Research identifying and mitigating stereotype biases has primarily been concentrated on English, lagging behind the rapid advancement of LLMs in multilingual settings. To help further advance the ability to address stereotype bias in AI systems, we introduce a new multilingual dataset: SHADES. Designed for examining culturally specific stereotypes that may be learned by LLMs, SHADES includes over 300 stereotypes from 37 regions, translated across 16 languages and annotated with multiple features to aid multilingual stereotype analysis. All statements in all languages are paired with templates to serve as a resource for unlimited generation of new evaluation data. We demonstrate the utility of the dataset in a series of exploratory evaluations that reveal significant differences in how stereotypes are recognized and reflected across models and languages.
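The abstract states that every statement is paired with a template so that new evaluation data can be generated. As a minimal sketch of how such slot-filling could work, assuming a hypothetical `{SLOT}` markup and hand-written filler lists (neither is taken from the SHADES dataset, whose actual template format may differ), `expand_template` below is an illustrative helper, not the authors' tooling:

```python
import itertools
import re

# Hypothetical slot syntax "{NAME}"; the dataset's real template markup may differ.
SLOT_PATTERN = re.compile(r"\{([A-Z_]+)\}")


def expand_template(template: str, fillers: dict[str, list[str]]) -> list[str]:
    """Generate one sentence per combination of slot fillers.

    A slot name that appears several times is filled with the same value
    everywhere, so each generated sentence is internally consistent.
    """
    # Unique slot names, in order of first appearance.
    slots = list(dict.fromkeys(SLOT_PATTERN.findall(template)))
    sentences = []
    # Cartesian product over the filler lists of each unique slot.
    for combo in itertools.product(*(fillers[s] for s in slots)):
        values = dict(zip(slots, combo))
        sentences.append(SLOT_PATTERN.sub(lambda m: values[m.group(1)], template))
    return sentences


if __name__ == "__main__":
    # Hypothetical template and fillers, for illustration only.
    template = "{GROUP} are always late."
    fillers = {"GROUP": ["Engineers", "Musicians"]}
    for sentence in expand_template(template, fillers):
        print(sentence)
```

Filling each slot from a list, rather than editing sentences by hand, is what lets a single template yield arbitrarily many evaluation items; a template with no slots simply returns itself once.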
Citation
M. Mitchell et al., “SHADES: Towards a Multilingual Assessment of Stereotypes in Large Language Models,” vol. 1, pp. 11995–12041, Jun. 2025, doi: 10.18653/V1/2025.NAACL-LONG.600
Source
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Conference
2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics (NAACL)
Keywords
Large Language Models, Stereotypes, Social Bias, Multilingual Evaluation, Datasets
Publisher
Association for Computational Linguistics
