
A Multi-Labeled Dataset for Indonesian Discourse: Examining Toxicity, Polarization, and Demographics Information

Susanto, Lucky
Wijanarko, Musa Izzanardi
Pratama, Prasetia Anugrah
Tang, Zilu
Akyas, Fariz
Hong, Traci
Idris, Ika Karlina
Aji, Alham Fikri
Wijaya, Derry Tanti
Department
Natural Language Processing
Type
Conference proceeding
Date
2025
License
http://creativecommons.org/licenses/by/4.0/
Abstract
Online discourse is increasingly trapped in a vicious cycle where polarizing language fuels toxicity and vice versa. Identity, one of the most divisive issues in modern politics, often increases polarization. Yet prior NLP research has mostly treated toxicity and polarization as separate problems. In Indonesia, the world’s third-largest democracy, this dynamic threatens democratic discourse, particularly in online spaces. We argue that polarization and toxicity must be studied in relation to each other. To this end, we present a novel multi-label Indonesian dataset annotated for toxicity, polarization, and annotator demographic information. Benchmarking with BERT-base models and large language models (LLMs) reveals that polarization cues improve toxicity classification and vice versa. Including demographic context further enhances polarization classification performance.
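The abstract describes a dataset in which each text carries several labels (toxicity, polarization) together with the demographics of the people who annotated it, and a benchmark in which one task's labels act as cues for the other. This record does not give the dataset schema or the input format used in the paper, so the Python sketch below assumes both. The `Annotation`/`Item` classes, the majority-vote aggregation, and the bracketed `[POLARIZATION=1]` and demographic prefixes are hypothetical illustrations of the general idea. They are not the authors' released format or their method.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Annotation:
    """One annotator's judgment on one text (hypothetical schema)."""
    toxic: bool
    polarizing: bool
    demographics: dict[str, str] = field(default_factory=dict)


@dataclass
class Item:
    """A text plus every annotation it received (hypothetical schema)."""
    text: str
    annotations: list[Annotation]


def majority(votes: list[bool]) -> bool:
    """Return True only if strictly more than half of the votes are True."""
    return Counter(votes)[True] > len(votes) / 2


def aggregate(item: Item) -> dict[str, bool]:
    """Collapse per-annotator labels into one multi-label target by majority vote."""
    return {
        "toxic": majority([a.toxic for a in item.annotations]),
        "polarizing": majority([a.polarizing for a in item.annotations]),
    }


def with_cue(text: str, cue_name: str, cue_value: bool) -> str:
    """Prefix a text with another task's label as a plain-text cue (hypothetical format)."""
    return f"[{cue_name}={int(cue_value)}] {text}"


def with_demographics(text: str, demographics: dict[str, str]) -> str:
    """Prefix a text with annotator attributes, sorted by key for a stable order."""
    attrs = "; ".join(f"{k}={v}" for k, v in sorted(demographics.items()))
    return f"[{attrs}] {text}"


if __name__ == "__main__":
    item = Item(
        text="contoh komentar",  # placeholder text, not taken from the dataset
        annotations=[
            Annotation(True, True, {"gender": "F", "age_group": "25-34"}),
            Annotation(False, True, {"gender": "M", "age_group": "35-44"}),
            Annotation(True, False, {"gender": "F", "age_group": "18-24"}),
        ],
    )
    labels = aggregate(item)
    print(labels)  # {'toxic': True, 'polarizing': True}
    # Toxicity input that carries the polarization label as a cue.
    print(with_cue(item.text, "POLARIZATION", labels["polarizing"]))
    # Polarization input that carries one annotator's demographic context.
    print(with_demographics(item.text, item.annotations[0].demographics))
```

Keeping the per-annotator records intact, rather than storing only the aggregated labels, is what makes the demographic-context variant possible. Majority voting is just one simple aggregation choice assumed here.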
Citation
L. Susanto, M.I. Wijanarko, P.A. Pratama, Z. Tang, F. Akyas, T. Hong, I.K. Idris, A.F. Aji, D.T. Wijaya, "A Multi-Labeled Dataset for Indonesian Discourse: Examining Toxicity, Polarization, and Demographics Information," in Findings of the Association for Computational Linguistics: ACL 2025, Association for Computational Linguistics, 2025, pp. 18863-18890.
Conference
Findings of the Association for Computational Linguistics: ACL 2025
Keywords
44 Human Society, 4408 Political Science, 47 Language, Communication and Culture
Publisher
Association for Computational Linguistics