
An Ethical Dataset from Real-World Interactions Between Users and Large Language Models

Kaneko, Masahiro
Bollegala, Danushka Tarupathi
Baldwin, Timothy
Department
Natural Language Processing
Type
Conference proceeding
Date
2025
Language
English
Abstract
Recent studies have demonstrated that Large Language Models (LLMs) have ethics-related problems such as social biases, a lack of moral reasoning, and the generation of offensive content. Existing evaluation metrics and methods for addressing these ethical challenges use datasets created intentionally by instructing humans to write instances that contain ethical problems. Such data therefore does not sufficiently cover the prompts that users actually provide when using LLM services in everyday contexts, or the outputs that LLMs generate in response. Unethical instances intentionally created by humans may show different tendencies from actual user interactions with LLM services, which could result in incomplete evaluation. To investigate this difference, we create the Eagle datasets, extracted from actual interactions between ChatGPT and users that exhibit social biases, opinion biases, toxicity, and immoral problems. Our experiments show that Eagle captures complementary aspects not covered by existing datasets proposed for evaluation and mitigation. We argue that using both existing and proposed datasets leads to a more comprehensive assessment of LLM ethics.
Citation
M. Kaneko, D. Bollegala, and T. Baldwin, “An Ethical Dataset from Real-World Interactions Between Users and Large Language Models,” IJCAI International Joint Conference on Artificial Intelligence, pp. 9737–9745, 2025, doi: 10.24963/IJCAI.2025/1082
Source
Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence
Conference
34th International Joint Conference on Artificial Intelligence, IJCAI 2025
Publisher
International Joint Conferences on Artificial Intelligence