Genesis: A Large-Scale Benchmark for Multimodal Large Language Model in Emotional Causality Analysis
Li, Yulong ; Zhang, Yuxuan ; Chen, Rui ; Tang, Feilong ; Lu, Zhixiang ; Hu, Ming ; Wu, Jianghao ; Xue, Haochen ; Zhou, Mian ; Li, Chong ; ... (2 more authors not shown)
Department
Computational Biology
Type
Conference proceeding
Date
2025
Language
English
Abstract
Artificial intelligence will not achieve genuine empathy until models can reason about the causes of human emotions rather than only label them. Current datasets fail to support this objective: existing emotional causality datasets focus primarily on text, lack non-verbal information such as speech and facial expressions, and feature relatively short dialogues, which limits research on long-term emotional evolution. Existing annotations concentrate on stimulus-response patterns and lack cross-temporal emotional causal chain annotations, so they fail to reveal how early events accumulate and ultimately trigger emotional changes. In this work, we introduce Genesis, the first multimodal dialogue dataset supporting long-term emotional causality analysis. Genesis contains 1,000 dialogues averaging 208 turns each, spanning debate, family, educational, and social scenarios. Through a two-layer annotation system, consisting of proximal cause identification and long-term causal chain tracking, Genesis labels complex emotional phenomena including cross-modal inconsistencies and long-distance causal dependencies. Our evaluation of 20 mainstream multimodal models reveals limitations in current approaches for long-term emotional causality. We propose Empathica as an evaluation baseline, employing a Recognition-Memory-Attribution architecture that integrates dynamic sliding windows and event aggregation mechanisms to address multimodal emotional causality modeling challenges. Empathica outperforms the text-based model GPT-o1 and the multimodal models Gemini 1.5 Pro and GPT-4o across all evaluation metrics.
Citation
Y. Li et al., “Genesis: A Large-Scale Benchmark for Multimodal Large Language Model in Emotional Causality Analysis,” Proceedings of the 33rd ACM International Conference on Multimedia, pp. 12651–12658, Oct. 2025, doi: 10.1145/3746027.3758202
Source
MM '25: Proceedings of the 33rd ACM International Conference on Multimedia
Conference
The 33rd ACM International Conference on Multimedia
Keywords
Emotional Causality Dataset, Long-term Multimodal Conversation Analysis, Long-term Causal Modeling, Causal Chain Annotation
Publisher
Association for Computing Machinery
