
MMRC: A Large-Scale Benchmark for Understanding Multimodal Large Language Model in Real-World Conversation

Xue, Haochen
Tang, Feilong
Hu, Ming
Liu, Yexin
Huang, Qidong
Li, Yulong
Liu, Chengzhi
Xu, Zhongxing
Zhang, Chong
Feng, Chunmei
... and 6 more authors
Abstract
Recent multimodal large language models (MLLMs) have demonstrated significant potential in open-ended conversation, generating more accurate and personalized responses. However, their ability to memorize, recall, and reason over sustained interactions in real-world scenarios remains underexplored. This paper introduces MMRC, a Multi-Modal Real-world Conversation benchmark for evaluating six core open-ended abilities of MLLMs: information extraction, multi-turn reasoning, information update, image management, memory recall, and answer refusal. Built from data collected in real-world scenarios, MMRC comprises 5,120 conversations and 28,720 corresponding manually labeled questions, posing a significant challenge to existing MLLMs. Evaluations of 22 MLLMs on MMRC show that accuracy drops during open-ended interactions. We identify four common failure patterns: long-term memory degradation, inadequacies in updating factual knowledge, accumulated assumption of error propagation, and reluctance to “say no”. To mitigate these issues, we propose a simple yet effective NOTE-TAKING strategy that records key information from the conversation and reminds the model of it when responding, enhancing its conversational capabilities. Experiments across six MLLMs demonstrate significant performance improvements.
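The abstract describes the NOTE-TAKING strategy only at a high level (record key information, then remind the model while it responds). The following is a minimal, hypothetical sketch of that idea, not the authors' implementation. It assumes notes are stored as key-value facts, uses a toy `colon_extractor` (invented here) in place of whatever extraction step the paper actually uses, and prepends the notes to each question. The names `NoteBook`, `record`, and `build_prompt` are likewise illustrative assumptions.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical extractor type: maps one user turn to (key, value) facts.
Extractor = Callable[[str], List[Tuple[str, str]]]


def colon_extractor(turn: str) -> List[Tuple[str, str]]:
    """Toy stand-in extractor: treats 'key: value' clauses split by ';' as facts.

    A real system would more likely ask a model to summarize key information;
    this rule-based version exists only so the sketch runs offline.
    """
    facts = []
    for clause in turn.split(";"):
        if ":" in clause:
            key, value = clause.split(":", 1)
            facts.append((key.strip().lower(), value.strip()))
    return facts


class NoteBook:
    """Hypothetical note store that keeps only the latest value per key."""

    def __init__(self, extractor: Extractor) -> None:
        self.extractor = extractor
        self.notes: Dict[str, str] = {}

    def record(self, user_turn: str) -> None:
        # Overwrite existing keys so a later turn replaces stale facts.
        for key, value in self.extractor(user_turn):
            self.notes[key] = value

    def build_prompt(self, question: str) -> str:
        # Prepend the notes as a reminder before the current question.
        if not self.notes:
            return f"Question: {question}"
        lines = "\n".join(f"- {k}: {v}" for k, v in self.notes.items())
        return f"Notes from earlier in the conversation:\n{lines}\n\nQuestion: {question}"


book = NoteBook(colon_extractor)
book.record("name: Alice; city: Paris")
book.record("city: Berlin")  # a later turn updates an earlier fact
prompt = book.build_prompt("Where does the user live now?")
print(prompt)
```

Overwriting a key when a later turn restates it is one simple way to target the "information update" failure mode; the paper's own strategy may record and surface notes differently.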
Citation
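H. Xue et al., “MMRC: A Large-Scale Benchmark for Understanding Multimodal Large Language Model in Real-World Conversation,” in Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (ACL 2025), vol. 1, pp. 22477–22503, Aug. 2025, doi: 10.18653/V1/2025.ACL-LONG.1096.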
Source
Proceedings of the Annual Meeting of the Association for Computational Linguistics
Conference
63rd Annual Meeting of the Association for Computational Linguistics, ACL 2025
Keywords
Multimodal Large Language Models, Real-World Conversation Benchmark, Information Extraction, Multi-turn Reasoning, Memory Recall, Image Management, Answer Refusal, Long-Term Interaction Evaluation
Publisher
Association for Computational Linguistics