
MDK12-Bench: A Multi-Discipline Benchmark for Evaluating Reasoning in Multimodal Large Language Models

Zhou, Pengfei
Peng, Xiaopeng
Zhang, Fanrui
Xu, Zhaopan
Ai, Jiaxin
Qiu, Yansheng
Zhao, Wangbo
Song, Jiajun
Li, Chuanhao
Tang, Weidong
... and 10 more
Abstract
Multimodal large language models (MLLMs), which integrate language and visual cues for problem-solving, are crucial for advancing artificial general intelligence (AGI). However, current benchmarks for measuring the intelligence of MLLMs suffer from limited scale, narrow coverage, and unstructured knowledge, offering only static and undifferentiated evaluations. To bridge this gap, we introduce MDK12-Bench, a large-scale multidisciplinary benchmark built from real-world K–12 exams. It spans six disciplines with 141K instances and 6,225 knowledge points organized in a six-layer taxonomy. The benchmark covers five question formats and is annotated with difficulty and exam year, enabling comprehensive evaluation of MLLMs along four dimensions: 1) difficulty levels, 2) temporal (cross-year) shifts, 3) contextual shifts, and 4) knowledge-driven reasoning. We also propose a novel dynamic evaluation framework that applies unfamiliar visual, textual, and question-format shifts to test model generalization. By mitigating data contamination, this framework also improves the benchmark's objectivity and longevity. We further evaluate knowledge-point reference-augmented generation (KP-RAG) to examine the role of knowledge in reasoning. Our findings reveal limitations of current MLLMs in several respects and offer guidance for improving model reasoning and robustness and for AI-assisted education.
Citation
P. Zhou, X. Peng, F. Zhang, Z. Xu, J. Ai, Y. Qiu, et al., "MDK12-Bench: A Multi-Discipline Benchmark for Evaluating Reasoning in Multimodal Large Language Models," 2026, pp. 28982-28990.
Source
Proceedings of the AAAI Conference on Artificial Intelligence
Conference
The Fortieth AAAI Conference on Artificial Intelligence
Keywords
46 Information and Computing Sciences, 4602 Artificial Intelligence, 4605 Data Management and Data Science
Publisher
Association for the Advancement of Artificial Intelligence (AAAI)