Grade Like a Human: Rethinking Automated Assessment with Large Language Models

Xie, Wenjing
Niu, Juxin
Xue, Chun Jason
Guan, Nan
Department
Computer Science
Type
Conference proceeding
Abstract
Grading is a foundational component of assessment in higher education, aiming to evaluate student work in a reliable, repeatable, and interpretable manner. Short-answer questions effectively assess understanding, analysis, and articulation, but their open-ended nature makes traditional workflows reliant on detailed rubrics and manual review, resulting in substantial time and labor. Although recent work has explored using large language models (LLMs) for automated short-answer grading (ASAG), significant gaps remain in rubric design and in ensuring scoring consistency and fairness. Inspired by best practices in human grading, we propose Grade-Like-a-Human, a systematic multi-agent framework that spans the full pipeline: iteratively aligning rubrics with real answers, leveraging cross-item memory to enhance scoring consistency, and integrating a post-grading audit-and-feedback loop. We evaluate our method on an open-source short-answer grading benchmark and deploy it in a real undergraduate Operating Systems course, using authentic questions and student submissions for evaluation. We further release the questions, student submissions, and grading artifacts as the OS dataset. Experiments demonstrate substantial improvements in accuracy, consistency, and fairness.
Citation
W. Xie, J. Niu, C.J. Xue, N. Guan, "Grade Like a Human: Rethinking Automated Assessment with Large Language Models," 2026, pp. 1-8.
Source
Conference
Proceedings of the International Conference on Research in Adaptive and Convergent Systems
Keywords
46 Information and Computing Sciences, 4608 Human-Centred Computing
Publisher
Association for Computing Machinery