New Paradigm for Evaluating Scholar Summaries: A Facet-aware Metric and A Meta-evaluation Benchmark
Wang, Tairan ; Chen, Xiuying ; Zhu, Qingqing ; Guo, Taicheng ; Gao, Shen ; Lu, Zhiyong ; Gao, Xin ; Zhang, Xiangliang
Department
Natural Language Processing
Type
Journal article
Date
2025
Language
English
Abstract
Evaluation of summary quality is particularly crucial within the scientific domain, because it facilitates efficient knowledge dissemination and automated scientific information retrieval. This paper presents conceptual and experimental analyses of scientific summarization, highlighting the inadequacies of traditional evaluation methods. These methods, including n-gram overlap calculations, embedding comparisons, verification, and QA-based approaches, often fall short in providing explanations, grasping scientific concepts, or identifying key content. Correspondingly, we introduce the Facet-aware Metric (FM), employing LLMs for advanced semantic matching to evaluate summaries based on different facets. The facet granularity is tailored to the structure of scientific abstracts, offering an integrated evaluation approach that is not fragmented, while also providing fine-grained interpretability. Recognizing the absence of an evaluation benchmark in the scientific domain, we curate a Scientific abstract summary evaluation Dataset (ScholarSum) with facet-level annotations. Our findings confirm that FM offers a more logical approach to evaluating scientific summaries. In addition, fine-tuned smaller models can compete with LLMs in scientific contexts, while LLMs have limitations in learning from in-context information in scientific domains. We hope our benchmark inspires better evaluation metrics and future enhancements to LLMs: https://github.com/iriscxy/ScholarSum.
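The facet-level idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the facet names, the equal weighting, and the `token_overlap` matcher are all assumptions made for illustration, and the lexical matcher merely stands in for the LLM-based semantic matching that FM is described as using.

```python
from typing import Callable, Dict, Tuple


def token_overlap(reference: str, candidate: str) -> float:
    """Stand-in matcher (assumption): share of reference tokens present in the candidate.

    FM is described as using an LLM for semantic matching; this lexical
    proxy only keeps the sketch runnable without any model.
    """
    ref_tokens = set(reference.lower().split())
    cand_tokens = set(candidate.lower().split())
    if not ref_tokens:
        return 0.0
    return len(ref_tokens & cand_tokens) / len(ref_tokens)


def facet_score(
    reference_facets: Dict[str, str],
    candidate_summary: str,
    matcher: Callable[[str, str], float] = token_overlap,
) -> Tuple[float, Dict[str, float]]:
    """Score a summary per facet, then average the facet scores.

    Equal weighting is an assumption; the abstract does not state how
    facet scores are combined.
    """
    per_facet = {
        name: matcher(text, candidate_summary)
        for name, text in reference_facets.items()
    }
    overall = sum(per_facet.values()) / len(per_facet) if per_facet else 0.0
    return overall, per_facet


# Hypothetical facets for a reference abstract (names are illustrative).
reference = {
    "method": "we use llms",
    "result": "accuracy improves",
}
overall, per_facet = facet_score(reference, "we use llms for matching")
print(per_facet, overall)
```

The per-facet dictionary is what gives the fine-grained interpretability mentioned in the abstract: a low overall score can be traced to the facet the summary failed to cover. Swapping `token_overlap` for a model-backed matcher would leave the rest of the sketch unchanged.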
Citation
T. Wang et al., “New Paradigm for Evaluating Scholar Summaries: A Facet-aware Metric and A Meta-evaluation Benchmark,” ACM Trans. Inf. Syst., Sep. 2024, doi: 10.1145/3733597.
Source
ACM Transactions on Information Systems
Publisher
Association for Computing Machinery
