
A Unified Agentic Framework for Automated Fact-Checking and Factual Evaluation of Large Language Models

Iqbal, Hasan
Department
Natural Language Processing
Embargo End Date
30/05/2025
Type
Thesis
Date
2025
License
Language
English
Abstract
The growing use of large language models (LLMs) in real-world applications highlights the need for automatic tools that verify the factual accuracy of their outputs, as LLMs often hallucinate. However, differing evaluation benchmarks and measures make fact-checking systems difficult to compare. To address this, we present OpenFactCheck, a unified fact-checking framework with three core modules: (i) ResponseEval, for customizing a fact-checking pipeline and assessing the claims in an input document; (ii) LLMEval, for evaluating the overall factuality of an LLM on the FactQA benchmark, which combines seven existing factuality datasets; and (iii) CheckerEval, for benchmarking automatic fact-checkers on Factcheck-Bench, a new human-annotated dataset with over 1,000 claims. OpenFactCheck is open-source and available as a Python library and a web service; a demo video is available at https://youtu.be/-i9VKL0HleI.
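
For a rough sense of how the three modules might be driven from Python, the sketch below is illustrative only: the package entry point, attribute names, and method signatures (OpenFactCheck, response_evaluator.evaluate, llm_evaluator.evaluate, checker_evaluator.evaluate) are assumptions, not the library's confirmed API; see the repository and demo video for the actual interface.

# Hypothetical usage sketch of the OpenFactCheck Python library.
# All names below are illustrative assumptions, not the documented API.
from openfactcheck import OpenFactCheck   # assumed entry point

ofc = OpenFactCheck()                     # assumed default configuration

# (i) ResponseEval: fact-check the claims in a single LLM response.
report = ofc.response_evaluator.evaluate(
    "The Eiffel Tower was completed in 1889 in Paris."
)

# (ii) LLMEval: score an LLM's overall factuality on the FactQA benchmark.
llm_result = ofc.llm_evaluator.evaluate(model_name="my-llm", benchmark="FactQA")

# (iii) CheckerEval: benchmark a fact-checker on Factcheck-Bench.
checker_result = ofc.checker_evaluator.evaluate(checker="my-checker",
                                                benchmark="Factcheck-Bench")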
Citation
Hasan Iqbal, “A Unified Agentic Framework for Automated Fact-Checking and Factual Evaluation of Large Language Models,” Master of Science thesis, Natural Language Processing, MBZUAI, 2025.
Source
Conference
Keywords
Natural Language Processing, Fact-checking, Large Language Model (LLM), Factuality, Evaluation
Subjects
Publisher
DOI
Full-text link