
TRUSTEVAL: A Dynamic Evaluation Toolkit on Trustworthiness of Generative Foundation Models

Wang, Yanbo
Ye, Jiayi
Wu, Siyuan
Gao, Chujie
Huang, Yue
Chen, Xiuying
Zhao, Yue
Zhang, Xiangliang
Department
Natural Language Processing
Type
Conference proceeding
Date
2025
Language
English
Abstract
Ensuring the trustworthiness of Generative Foundation Models (GenFMs) is a pressing challenge as they gain widespread use. Existing evaluation toolkits are often limited in scope, dynamism, and flexibility. This paper introduces TRUSTEVAL, a dynamic and comprehensive toolkit designed for evaluating GenFMs across various dimensions. TRUSTEVAL supports both dynamic dataset generation and evaluation, offering advanced features including comprehensiveness, usability, and flexibility. TRUSTEVAL integrates diverse generative models, datasets, evaluation methods, metrics, inference efficiency enhancement, and evaluation report generation. Through case studies, we demonstrate TRUSTEVAL’s potential to advance the trustworthiness evaluation of GenFMs.
Citation
Y. Wang et al., “TRUSTEVAL: A Dynamic Evaluation Toolkit on Trustworthiness of Generative Foundation Models,” 2025. Accessed: Jun. 03, 2025. [Online]. Available: https://aclanthology.org/2025.naacl-demo.8/
Source
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (System Demonstrations)
Conference
Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (System Demonstrations)
Publisher
Association for Computational Linguistics