Libra-leaderboard: Towards responsible ai through a balanced leaderboard of safety and capability
Author
Li, Haonan
Han, Xudong
Zhai, Zenan
Mu, Honglin
Wang, Hao
Zhang, Zhenxuan
Geng, Yilin
Lin, Shom
Wang, Renxi
Shelmanov, Artem
Qi, Xiangyu
Wang, Yuxia
Hong, Donghai
Yuan, Youliang
Chen, Meng
Tu, Haoqin
Koto, Fajri
Zeng, Cong
Kuribayashi, Tatsuki
Bhardwaj, Rishabh
Zhao, Bingchen
Duan, Yawen
Liu, Yi
Alghamdi, Emad A.
Yang, Yaodong
Dong, Yinpeng
Poria, Soujanya
Liu, Pengfei
Liu, Zhengzhong
Ren, Hector Xuguang
Hovy, Eduard
Gurevych, Iryna
Nakov, Preslav
Choudhury, Monojit
Baldwin, Timothy
Department
Natural Language Processing
Type
Conference proceeding
Date
2025
Language
English
Abstract
As large language models (LLMs) continue to evolve, leaderboards play a significant role in steering their development. Existing leaderboards often prioritize model capabilities while overlooking safety concerns, leaving a significant gap in responsible AI development. To address this gap, we introduce Libra-Leaderboard, a comprehensive framework designed to rank LLMs through a balanced evaluation of performance and safety. Combining a dynamic leaderboard with an interactive LLM arena, Libra-Leaderboard encourages the joint optimization of capability and safety. Unlike traditional approaches that average performance and safety metrics, Libra-Leaderboard uses a distance-to-optimal-score method to calculate the overall rankings. This approach incentivizes models to achieve a balance rather than excelling in one dimension at the expense of the other. In its first release, Libra-Leaderboard evaluates 26 mainstream LLMs from 14 leading organizations, identifying critical safety challenges even in state-of-the-art models.
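The distance-to-optimal-score idea described in the abstract can be illustrated with a small sketch. This is a hypothetical reconstruction, not the paper's exact formula: it assumes both safety and capability scores are normalized to [0, 1], that the optimal point is (1, 1), and that plain Euclidean distance is used. The function name `distance_to_optimal` is an illustrative choice.

```python
import math

def distance_to_optimal(safety: float, capability: float) -> float:
    """Rank a model by its closeness to the optimal point (1, 1).

    Assumes both scores are normalized to [0, 1]. Higher is better.
    Unlike a plain average, this penalizes imbalance: moving one score
    up cannot fully compensate for letting the other slide.
    """
    d = math.sqrt((1 - safety) ** 2 + (1 - capability) ** 2)
    return 1 - d / math.sqrt(2)  # rescale so the result lies in [0, 1]

# Two models with the same average score of 0.8:
balanced = distance_to_optimal(0.8, 0.8)  # safety = capability
skewed = distance_to_optimal(1.0, 0.6)    # strong safety, weak capability
# The balanced model ranks higher, even though averaging would tie them.
```

Under this sketch, `balanced` evaluates to 0.8 while `skewed` is about 0.717, so the balanced model is ranked above the skewed one, which is the incentive the abstract describes.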
Citation
H. Li et al., “Libra-Leaderboard: Towards Responsible AI through a Balanced Leaderboard of Safety and Capability,” pp. 268–286, Jun. 2025, doi: 10.18653/v1/2025.naacl-demo.23.
Source
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (System Demonstrations)
Conference
2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (System Demonstrations)
Keywords
Balanced Evaluation, Large Language Models, Safety-Capability Trade-off, Leaderboard Design, Adversarial Prompt Attacks, Interactive Safety Arena, Unified Scoring Metric, Responsible AI Benchmarking
Publisher
Association for Computational Linguistics
