Safety at Scale: A Comprehensive Survey of Large Model and Agent Safety
Author
Ma, Xingjun
Gao, Yifeng
Wang, Yixu
Wang, Ruofan
Wang, Xin
Sun, Ye
Ding, Yifan
Xu, Hengyuan
Chen, Yunhao
Zhao, Yunhao
Huang, Hanxun
Li, Yige
Wu, Yutao
Zhang, Jiaming
Zheng, Xiang
Bai, Yang
Li, Yiming
Wu, Zuxuan
Qiu, Xipeng
Zhang, Jingfeng
Han, Xudong
Li, Haonan
Sun, Jun
Wang, Cong
Gu, Jindong
Wu, Baoyuan
Chen, Siheng
Zhang, Tianwei
Liu, Yang
Gong, Mingming
Liu, Tongliang
Pan, Shirui
Xie, Cihang
Pang, Tianyu
Dong, Yinpeng
Jia, Ruoxi
Zhang, Yang
Ma, Shiqing
Zhang, Xiangyu
Gong, Neil
Xiao, Chaowei
Erfani, Sarah
Baldwin, Tim
Li, Bo
Sugiyama, Masashi
Tao, Dacheng
Bailey, James
Jiang, Yu-Gang
Department
Machine Learning
Type
Journal article
Date
2025
Language
English
Abstract
The rapid advancement of large models, driven by their exceptional abilities in learning and generalization through large-scale pre-training, has reshaped the landscape of Artificial Intelligence (AI). These models are now foundational to a wide range of applications, including conversational AI, recommendation systems, autonomous driving, content generation, medical diagnostics, and scientific discovery. However, their widespread deployment also exposes them to significant safety risks, raising concerns about robustness, reliability, and ethical implications. This survey provides a systematic review of current safety research on large models, covering Vision Foundation Models (VFMs), Large Language Models (LLMs), Vision-Language Pre-training (VLP) models, Vision-Language Models (VLMs), Diffusion Models (DMs), and large-model-powered Agents. Our contributions are summarized as follows: (1) We present a comprehensive taxonomy of safety threats to these models, including adversarial attacks, data poisoning, backdoor attacks, jailbreak and prompt injection attacks, energy-latency attacks, data and model extraction attacks, and emerging agent-specific threats. (2) We review the defense strategies proposed for each type of attack, where available, and summarize the commonly used datasets and benchmarks for safety research. (3) Building on this, we identify and discuss the open challenges in large model safety, emphasizing the need for comprehensive safety evaluations, scalable and effective defense mechanisms, and sustainable data practices. More importantly, we highlight the necessity of collective efforts from the research community and international collaboration. Our work can serve as a useful reference for researchers and practitioners, fostering the ongoing development of comprehensive defense systems and platforms to safeguard AI models. GitHub: https://github.com/xingjunm/Awesome-Large-Model-Safety.
Citation
Xingjun Ma et al., “Safety at scale: a comprehensive survey of large model and agent safety,” Foundations and Trends in Privacy and Security, vol. 8, no. 3–4, pp. 1–240, Jan. 2026, doi: 10.1561/3300000051
Source
Foundations and Trends® in Privacy and Security
Keywords
Large model safety, Agent safety, AI safety
Publisher
Emerald
