X-SAM: From Segment Anything to Any Segmentation
Wang, Hao; Qiao, Limeng; Jie, Zequn; Huang, Zhijian; Feng, Chengjian; Zheng, Qingfang; Ma, Lin; Lan, Xiangyuan; Liang, Xiaodan
Department
Computer Vision
Type
Conference proceeding
Abstract
Large Language Models (LLMs) demonstrate strong capabilities in broad knowledge representation, yet they are inherently deficient in pixel-level perceptual understanding. Although the Segment Anything Model (SAM) represents a significant advance in visual-prompt-driven image segmentation, it has notable limitations in multi-mask prediction and category-specific segmentation, and it cannot unify all segmentation tasks within a single model architecture. To address these limitations, we present X-SAM, a streamlined Multimodal Large Language Model (MLLM) framework that extends the segmentation paradigm from "segment anything" to "any segmentation". Specifically, we introduce a novel unified framework that enables more advanced pixel-level perceptual comprehension for MLLMs. Furthermore, we propose a new segmentation task, termed Visual GrounDed (VGD) segmentation, which segments all object instances given interactive visual prompts and equips MLLMs with visually grounded, pixel-wise interpretive capabilities. To enable effective training on diverse data sources, we present a unified training strategy that supports co-training across multiple datasets. Experimental results demonstrate that X-SAM achieves state-of-the-art performance on a wide range of image segmentation benchmarks, highlighting its efficiency for multimodal, pixel-level visual understanding.
Citation
H. Wang, L. Qiao, Z. Jie, Z. Huang, C. Feng, Q. Zheng, et al., "X-SAM: From Segment Anything to Any Segmentation," 2026, pp. 26187-26196.
Source
Proceedings of the AAAI Conference on Artificial Intelligence
Conference
The Fortieth AAAI Conference on Artificial Intelligence
Keywords
46 Information and Computing Sciences, 4603 Computer Vision and Multimedia Computation, 4607 Graphics, Augmented Reality and Games, 4611 Machine Learning
Publisher
Association for the Advancement of Artificial Intelligence
