AM-SAM: a spatially-aware prompt learning and mask calibration framework for few-shot semantic segmentation
Li, Yuchen ; Zhang, Li ; Liang, Youwei ; Xie, Pengtao
Department
Machine Learning
Type
Journal article
Language
English
Abstract
The Segment Anything Model (SAM) has gained significant recognition in semantic segmentation for its versatility and strong performance. However, SAM struggles in few-shot segmentation scenarios and faces two key limitations: (1) it relies heavily on meticulous human-provided prompts such as key points, bounding boxes, or text, which is labor-intensive; and (2) the mask decoder’s feature representation is sometimes inaccurate, since it relies solely on dot-product operations, which inadequately capture the correlations needed for precise segmentation. Existing remedies such as fine-tuning SAM often require retraining a large number of parameters, which is time-consuming and demands sufficient training data. To address these limitations of SAM in few-shot scenarios, we propose AM-SAM: a spatially-aware prompt learning and mask calibration framework based on bi-level optimization. Specifically, our approach automatically generates prompts for an input image, eliminating the need for human involvement in early training epochs and thus achieving faster convergence. Additionally, while freezing the main body of SAM, we enhance the mask decoder using Parameter-Efficient Fine-Tuning (PEFT) via Low-Rank Adaptation (LoRA) and introduce an element-wise multiplication operation that complements the standard dot product, thereby improving feature representation learning. This design better captures fine-grained, dimension-wise correlations between token features and image embeddings, resulting in more accurate mask calibration and improved segmentation quality. Our experimental results demonstrate that in a few-shot learning setting, AM-SAM exceeds the performance of human-generated or SAM’s default prompts. Notably, on the body segmentation dataset, our method achieves a 7% higher Dice score than the state-of-the-art method with only a 4-example training set, underscoring its superiority in semantic segmentation tasks.
Our code is available at: https://github.com/aierwaixi/AM-SAM.
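The two decoder-side ideas in the abstract can be illustrated with a short sketch. The code below is our illustration under stated assumptions, not the authors' implementation: the function names, the learned per-dimension weight vector `w`, and the LoRA shapes are hypothetical. It contrasts SAM's standard dot-product mask logits with an element-wise product combined by a per-dimension weight (so feature dimensions can contribute unequally), and shows the generic LoRA update W + (alpha/r)·B·A applied to a frozen linear weight.

```python
import numpy as np

def dot_product_masks(tokens, embeddings):
    """Standard dot-product mask logits.
    tokens: (n_tokens, d), embeddings: (d, H*W) -> logits (n_tokens, H*W)."""
    return tokens @ embeddings

def elementwise_masks(tokens, embeddings, w):
    """Element-wise complement: multiply token and image features
    dimension by dimension, then combine with a learned per-dimension
    weight vector w of shape (d,). Mathematically this is a reweighted
    dot product, which lets each feature dimension be emphasized or
    suppressed rather than summed uniformly."""
    prod = tokens[:, :, None] * embeddings[None, :, :]   # (n, d, H*W)
    return np.einsum("d,ndh->nh", w, prod)

def lora_linear(x, W, A, B, alpha=16, r=4):
    """Generic LoRA forward pass: frozen weight W (out, in) plus a
    trainable low-rank update B @ A, with A (r, in) and B (out, r),
    scaled by alpha / r. Only A and B would be updated in training."""
    return x @ (W + (alpha / r) * (B @ A)).T
```

With `w` set to all ones, `elementwise_masks` reduces exactly to `dot_product_masks`; training `w` (or a projection in its place) is what adds the dimension-wise flexibility the abstract describes.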
Citation
Y. Li, L. Zhang, Y. Liang, P. Xie, "AM-SAM: a spatially-aware prompt learning and mask calibration framework for few-shot semantic segmentation," International Journal of Machine Learning and Cybernetics, vol. 17, no. 1, pp. 35-35, 2026, https://doi.org/10.1007/s13042-025-02911-7.
Source
International Journal of Machine Learning and Cybernetics
Keywords
46 Information and Computing Sciences, 4605 Data Management and Data Science
Publisher
Springer Nature
