Bridge the Intra-Class Gap: K-shot Multi-scale Intermediate Prototype Mining Transformer for Few-Shot Semantic Segmentation
Liu, Yuanwei ; Liu, Nian ; Jiang, Tao ; Yao, Xiwen ; Anwer, Rao Muhammad ; Cholakkal, Hisham ; Han, Junwei
Department
Computer Vision
Type
Journal article
Date
2025
Language
English
Abstract
Few-shot segmentation (FSS) aims to accurately segment target objects in a query image using only a limited number of annotated support images. Existing approaches typically follow a paradigm that directly leverages category information from the support set to identify target objects in the query. However, these methods often ignore the category information gap between query and support images, leading to suboptimal performance when faced with images containing objects exhibiting significant intra-class diversity. To address this issue, we propose a novel framework that introduces intermediate prototypes to capture both deterministic information from the support images and adaptive knowledge from the query at multiple scales. Our framework, named the K-shot Multi-scale Intermediate Prototype Mining Transformer (KMIPMT), is based on the Transformer architecture and learns intermediate prototypes in an iterative manner, where each KMIPMT layer propagates category information from both K-shot support features and multi-scale query features to intermediate prototypes. This information is then utilized to activate the query feature map. Through repeated iterations, both intermediate prototypes and the query feature are progressively enhanced, and the final refined query feature is used for generating precise segmentation predictions. Despite its simplicity, our method achieves remarkable performance gains on standard benchmarks, including PASCAL-5i, COCO-20i, and FSS-1000, setting new state-of-the-art results. Furthermore, we explore several practical and challenging extensions of our method, including 3D point cloud FSS, zero-shot segmentation, weak-label FSS, and cross-domain FSS. These extensions showcase the versatility and effectiveness of our proposed KMIPMT framework across different domains and scenarios. Code is available at https://github.com/LIUYUANWEI98/KMIPMT
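The iterative scheme described in the abstract can be illustrated with a small toy sketch. This is not the authors' implementation (see the linked repository for that): the function names `attend`, `pool`, `mix`, and `mine`, the averaging-based update rule, and the 1-D token pooling used as a stand-in for multi-scale features are all assumptions made for illustration. It only shows the general flow: prototypes gather information from the K-shot support tokens and from multi-scale query tokens, then the query feature is updated from the prototypes, and the two steps repeat.

```python
import math

def attend(queries, keys):
    """Scaled dot-product attention where the keys also serve as values."""
    dim = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys]
        peak = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        out.append([sum(w * k[i] for w, k in zip(weights, keys)) / total
                    for i in range(dim)])
    return out

def pool(tokens, factor):
    """Average consecutive tokens: a 1-D stand-in for spatial downsampling."""
    groups = [tokens[i:i + factor] for i in range(0, len(tokens), factor)]
    return [[sum(col) / len(g) for col in zip(*g)] for g in groups]

def mix(*token_lists):
    """Element-wise average of several equally shaped token lists."""
    return [[sum(vals) / len(vals) for vals in zip(*rows)]
            for rows in zip(*token_lists)]

def mine(prototypes, support_shots, query, num_layers=2, scales=(1, 2)):
    """Hypothetical update loop: refine prototypes and query tokens in turn."""
    # Flatten the K support shots into one token list.
    support_tokens = [t for shot in support_shots for t in shot]
    for _ in range(num_layers):
        # Query tokens at several (assumed) scales.
        query_tokens = [t for s in scales for t in pool(query, s)]
        # Prototypes absorb support and query information.
        prototypes = mix(prototypes,
                         attend(prototypes, support_tokens),
                         attend(prototypes, query_tokens))
        # The query feature is then updated from the refined prototypes.
        query = mix(query, attend(query, prototypes))
    return prototypes, query

# Two support shots (K = 2), three query tokens, one initial prototype.
protos, refined = mine(
    prototypes=[[1.0, 0.0]],
    support_shots=[[[1.0, 0.0], [0.9, 0.1]], [[0.8, 0.2]]],
    query=[[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]],
)
```

In this sketch the number of prototypes and query tokens is preserved across layers, so the refined query tokens could be passed to any per-token classifier. The real model presumably uses learned projections and masks, which are omitted here.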
Citation
Y. Liu et al., "Bridge the Intra-Class Gap: K-shot Multi-scale Intermediate Prototype Mining Transformer for Few-Shot Semantic Segmentation," in IEEE Transactions on Pattern Analysis and Machine Intelligence, doi: 10.1109/TPAMI.2025.3593816
Source
IEEE Transactions on Pattern Analysis and Machine Intelligence
Keywords
Few-shot, Semantic Segmentation, Intermediate Prototype
Publisher
IEEE
