Pro-NeXt: An All-in-One Unified Model for General Fine-Grained Visual Recognition
Wu, Junde ; Zhu, Jiayuan ; Xu, Min ; Jin, Yueming
Department
Computer Vision
Type
Journal article
Date
2025
Language
English
Abstract
Unlike general visual classification (CLS) tasks, certain CLS problems are significantly more challenging because they involve recognizing professionally categorized or highly specialized images. Fine-Grained Visual Classification (FGVC) has emerged as a broad approach to this complexity. However, most existing methods have been evaluated on a limited set of homogeneous benchmarks, such as bird species or vehicle brands. Moreover, these approaches often train a separate model for each task, which restricts their generalizability. This paper proposes a scalable and explainable foundational model designed to tackle a wide range of FGVC tasks from a unified and generalizable perspective. We introduce a novel architecture named Pro-NeXt and show that it generalizes well across diverse professional fields previously considered disparate, such as fashion, medicine, and art. Our base-sized Pro-NeXt-B surpasses all preceding task-specific models across 12 distinct datasets within 5 diverse domains. Furthermore, Pro-NeXt scales well: increasing its depth and width, and with them its GFLOPs, consistently improves its accuracy. Beyond scalability and adaptability, the intermediate features of Pro-NeXt achieve reliable object detection and segmentation performance without extra training, highlighting its explainability. We will release the code to promote further research in this area.
Citation
J. Wu, J. Zhu, M. Xu and Y. Jin, "Pro-NeXt: An All-in-One Unified Model for General Fine-Grained Visual Recognition," in IEEE Transactions on Pattern Analysis and Machine Intelligence, doi: 10.1109/TPAMI.2025.3584902
Source
IEEE Transactions on Pattern Analysis and Machine Intelligence
Publisher
IEEE
