Open-Set Semi-Supervised Learning for Long-Tailed Medical Datasets
Kareem, Daniya Najiha A. ; Lahoud, Jean ; Fiaz, Mustansar ; Kumar, Amandeep ; Cholakkal, Hisham
Department
Computer Vision
Type
Conference proceeding
Date
2025
Language
English
Abstract
Many practical medical imaging scenarios include categories that are under-represented but still crucial. The relevance of image recognition models to real-world applications lies in their ability to generalize to these rare classes as well as to unseen classes. Real-world generalization requires taking into account the various complexities encountered in the real world. First, training data is highly imbalanced, which may lead the model to exhibit bias toward the more frequently represented classes. Moreover, real-world data may contain unseen classes that need to be identified, and model performance is affected by data scarcity. While medical image recognition has been extensively addressed in the literature, current methods do not take into account all the intricacies of real-world scenarios. To this end, we propose an open-set learning method for highly imbalanced medical datasets using a semi-supervised approach. Understanding the adverse impact of the long-tail distribution on the inherent model characteristics, we implement a regularization strategy at the feature level, complemented by a classifier normalization technique. We conduct extensive experiments on the publicly available datasets ISIC2018, ISIC2019, and TissueMNIST with various numbers of labelled samples. Our analysis shows that addressing the impact of long-tail data in classification significantly improves the overall performance of the network in terms of closed-set and open-set accuracies on all datasets. Our code and trained models will be made publicly available at https://github.com/Daniyanaj/OpenLTR.
Citation
D. N. A. Kareem, J. Lahoud, M. Fiaz, A. Kumar and H. Cholakkal, "Open-Set Semi-Supervised Learning for Long-Tailed Medical Datasets," 2025 IEEE 22nd International Symposium on Biomedical Imaging (ISBI), Houston, TX, USA, 2025, pp. 1-5, doi: 10.1109/ISBI60581.2025.10981231.
Source
2025 IEEE 22nd International Symposium on Biomedical Imaging (ISBI)
Conference
IEEE International Symposium on Biomedical Imaging, 2025
Keywords
Ethics, Heavily-tailed distribution, Image recognition, Open Access, Conferences, Training data, Skin, Data models, Standards, Biomedical imaging
Publisher
IEEE
