Optimising Vision Transformer Performance on Limited Datasets: A Multi-Gradient Approach

Ali, Mohsin
Raza, Haider
Gan, John Q.
Khan, Muhammad Haris
Department
Computer Vision
Type
Conference proceeding
Date
2025
Language
English
Abstract
Vision Transformers (ViTs) are well known for capturing the global context of images using Multi-head Self-Attention (MHSA). However, compared to Convolutional Neural Networks (CNNs), ViTs typically have a weaker inductive bias and require a larger volume of training image data to learn local feature representations. Although various methods, such as the integration of CNN features or advanced pre-training strategies, have been proposed to introduce this inductive bias, they often require significant architectural modifications or rely heavily on expansive pre-training datasets. This paper introduces a novel approach for training ViTs on limited datasets without altering the ViT architecture. We propose the Multi-Gradient Image Transformer (MGiT), which utilizes a parallel training method with a compact auxiliary ViT to adaptively optimize the weights of the target ViT. This approach yields significant performance improvements across diverse datasets and training scenarios. Our findings demonstrate that MGiT enhances ViT efficiency more effectively than traditional training methods. Furthermore, the application of Jensen-Shannon (JS) Divergence validates the convergence and alignment of feature understanding between the primary and auxiliary ViTs, thereby stabilizing the training process. The code is available at https://github.com/game-sys/Multi-Gradient-Image-Transformer-MGiT-
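The abstract refers to Jensen-Shannon (JS) Divergence as a measure of agreement between the primary and auxiliary ViTs. As a minimal sketch of that quantity only, the snippet below computes the standard JS divergence between two class-probability vectors. It is not taken from the MGiT code: the function names and the example logits (`primary_logits`, `auxiliary_logits`) are hypothetical, and how the paper actually applies the measure is not stated in this record.

```python
import math


def softmax(logits):
    """Convert raw scores to a probability vector (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) in nats; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Standard JS divergence: the mean KL of p and q to their mixture."""
    mixture = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, mixture) + 0.5 * kl_divergence(q, mixture)


if __name__ == "__main__":
    # Hypothetical logits standing in for the outputs of two models
    # on the same image; these values are illustrative assumptions.
    primary_logits = [2.0, 0.5, -1.0]
    auxiliary_logits = [1.5, 0.7, -0.8]
    score = js_divergence(softmax(primary_logits), softmax(auxiliary_logits))
    print(f"JS divergence: {score:.6f}")
```

The measure is symmetric, equals zero when the two distributions are identical, and is bounded above by ln 2 when natural logarithms are used, so a value that shrinks during training would indicate that the two models' predictions are growing closer.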
Citation
M. Ali, H. Raza, J. Q. Gan and M. Haris, "Optimising Vision Transformer Performance on Limited Datasets: A Multi-Gradient Approach," in 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Nashville, TN, USA, 2025, pp. 693-702, doi: 10.1109/CVPRW67362.2025.00074.
Source
IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops
Conference
2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2025
Publisher
IEEE