
TX-LLaVA: Large Language and Vision Assistant for Temporal Changes in Chest X-Rays

Elgendy, Hosam
Cholakkal, Hisham
Department
Computer Vision
Type
Conference proceeding
Date
2025
Language
English
Collections
Research Projects
Abstract
Current large multimodal models (LMMs) generate informative reports from chest X-ray (CXR) images; however, these models typically accept only a single image as input and therefore miss changes that occur over time. This work introduces TX-LLaVA (Temporal X-ray Large Language Vision Assistant), designed to track changes across multiple CXR images acquired at different visits and to produce a holistic report from them. Built upon Video-LLaVA, TX-LLaVA combines a unique temporal dataset with efficient fine-tuning techniques to achieve state-of-the-art results. Beyond generating detailed reports, TX-LLaVA highlights changes across sequential CXR scans, supporting the diagnostic process. TX-LLaVA reaches a ROUGE-L score of 0.20, a 21.21% increase over the baseline model.
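ROUGE-L, the metric quoted in the abstract, scores a generated report by the longest common subsequence (LCS) of tokens it shares with a reference report, combining LCS-based precision and recall into an F-measure. The following is an illustrative sentence-level sketch, not the evaluation code used in the paper: it assumes lowercase whitespace tokenization and β = 1 (the F1 variant), and the helper names `lcs_length` and `rouge_l` are hypothetical. The paper's actual tokenization, β, and corpus-level averaging are not stated in this record, so this sketch is not expected to reproduce the reported 0.20.

```python
def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence of two token lists (dynamic programming)."""
    # prev[j] holds the LCS length of the tokens of `a` processed so far versus b[:j]
    prev = [0] * (len(b) + 1)
    for tok_a in a:
        curr = [0] * (len(b) + 1)
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                curr[j] = prev[j - 1] + 1  # extend the match diagonally
            else:
                curr[j] = max(prev[j], curr[j - 1])  # skip a token from one side
        prev = curr
    return prev[-1]


def rouge_l(candidate: str, reference: str, beta: float = 1.0) -> float:
    """Sentence-level ROUGE-L F-measure between a generated and a reference report."""
    # Assumption: lowercase + whitespace tokenization (preprocessing in the paper is not stated)
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)  # fraction of generated tokens on the LCS
    recall = lcs / len(ref)      # fraction of reference tokens on the LCS
    return (1 + beta**2) * precision * recall / (recall + beta**2 * precision)


if __name__ == "__main__":
    generated = "heart size is unchanged compared with prior study"
    reference = "heart size is stable compared with prior exam"
    print(rouge_l(generated, reference))  # LCS = 6 of 8 tokens on each side -> 0.75
```

Because the LCS need not be contiguous, ROUGE-L credits the shared phrasing ("heart size is ... compared with prior") even when individual words such as "unchanged" versus "stable" differ, which is also why a wording-only metric can understate the clinical agreement between two reports.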
Citation
H. Elgendy and H. Cholakkal, "TX-LLaVA: Large Language and Vision Assistant for Temporal Changes in Chest X-Rays," 2025 IEEE 22nd International Symposium on Biomedical Imaging (ISBI), Houston, TX, USA, 2025, pp. 1-4, doi: 10.1109/ISBI60581.2025.10980793
Source
2025 IEEE 22nd International Symposium on Biomedical Imaging (ISBI), 1-4, 2025
Conference
IEEE International Symposium on Biomedical Imaging, 2025
Keywords
Healthcare, Large multi-modal model, X-ray radiography, Computer-Aided Diagnosis
Publisher
IEEE