TX-LLaVA: Large Language and Vision Assistant for Temporal Changes in Chest X-Rays

Elgendy, Hosam
Cholakkal, Hisham
Department
Computer Vision
Type
Conference proceeding
Date
2025
Language
English
Abstract
Current large multimodal models (LMMs) leverage chest X-ray (CXR) images to create informative reports; however, these models typically limit input to single images, missing time-relative insights. This work introduces TX-LLaVA (Temporal X-ray Large Language Vision Assistant), designed to track historical changes and produce holistic reports across multiple CXR images taken over different visits. Built upon Video-LLaVA, TX-LLaVA incorporates a unique temporal dataset and utilizes efficient fine-tuning techniques to achieve state-of-the-art results. TX-LLaVA not only generates detailed reports but also effectively highlights changes across sequential CXR scans, enhancing the diagnostic process. TX-LLaVA reaches a ROUGE-L score of 0.20, a 21.21% increase from the baseline model.
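The abstract reports its result as a ROUGE-L score and a relative increase over a baseline. As a rough illustration of what those two quantities measure, the following is a minimal sketch of a sentence-level ROUGE-L F1 based on the longest common subsequence (LCS), plus a helper for relative change. The whitespace tokenization, the lowercasing, the equal weighting of precision and recall, the function names, and the example report sentences are all assumptions made for illustration; this record does not state how the authors configured the metric.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference, candidate):
    """ROUGE-L F1 between two strings (assumed: lowercase, whitespace tokens)."""
    ref = reference.lower().split()
    cand = candidate.lower().split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


def relative_increase(new, baseline):
    """Relative change of `new` over `baseline`, in percent."""
    return (new - baseline) / baseline * 100.0


if __name__ == "__main__":
    # Made-up report sentences, for illustration only.
    reference = "the left pleural effusion has decreased since the prior study"
    candidate = "left pleural effusion is decreased compared to the prior study"
    print(f"ROUGE-L F1: {rouge_l_f1(reference, candidate):.3f}")
    # Illustrative numbers, not taken from the paper.
    print(f"Relative increase: {relative_increase(0.30, 0.25):.2f}%")
```

Because the score depends on the longest in-order token overlap rather than on exact n-gram matches, it rewards generated reports that preserve the sequence of findings in the reference even when intervening words differ.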
Citation
H. Elgendy and H. Cholakkal, "TX-LLaVA: Large Language and Vision Assistant for Temporal Changes in Chest X-Rays," 2025 IEEE 22nd International Symposium on Biomedical Imaging (ISBI), Houston, TX, USA, 2025, pp. 1-4, doi: 10.1109/ISBI60581.2025.10980793
Source
2025 IEEE 22nd International Symposium on Biomedical Imaging (ISBI), 1-4, 2025
Conference
IEEE International Symposium on Biomedical Imaging, 2025
Keywords
Healthcare, Large multi-modal model, X-ray radiography, Computer-Aided Diagnosis
Publisher
IEEE