Advances, Evaluation, and Explainability of Large Language Models in Healthcare: A Systematic Review
Amin, Syed Umar ; Guizani, Mohsen ; Hossain, M. Shamim
Department
Machine Learning
Type
Journal article
Date
2025
Language
English
Abstract
Large Language Models (LLMs) have enormous potential in healthcare, but their use calls for care and strong safety measures. These advanced computer systems can understand complex medical questions written in everyday language and often provide accurate medical information. This article explores how the field has evolved, from the first language models to current LLMs designed or adapted specifically for medical use. We explore how LLMs can make important language-related tasks in healthcare more efficient and effective, such as identifying medical terms, understanding how different pieces of information are connected, drawing conclusions from written text, sorting documents, and answering medical queries. We also shed light on new ways LLMs can use more than just text, such as combining written information with images or other types of data. We compare some of the best recent LLMs and discuss how they can be used in everyday medical situations. We also measure how well these models perform in the biomedical field, explaining what each measurement tells us and where it falls short. Lastly, we discuss the biggest hurdles LLMs face in healthcare, including generating incorrect information (hallucinations), unfair biases, protecting patient privacy, making sure the models are dependable, and fitting these new tools into how doctors and nurses already work, and we provide a roadmap for future research. We aim to present a clear and fair picture of what LLMs can offer healthcare right now and what is necessary to use them safely and dependably. This includes explaining how researchers are working to make these models easier for doctors and patients to understand. We followed the PRISMA research method to guide our review of the field, from early language models to current LLMs adapted for clinical use.
Citation
S. U. Amin, M. Guizani, and M. S. Hossain, “Advances, Evaluation, and Explainability of Large Language Models in Healthcare: A Systematic Review,” ACM Transactions on Multimedia Computing, Communications and Applications, Sep. 2025, doi: 10.1145/3786334.
Source
ACM Transactions on Multimedia Computing, Communications and Applications
Publisher
Association for Computing Machinery
