
Human–large language model collaboration in clinical medicine: a systematic review and meta-analysis

Wang, Guoyong
Zhang, Kaijun
Jiang, Jiyue
Wang, Chaonan
Bi, Hui
Liang, Haojun
Qi, Zuoliang
Huang, Ying
Li, Yu
Yang, Xiaonan
Department
Computational Biology
Type
Journal article
License
http://creativecommons.org/licenses/by/4.0/
Language
English
Abstract
Human–AI collaboration (H + AI) using large language models (LLMs) offers a promising approach to enhancing clinical reasoning, documentation, and interpretation tasks. Following PRISMA 2020 (PROSPERO registration: CRD420251068272), we systematically compared H + AI with human-only (H) workflows, searching four databases through June 28, 2025. Ten peer-reviewed studies met the eligibility criteria; three additional preprints informed sensitivity analyses only. Diagnostic/interpretation accuracy (k = 2) showed a positive trend for H + AI (risk ratio [RR] 1.59), but the estimate was imprecise and non-significant (95% CI 0.08 to 32.74), and the 95% prediction interval (PI) crossed the null. Composite diagnostic/management scores (k = 2) showed a statistically significant improvement (mean difference [MD] +4.88 percentage points, 95% CI +0.65 to +9.12), yet the wide 95% PI (−31.65 to +41.42) indicates high real-world uncertainty. Time efficiency (k = 3) showed no overall difference (MD +0.4 min, 95% CI −4.18 to +4.97; I² = 70.1%). Although documentation quality improved, factual error rates remained high (~26–36%), undermining these quality gains. In three-arm settings, H + AI did not universally outperform AI-only. The evidence remains preliminary, highly uncertain, and context-dependent. We recommend preregistered, pragmatic, multicenter trials embedded in real workflows, with harmonized core outcomes that prioritize safety/error metrics, and with interfaces that surface uncertainty and support verification.
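The abstract contrasts a statistically significant 95% CI with a much wider 95% PI and reports heterogeneity as I². The sketch below illustrates how these quantities are commonly computed in a random-effects meta-analysis of mean differences. It assumes the DerSimonian-Laird estimator for between-study variance (tau²) and a Higgins-style prediction interval. The review's actual estimator and PI method are not stated on this page. All function names and the three-study input data are hypothetical and do not reproduce the review's results.

```python
import math
from statistics import NormalDist


def random_effects_md(effects, ses):
    """Hypothetical helper: DerSimonian-Laird random-effects pooling (assumed method).

    effects: per-study mean differences (H + AI minus H)
    ses: standard errors of those mean differences
    Returns (pooled MD, SE of pooled MD, tau^2, Cochran's Q, I^2).
    """
    k = len(effects)
    if k < 2 or k != len(ses):
        raise ValueError("need at least two studies with matching SEs")
    # Fixed-effect (inverse-variance) weights and pooled estimate
    w = [1 / se**2 for se in ses]
    mu_fe = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    # Cochran's Q measures dispersion around the fixed-effect estimate
    q = sum(wi * (yi - mu_fe) ** 2 for wi, yi in zip(w, effects))
    df = k - 1
    # DerSimonian-Laird moment estimator of between-study variance, truncated at 0
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)
    # I^2: share of total variability attributed to heterogeneity
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    # Random-effects weights add tau^2 to each study's own variance
    w_re = [1 / (se**2 + tau2) for se in ses]
    mu = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se_mu = math.sqrt(1 / sum(w_re))
    return mu, se_mu, tau2, q, i2


def confidence_interval(mu, se_mu, level=0.95):
    """Normal-approximation CI for the pooled mean effect."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return mu - z * se_mu, mu + z * se_mu


def prediction_interval(mu, se_mu, tau2, t_crit):
    """Higgins-style PI: mu +/- t_{k-2} * sqrt(tau^2 + SE(mu)^2).

    t_crit is the 97.5th percentile of Student's t with k - 2 degrees of
    freedom, supplied by the caller (12.706 for df = 1, i.e. k = 3).
    """
    half = t_crit * math.sqrt(tau2 + se_mu**2)
    return mu - half, mu + half


# Hypothetical data for three studies (k = 3), NOT taken from the review
effects = [-3.0, 1.5, 4.0]  # mean differences in minutes
ses = [1.2, 1.0, 1.5]

mu, se_mu, tau2, q, i2 = random_effects_md(effects, ses)
ci = confidence_interval(mu, se_mu)
pi = prediction_interval(mu, se_mu, tau2, t_crit=12.706)  # t_{0.975, df=1}
print(f"MD {mu:+.2f}, 95% CI {ci[0]:+.2f} to {ci[1]:+.2f}, I2 {i2:.1%}")
print(f"95% PI {pi[0]:+.2f} to {pi[1]:+.2f}")
```

The PI adds tau² to the uncertainty of the pooled mean, and with few studies it also uses a heavy-tailed t quantile. This is why a PI can span the null even when the CI does not. As written, the Higgins-style formula requires k ≥ 3 (df = k − 2), so it does not apply directly to the k = 2 pools reported above.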
Citation
G. Wang, K. Zhang, J. Jiang, C. Wang, H. Bi, H. Liang, et al., "Human–large language model collaboration in clinical medicine: a systematic review and meta-analysis," npj Digital Medicine, vol. 9, no. 1, p. 195, 2026, https://doi.org/10.1038/s41746-026-02382-2.
Source
npj Digital Medicine
Keywords
42 Health Sciences, 4203 Health Services and Systems, 3 Good Health and Well Being
Publisher
Springer Nature