Clinicians must participate in the development of multimodal AI
- PMID: 40496887
- PMCID: PMC12151691
- DOI: 10.1016/j.eclinm.2025.103252
Abstract
Multimodal artificial intelligence (AI) is a powerful new technological advance, capable of simultaneously learning from diverse data types, such as text, images, video, and audio. Because clinical decisions are usually based on information from multiple sources, multimodal AI has the potential to significantly improve clinical practice. However, unlike most developed multimodal AI workflows, clinical medicine is both a dynamic and an interventional process, in which the clinician continually learns about the patient's health and acts accordingly as data are collected. In this article we argue that multimodal clinical AI must be fully attuned to the particular challenges and constraints of the clinic, and that clinician involvement is needed throughout development, not just at clinical deployment. We propose ways that clinician involvement can add value at each stage of the multimodal AI development pipeline, and argue for the establishment of actively managed multidisciplinary communities to work collaboratively towards the shared goal of improving the health of all.
Keywords: Clinical AI; Community management; Health policy; Human-in-the-loop AI; Multimodal AI.
© 2025 The Authors.
Conflict of interest statement
CRSB, BDM, TC, and VH are supported by the Turing-Roche Strategic partnership. CRSB is additionally supported by Cancer Research UK. CH is an employee of Roche.
