Review
EClinicalMedicine. 2025 May 23;84:103252.
doi: 10.1016/j.eclinm.2025.103252. eCollection 2025 Jun.

Clinicians must participate in the development of multimodal AI

Christopher R S Banerji et al. EClinicalMedicine. 2025.

Abstract

Multimodal artificial intelligence (AI) is a powerful new technological advance, capable of learning simultaneously from diverse data types, such as text, images, video, and audio. Because clinical decisions are usually based on information from multiple sources, multimodal AI has the potential to significantly improve clinical practice. However, unlike most developed multimodal AI workflows, clinical medicine is both a dynamic and an interventional process, in which the clinician continually learns about the patient's health and acts accordingly as data are collected. In this article, we argue that multimodal clinical AI must be fully attuned to the particular challenges and constraints of the clinic, and that clinician involvement is needed throughout development, not just at clinical deployment. We propose ways that clinician involvement can add value at each stage of the multimodal AI development pipeline, and argue for the establishment of actively managed multidisciplinary communities to work collaboratively towards the shared goal of improving the health of all.

Keywords: Clinical AI; Community management; Health policy; Human-in-the-loop AI; Multimodal AI.

Conflict of interest statement

CRSB, BDM, TC, and VH are supported by the Turing-Roche Strategic partnership. CRSB is additionally supported by Cancer Research UK. CH is an employee of Roche.

Figures

Fig. 1
Clinical data collection is a hierarchical, patient-personalised process. A. Clinical data acquisition differs for two patients with the same presentation, highlighting the importance of clinical context to data acquired. B. Patients about whom significant multimodal clinical data is collected are likely to have significant underlying pathologies.
Fig. 2
Multimodal data fusion: AI and clinical perspectives. A comparison of AI perspectives and clinical interpretation is provided for the three most common multimodal data fusion approaches: A. Early fusion, B. Late fusion, C. Intermediate fusion.
Fig. 3
Clinicians in the loop of multimodal AI development. Barriers to translation of multimodal AI can arise at any point in model development and clinician involvement is required throughout this loop to ensure that the AI tool is matched to its clinical context.
