Review
Kidney360. 2024 Nov 1;5(11):1771-1779.
doi: 10.34067/KID.0000000000000556. Epub 2024 Aug 21.

Multimodal Artificial Intelligence in Medicine

Conor S Judge et al. Kidney360.

Abstract

Traditional medical artificial intelligence models that are approved for clinical use restrict themselves to single-modal data (e.g., images only), limiting their applicability in the complex, multimodal environment of medical diagnosis and treatment. Multimodal transformer models in health care can effectively process and interpret diverse data forms, such as text, images, and structured data. They have demonstrated impressive performance on standard benchmarks, like United States Medical Licensing Examination question banks, and continue to improve with scale. However, the adoption of these advanced artificial intelligence models is not without challenges. While multimodal deep learning models like transformers offer promising advancements in health care, their integration requires careful consideration of the accompanying ethical and environmental challenges.
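The idea that one model can take text, images, and structured data can be made concrete with a short sketch: each modality is projected to a shared width, and the resulting vectors are concatenated into a single token sequence, which is the input form a transformer operates on. This is a minimal illustration under stated assumptions, not the authors' pipeline. The random per-modality projections stand in for learned embedding layers, and the feature values and the names `build_token_sequence`, `linear`, and `random_matrix` are hypothetical.

```python
import random
from typing import Dict, List

Vector = List[float]
Matrix = List[Vector]


def linear(x: Vector, weight: Matrix) -> Vector:
    """Map x (length d_in) to length d_out; weight has d_out rows of length d_in."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    """Hypothetical stand-in for a learned, modality-specific embedding layer."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def build_token_sequence(
    inputs: Dict[str, List[Vector]], d_model: int, seed: int = 0
) -> List[Vector]:
    """Embed each modality with its own projection and concatenate the results.

    `inputs` maps a modality name to its raw feature vectors, one per token
    (e.g. one per lab result or image patch); vectors within a modality share a width.
    """
    rng = random.Random(seed)
    sequence: List[Vector] = []
    for name in sorted(inputs):  # fixed modality order keeps the output reproducible
        tokens = inputs[name]
        weight = random_matrix(d_model, len(tokens[0]), rng)  # one projection per modality
        sequence.extend(linear(tok, weight) for tok in tokens)
    return sequence


# Hypothetical toy inputs: three lab tokens, two image-patch tokens, three text tokens.
patient = {
    "labs": [[1.4], [6.2], [138.0]],
    "image": [[0.2, 0.5, 0.1, 0.9], [0.4, 0.3, 0.8, 0.6]],
    "text": [[0.3, 0.7], [0.1, 0.2], [0.9, 0.5]],
}
sequence = build_token_sequence(patient, d_model=8)
print(len(sequence), len(sequence[0]))  # 8 8
```

Concatenation is only one simple fusion strategy; models often also add a learned modality-type embedding so that attention can distinguish the sources.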

Conflict of interest statement

Disclosure forms, as provided by each author, are available with the online version of the article at http://links.lww.com/KN9/A646.

Figures

Figure 1
Patient scenarios without AI, with single-modal AI, and with multimodal AI. AI, artificial intelligence.
Figure 2
A future multimodal transformer-based AKI alert system. This figure depicts a multimodal transformer-based system designed to predict AKI risk by integrating text, structured data (e.g., laboratory tests), images (e.g., chest x-ray), and time series data. These inputs are converted into numerical vectors by modality-specific embedding layers. Positional encoding adds order information to these vectors, which are then processed by the attention mechanism. The heatmap shows how the model focuses on the relevant parts of the data, using keys, queries, and values to compute attention scores. The transformed vectors pass through a feed-forward neural network for pattern learning. The architecture includes multiple identical decoder blocks for iterative refinement. At the top, the classification task predicts AKI risk at 24, 48, and 168 hours, triggering alerts if risk thresholds are exceeded. Cr, creatinine; Gl, glucose; Na, sodium; x H, attention heads; x K, decoder blocks.
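The steps in Figure 2 can be illustrated with a deliberately small, single-head, single-block sketch in plain Python. It is not the authors' system. The sinusoidal positional encoding, the identity query/key/value projections, the residual readout from the last position, and the `head_weights` and `thresholds` values are all assumptions for illustration; multiple attention heads, the feed-forward network, and stacked decoder blocks are omitted for brevity.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]
HORIZONS = (24, 48, 168)  # prediction horizons in hours, as in Figure 2


def positional_encoding(seq_len: int, d_model: int) -> List[Vector]:
    """Sinusoidal encoding (an assumption): sine on even dims, cosine on odd dims."""
    table = []
    for pos in range(seq_len):
        row = []
        for i in range(d_model):
            angle = pos / (10000 ** ((i - i % 2) / d_model))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        table.append(row)
    return table


def softmax(scores: Vector) -> Vector:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def causal_self_attention(x: List[Vector]) -> Tuple[List[Vector], List[Vector]]:
    """Single-head scaled dot-product attention with queries = keys = values = x.

    Each position attends only to itself and earlier positions (decoder-style mask).
    Returns the attended vectors and the attention-weight matrix (the "heatmap").
    """
    d = len(x[0])
    outputs, weights = [], []
    for i, query in enumerate(x):
        scores = [sum(q * k for q, k in zip(query, x[j])) / math.sqrt(d) for j in range(i + 1)]
        w = softmax(scores)
        weights.append(w + [0.0] * (len(x) - i - 1))  # masked future positions get weight 0
        outputs.append([sum(w[j] * x[j][k] for j in range(i + 1)) for k in range(d)])
    return outputs, weights


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def aki_alerts(
    tokens: List[Vector],
    head_weights: Dict[int, Vector],
    thresholds: Dict[int, float],
) -> Tuple[Dict[int, float], Dict[int, bool]]:
    """Return a risk per horizon and whether each risk meets its (hypothetical) threshold."""
    pe = positional_encoding(len(tokens), len(tokens[0]))
    x = [[a + p for a, p in zip(tok, enc)] for tok, enc in zip(tokens, pe)]  # add order info
    attended, _ = causal_self_attention(x)
    h = [a + b for a, b in zip(x[-1], attended[-1])]  # residual connection, last position
    risks = {hz: sigmoid(sum(w * v for w, v in zip(head_weights[hz], h))) for hz in HORIZONS}
    alerts = {hz: risks[hz] >= thresholds[hz] for hz in HORIZONS}
    return risks, alerts


# Hypothetical usage: three embedded tokens of width 4 and made-up head weights/thresholds.
tokens = [[0.1, 0.0, 0.2, 0.1], [0.3, 0.1, 0.0, 0.2], [0.2, 0.4, 0.1, 0.0]]
head_weights = {24: [0.5, -0.2, 0.1, 0.3], 48: [0.4, 0.1, 0.2, 0.2], 168: [0.3, 0.3, 0.3, 0.3]}
thresholds = {24: 0.7, 48: 0.7, 168: 0.7}
risks, alerts = aki_alerts(tokens, head_weights, thresholds)
print(risks, alerts)
```

Reading the prediction from the last position mirrors how decoder-style models summarize a sequence; a trained system would learn the projection and head weights, and the alert thresholds would be a clinical choice rather than a model output.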