Review

Br J Ophthalmol. 2024 Sep 20;108(10):1341-1348. doi: 10.1136/bjo-2024-325459.

Foundation models in ophthalmology

Mark A Chia et al.

Abstract

Foundation models represent a paradigm shift in artificial intelligence (AI), evolving from narrow models designed for specific tasks to versatile, generalisable models adaptable to a myriad of applications. Ophthalmology as a specialty has the potential to act as an exemplar for other medical specialties, offering a blueprint for integrating foundation models broadly into clinical practice. This review aims to serve as a roadmap for eyecare professionals seeking to better understand foundation models, while equipping readers with the tools to explore the use of foundation models in their own research and practice. We begin by outlining the key concepts and technological advances which have enabled the development of these models, providing an overview of novel training approaches and modern AI architectures. Next, we summarise existing literature on the topic of foundation models in ophthalmology, encompassing progress in vision foundation models, large language models and large multimodal models. Finally, we outline major challenges relating to privacy, bias and clinical validation, and propose key steps forward to maximise the benefit of this powerful technology.

Keywords: Imaging; Retina.

Conflict of interest statement

Competing interests: PAK has acted as a consultant for DeepMind, Roche, Novartis, Apellis and BitFount and is an equity owner in Big Picture Medical. He has received speaker fees from Heidelberg Engineering, Topcon, Allergan and Bayer. AYL reports grants from Santen, personal fees from Genentech, personal fees from US FDA, personal fees from Johnson and Johnson, personal fees from Boehringer Ingelheim, non-financial support from iCareWorld, grants from Topcon, grants from Carl Zeiss Meditec, personal fees from Gyroscope, non-financial support from Optomed, non-financial support from Heidelberg, non-financial support from Microsoft, grants from Regeneron, grants from Amazon, grants from Meta, outside the submitted work; this article does not reflect the views of the US FDA.

Figures

Figure 1. Schematic diagram comparing foundation models with traditional artificial intelligence models, showing the benefits of generalisability, label efficiency and computational efficiency. Rather than training a new model for each task, a single foundation model is generalisable to multiple downstream tasks. By learning general representations from vast quantities of unlabelled data, foundation models require less labelled data for each task (size of green boxes). These fine-tuning stages are also computationally efficient compared with training models from scratch. FM, foundation model.
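
To make the fine-tuning step described in the caption above concrete, the short PyTorch sketch below adapts a generic pretrained encoder to a downstream task by training only a small classification head on a modest labelled batch. This is a minimal sketch, not the paper's method: the encoder, feature dimension, dataset and number of classes are all illustrative placeholders.

# Minimal fine-tuning sketch (PyTorch). Assumes a pretrained foundation-model
# encoder producing 768-d features; names and data are illustrative placeholders.
import torch
import torch.nn as nn

def build_downstream_model(pretrained_encoder: nn.Module, feat_dim: int = 768,
                           n_classes: int = 5) -> nn.Module:
    # Freeze the foundation model so only the lightweight head is trained.
    for p in pretrained_encoder.parameters():
        p.requires_grad = False
    head = nn.Linear(feat_dim, n_classes)
    return nn.Sequential(pretrained_encoder, head)

# Toy stand-in encoder and data, purely to make the sketch runnable.
encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, 768), nn.ReLU())
model = build_downstream_model(encoder)
optimiser = torch.optim.AdamW(
    [p for p in model.parameters() if p.requires_grad], lr=1e-3)
criterion = nn.CrossEntropyLoss()

images = torch.randn(8, 3, 224, 224)      # small labelled batch
labels = torch.randint(0, 5, (8,))        # e.g. 5 hypothetical severity grades
loss = criterion(model(images), labels)
loss.backward()
optimiser.step()

Because only the linear head receives gradients, the labelled-data and compute requirements for each new task are small, which is the efficiency argument illustrated in figure 1.
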
Figure 2. Pipeline for training vision foundation models using contrastive (A) and generative (B) self-supervised learning (SSL). In the contrastive SSL example, the pretext learning task involves applying random image augmentations and training a model to maximise the agreement of matching image pairs. In the generative SSL example, the pretext task involves masking areas of an image and training a model to reconstruct the missing portions. In both cases, the model learns general imaging features applicable to multiple downstream tasks.
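
A rough sketch of the contrastive pretext task in panel A, in the style of SimCLR's InfoNCE objective: two augmented views of each unlabelled image are encoded, and the loss pulls matching views together while pushing other images in the batch apart. The encoder, "augmentation" (additive noise) and temperature are assumptions for illustration only.

# Contrastive self-supervised learning sketch (PyTorch), SimCLR-style.
import torch
import torch.nn.functional as F
from torch import nn

def info_nce_loss(z1, z2, temperature=0.1):
    """Contrastive loss over two augmented views of the same batch."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    z = torch.cat([z1, z2], dim=0)                   # (2N, d)
    sim = z @ z.t() / temperature                    # pairwise similarities
    mask = torch.eye(z.size(0), dtype=torch.bool)
    sim = sim.masked_fill(mask, float('-inf'))       # ignore self-similarity
    n = z1.size(0)
    # The positive for view i is the other augmented view of the same image.
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    return F.cross_entropy(sim, targets)

# Toy encoder and "augmentations" (noise), purely for illustration.
encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 96 * 96, 128))
images = torch.randn(16, 3, 96, 96)                  # unlabelled batch
view1 = images + 0.1 * torch.randn_like(images)
view2 = images + 0.1 * torch.randn_like(images)
loss = info_nce_loss(encoder(view1), encoder(view2))
loss.backward()

The generative alternative in panel B (masked image modelling) replaces this objective with a pixel reconstruction loss over masked patches; in both cases no labels are required.
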
Figure 3. Pipeline for training a large language model. Text is separated into a series of tokens (coloured highlighting). A proportion of these tokens are masked, and the model is trained to predict these missing tokens via a loss function. LLM, large language model.
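
The caption above corresponds to masked-token prediction. The sketch below masks roughly 15% of token IDs and computes a cross-entropy loss only at the masked positions; the toy vocabulary, mask ratio and tiny transformer are assumptions, not the architectures discussed in the review.

# Masked-token prediction sketch (PyTorch).
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab_size, d_model, mask_id = 1000, 64, 0           # toy vocabulary; id 0 = [MASK]
embed = nn.Embedding(vocab_size, d_model)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True), num_layers=2)
to_vocab = nn.Linear(d_model, vocab_size)

tokens = torch.randint(1, vocab_size, (8, 32))        # batch of token IDs
mask = torch.rand(tokens.shape) < 0.15                # mask ~15% of tokens
inputs = tokens.masked_fill(mask, mask_id)

logits = to_vocab(encoder(embed(inputs)))             # predict a token at every position
# The loss is computed only where tokens were masked.
loss = F.cross_entropy(logits[mask], tokens[mask])
loss.backward()

Autoregressive language models follow the same pattern but predict the next token rather than randomly masked ones; either way, the training signal comes from the text itself rather than from manual labels.
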
Figure 4. Pipeline for training vision-language models. The image and text data are independently processed by encoders to generate feature embeddings representative of images and text. The vision-language model is trained to maximise the agreement between corresponding image and text feature embeddings. The trained encoders can then be applied to both image-based and text-based downstream tasks. OCT, optical coherence tomography; DR, diabetic retinopathy.
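
A compressed sketch of this image-text alignment objective, in the spirit of CLIP-style contrastive training: both encoders produce embeddings that are normalised and aligned with a symmetric cross-entropy over the in-batch similarity matrix. The encoders and data below are toy placeholders standing in for an imaging branch (e.g. OCT) and a report-text branch.

# Contrastive vision-language training sketch (PyTorch), CLIP-style.
import torch
import torch.nn.functional as F
from torch import nn

def clip_style_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric contrastive loss: matching image/text pairs share an index."""
    img_emb = F.normalize(img_emb, dim=1)
    txt_emb = F.normalize(txt_emb, dim=1)
    logits = img_emb @ txt_emb.t() / temperature      # (N, N) similarity matrix
    targets = torch.arange(img_emb.size(0))           # diagonal entries are the true pairs
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2

# Toy encoders for the image and text branches, for illustration only.
image_encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 256))
text_encoder = nn.EmbeddingBag(5000, 256)             # averages token embeddings

images = torch.randn(16, 3, 64, 64)
reports = torch.randint(0, 5000, (16, 20))            # tokenised report text
loss = clip_style_loss(image_encoder(images), text_encoder(reports))
loss.backward()

Once aligned in a shared embedding space, either encoder can be reused on its own, which is why the trained encoders transfer to both image-based and text-based downstream tasks.
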
Figure 5. Overview of the applications of foundation models in ophthalmology. The most useful models for clinicians and patients are likely to be large multimodal models. Applications can be divided broadly into three categories: medical education (A), workflow improvement (B) and clinical assistance (C). EHR, electronic health record; OSCE, objective structured clinical examination.
