Review
Curr Opin Ophthalmol. 2025 Jan 1;36(1):90-98. doi: 10.1097/ICU.0000000000001091. Epub 2024 Nov 4.

Foundation models in ophthalmology: opportunities and challenges


Mertcan Sevgi et al. Curr Opin Ophthalmol. 2025.

Abstract

Purpose of review: Last year marked the development of the first foundation model in ophthalmology, RETFound, setting the stage for generalizable medical artificial intelligence (GMAI) that can adapt to novel tasks. Additionally, rapid advancements in large language model (LLM) technology, including models such as GPT-4 and Gemini, have been tailored for medical specialization and evaluated on clinical scenarios with promising results. This review explores the opportunities and challenges for further advancements in these technologies.

Recent findings: RETFound outperforms traditional deep learning models in specific tasks, even when fine-tuned only on small datasets. Additionally, specialized LLMs such as Med-Gemini and Medprompt GPT-4 perform better than out-of-the-box models for ophthalmology tasks. However, there is still a significant deficiency in ophthalmology-specific multimodal models. This gap is primarily due to the substantial computational resources required to train these models and the limited availability of high-quality ophthalmology datasets.

Summary: Overall, foundation models in ophthalmology present promising opportunities but face challenges, particularly the need for high-quality, standardized datasets for training and specialization. Although development has primarily focused on large language and vision models, the greatest opportunities lie in advancing large multimodal models, which can more closely mimic the capabilities of clinicians.


Conflict of interest statement

There are no conflicts of interest.

Figures

Box 1: no caption available.
FIGURE 1
Schematic representation of training traditional deep learning models and foundation models. The differences between training traditional deep learning (DL) models and foundation models (FM) are highlighted. Traditional DL models typically require labelled datasets and are trained for specific tasks. In contrast, foundation models are usually trained once on unlabelled data and subsequently fine-tuned for a variety of tasks and modalities, such as segmentation, classification, and object detection. CFP, colour fundus photo; DN, diabetic retinopathy; MA, microaneurysm; OCT, optical coherence tomography; UWF, ultra-wide field. Adapted from [4], licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) Licence.
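The pretrain-then-adapt workflow in Figure 1 can be made concrete with a short code sketch. The snippet below is a minimal PyTorch illustration of the fine-tuning step only, assuming a frozen pretrained image encoder and a small labelled dataset; the RetinalEncoder class, the commented-out checkpoint path, and the random batch are hypothetical placeholders, not the actual RETFound release or its data.

```python
# Minimal sketch of foundation-model adaptation: reuse a pretrained encoder,
# freeze its weights, and train only a small task-specific head on labelled data.
import torch
import torch.nn as nn

class RetinalEncoder(nn.Module):
    """Stand-in for a pretrained retinal image backbone (hypothetical)."""
    def __init__(self, embed_dim: int = 768):
        super().__init__()
        self.backbone = nn.Sequential(nn.Flatten(), nn.LazyLinear(embed_dim))

    def forward(self, x):
        return self.backbone(x)

encoder = RetinalEncoder()
# encoder.load_state_dict(torch.load("pretrained_encoder.pth"))  # hypothetical checkpoint

# Freeze the pretrained weights; only the new classification head learns.
for p in encoder.parameters():
    p.requires_grad = False

head = nn.Linear(768, 5)  # e.g. 5 illustrative disease grades
optimizer = torch.optim.AdamW(head.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

# One illustrative training step on a small labelled batch (placeholder data).
images = torch.randn(8, 3, 224, 224)   # placeholder colour fundus photos
labels = torch.randint(0, 5, (8,))     # placeholder labels
optimizer.zero_grad()
logits = head(encoder(images))
loss = loss_fn(logits, labels)
loss.backward()
optimizer.step()
```

The key contrast with the traditional pipeline in Figure 1 is that the expensive representation learning happens once on unlabelled data, while each downstream task only trains a lightweight head.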
FIGURE 2
Methods of specializing large language models. The various techniques used to tailor LLMs for specific applications, including fine-tuning, prompt engineering, and retrieval-augmented generation (RAG), are illustrated. Fine-tuning involves adjusting the internal model parameters to improve performance on a specific task, while prompt engineering and RAG do not alter the model parameters but instead enhance the model's output through different approaches.
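As a rough illustration of how retrieval-augmented generation differs from fine-tuning, the sketch below assembles a prompt from retrieved reference text without touching any model weights. The toy corpus, the lexical-overlap scoring, and the commented-out call_llm function are hypothetical stand-ins for a real vector store and model API.

```python
# Minimal RAG sketch: the LLM's parameters are unchanged; relevant reference
# text is retrieved and prepended to the prompt instead.
corpus = [
    "Anti-VEGF therapy is first-line treatment for neovascular (wet) AMD.",
    "Diabetic macular oedema may be monitored with OCT central subfield thickness.",
    "Treat-and-extend regimens adjust injection intervals based on disease activity.",
]

def retrieve(question: str, documents: list[str], k: int = 2) -> list[str]:
    # Toy lexical-overlap retrieval; a real system would use embedding similarity.
    q_terms = set(question.lower().split())
    scored = sorted(documents, key=lambda d: -len(q_terms & set(d.lower().split())))
    return scored[:k]

question = "How should the injection interval be adjusted for this wet AMD patient?"
context = "\n".join(retrieve(question, corpus))
prompt = f"Use the following references:\n{context}\n\nQuestion: {question}"

# answer = call_llm(prompt)  # hypothetical call to an unmodified, off-the-shelf LLM
print(prompt)
```

Prompt engineering works the same way at the interface level: the model is steered entirely through the input text, which is why neither approach requires the computational resources of fine-tuning.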
FIGURE 3
Visual question-and-answer example scenario involving an ophthalmologist using a large multimodal model for treating wet age-related macular degeneration. The LMM interprets OCT (optical coherence tomography) images of a patient with wet age-related macular degeneration, offering guidance on treatment adjustment. The model also responds to follow-up questions. Images are from [43], licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) Licence. LMM, large multimodal model.

References

    1. De Fauw J, Ledsam JR, Romera-Paredes B, et al. Clinically applicable deep learning for diagnosis and referral in retinal disease. Nat Med 2018; 24:1342–1350. - PubMed
    2. Chia MA, Antaki F, Zhou Y, et al. Foundation models in ophthalmology. Br J Ophthalmol 2024; 108:1341–1348. doi: 10.1136/bjo-2024-325459. [Epub ahead of print]. - PMC - PubMed
    3. Bommasani R, Hudson DA, Adeli E, et al. On the opportunities and risks of foundation models. arXiv [cs.LG]. 2021. Available at: http://arxiv.org/abs/2108.07258. [Accessed 3 June 2024]
    4. Ross A, McGrow K, Zhi D, et al. Foundation models, generative AI, and large language models: essentials for nursing. Comput Inform Nurs 2024; 42:377–387. - PMC - PubMed
    5. Brown TB, Mann B, Ryder N, et al. Language models are few-shot learners. arXiv [cs.CL]. 2020. Available at: https://arxiv.org/abs/2005.14165. [Accessed 3 June 2024]