Commun Eng. 2025 Jan 22;4(1):6.
doi: 10.1038/s44172-025-00341-5.

Distributed training of foundation models for ophthalmic diagnosis

Sina Gholami et al. Commun Eng. 2025.

Abstract

Vision impairment affects nearly 2.2 billion people globally, and nearly half of these cases could be prevented with early diagnosis and intervention, underscoring the urgent need for reliable and scalable detection methods for conditions like diabetic retinopathy and age-related macular degeneration. Here we propose a distributed deep learning framework that integrates self-supervised and domain-adaptive federated learning to enhance the detection of eye diseases from optical coherence tomography images. We employed a self-supervised, mask-based pre-training strategy to develop a robust foundation encoder. This encoder was trained on seven optical coherence tomography datasets, and we compared its performance under local, centralized, and federated learning settings. Our results show that self-supervised methods, both centralized and federated, improved the area under the curve by at least 10% compared to local models. Additionally, incorporating domain adaptation into the federated learning framework further boosted performance and generalization across different populations and imaging conditions. This approach supports collaborative model development without data sharing, providing a scalable, privacy-preserving solution for effective retinal disease screening and diagnosis in diverse clinical settings.
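
As a rough, hedged sketch of the mask-based self-supervised pre-training described above: the encoder, decoder, patch size, and 50% masking ratio below are illustrative placeholders rather than the paper's exact configuration, and the helper names (patchify, masked_reconstruction_loss) are invented for this example.

    # Minimal sketch of masked-image-modeling pre-training: hide a random
    # subset of patches and train the encoder/decoder to reconstruct them.
    # Patch size, masking ratio, and module interfaces are assumptions.
    import torch

    PATCH = 16          # patch side length (assumed)
    MASK_RATIO = 0.5    # fraction of patches hidden from the encoder

    def patchify(imgs):
        """Split (B, C, H, W) OCT images into flattened patches (B, N, P*P*C)."""
        b, c, h, w = imgs.shape
        p = PATCH
        x = imgs.reshape(b, c, h // p, p, w // p, p)
        return x.permute(0, 2, 4, 3, 5, 1).reshape(b, (h // p) * (w // p), p * p * c)

    def masked_reconstruction_loss(encoder, decoder, imgs):
        """Mask a random subset of patches and score reconstruction (MSE) on them."""
        patches = patchify(imgs)                                 # (B, N, D)
        b, n, _ = patches.shape
        mask = torch.rand(b, n, device=imgs.device) < MASK_RATIO
        visible = patches.masked_fill(mask.unsqueeze(-1), 0.0)   # zero out masked patches
        recon = decoder(encoder(visible))                        # (B, N, D)
        return ((recon - patches) ** 2)[mask].mean()             # loss on masked patches only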

Conflict of interest statement

Competing interests: The authors declare no competing interests. Ethics: UIC dataset (DS7) was approved by the institutional review board of the University of Illinois at Chicago and complied with the ethical standards stated in the Declaration of Helsinki.

Figures

Fig. 1
Fig. 1. Overview of the four phases of our framework.
a Local learning phase, in which a baseline model is trained on a particular dataset and evaluated over test set(s). b Centralized learning approach, comprising pre-training, fine-tuning, and evaluation. c Federated learning (FDL) approach, where the pre-training phase is conducted via FDL. d Domain adaptation (DAD)-FDL pipeline, where the DAD configuration is distributed before pre-training.
Fig. 2
Fig. 2. DL pipelines of the local learning and pre-training.
a Local learning pipeline, in which a pair of images and their label are input to the model after undergoing four transformations: rotation, color jittering, Gaussian blur, and Sobel filter. b Pre-training phase, during which the input image is masked and given to the reconstruction network to train the encoder over time.
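
As a rough illustration of the four transformations named in this caption, the sketch below chains them with torchvision; the parameter values are assumptions, and the Sobel step is a hand-rolled transform (SobelFilter) because torchvision does not ship one.

    # Sketch of the local-learning input pipeline: rotation, color jitter,
    # Gaussian blur, and a Sobel edge filter. Parameter values are assumed.
    import torch
    import torch.nn.functional as F
    from torchvision import transforms

    class SobelFilter:
        """Return the Sobel gradient magnitude of a (C, H, W) tensor image."""
        def __call__(self, img):
            kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]])
            kernels = torch.stack([kx, kx.t()]).unsqueeze(1)     # (2, 1, 3, 3)
            gray = img.mean(dim=0, keepdim=True).unsqueeze(0)    # (1, 1, H, W)
            grads = F.conv2d(gray, kernels, padding=1)           # x- and y-gradients
            return grads.pow(2).sum(dim=1, keepdim=True).sqrt().squeeze(0)

    # Expects a PIL image; returns a single-channel edge-magnitude tensor.
    local_transform = transforms.Compose([
        transforms.RandomRotation(degrees=15),
        transforms.ColorJitter(brightness=0.2, contrast=0.2),
        transforms.GaussianBlur(kernel_size=5, sigma=(0.1, 2.0)),
        transforms.ToTensor(),
        SobelFilter(),
    ])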
Fig. 3
Fig. 3. Reconstructed images from the MIM network, where 50% of each image is masked.
a1–f1 Columns of original image samples from DS1 to DS6, respectively. a2–f2 Columns of masked images. a3–f3 Columns of reconstructed images.
Fig. 4
Fig. 4. Main stages of pre-training and fine-tuning via federated learning (FDL) at the University of North Carolina at Charlotte (UNCC) and the University of Illinois Chicago (UIC).
a Server shares the initial model parameters and the configuration with all nodes. b Each FDL node pre-trains its model and sends the model’s weights to the server. c Finally, the server aggregates the weights and returns them to each client, and the clients start the fine-tuning step.
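
Panel c corresponds to a federated-averaging style aggregation step; a minimal sketch is given below, assuming sample-size weighting of client weights, which may differ from the paper's exact aggregation rule (the function name aggregate is illustrative).

    # Server-side aggregation sketch for panel c: average the clients'
    # state_dicts, weighted by how many local samples each client used.
    from collections import OrderedDict

    def aggregate(client_states, client_sizes):
        """Weighted average of client state_dicts (FedAvg-style)."""
        total = float(sum(client_sizes))
        avg = OrderedDict()
        for key in client_states[0]:
            avg[key] = sum(
                state[key].float() * (n / total)
                for state, n in zip(client_states, client_sizes)
            )
        return avg

    # One communication round, following panels a-c:
    #   a. server sends global weights and the configuration to each node
    #   b. each node pre-trains locally and returns its updated weights
    #   c. server calls aggregate(...) and redistributes the result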
Fig. 5
Fig. 5. Samples from DS1 to DS7 and their unsupervised noise-transformed versions, where all the transformed images have similar pixel value intensity, are shown in rows d to i in alphabetical order.
a1–g1 Columns of original images from DS1 to DS7, respectively. a2–g2 Columns of transformed images. h Target image used to transform other images based on its pixel value intensity.
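
One common way to push every image toward a target image's intensity distribution, as this caption describes, is histogram matching; the sketch below uses scikit-image for that purpose, though whether the paper applies histogram matching or a different intensity transform is an assumption.

    # Remap an image so its pixel-intensity histogram resembles the target's.
    # match_to_target is an illustrative helper name.
    import numpy as np
    from skimage.exposure import match_histograms

    def match_to_target(image: np.ndarray, target: np.ndarray) -> np.ndarray:
        """Histogram-match `image` (e.g., a DS1-DS7 sample) to the target image h."""
        return match_histograms(image, target)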
Fig. 6
Fig. 6. Macro AUC-ROC plot of four models over DS7.
The four models are local, centralized, DAD-FDL-1, and DAD-FDL-5; DAD-FDL-5 outperforms the other methods on DS7.
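
The macro AUC-ROC shown in this figure can be computed with scikit-learn as sketched below; y_true and y_prob are toy placeholders standing in for DS7 labels and model-predicted class probabilities.

    # Macro-averaged, one-vs-rest AUC-ROC over all classes.
    import numpy as np
    from sklearn.metrics import roc_auc_score

    y_true = np.array([0, 2, 1, 2, 0, 1])   # true class indices (toy data)
    y_prob = np.array([                      # predicted class probabilities
        [0.8, 0.1, 0.1],
        [0.1, 0.2, 0.7],
        [0.2, 0.6, 0.2],
        [0.1, 0.1, 0.8],
        [0.7, 0.2, 0.1],
        [0.3, 0.5, 0.2],
    ])
    macro_auc = roc_auc_score(y_true, y_prob, multi_class="ovr", average="macro")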
Fig. 7
Fig. 7. Eigen Grad-CAM inference from the norm layer of the final Swin Transformer BlockV2.
Models’ inference samples (choroidal neovascularization (CNV), diabetic macular edema (DME), diabetic retinopathy (DR), drusen, normal, and age-related macular degeneration (AMD)) from the DS1 to DS7 datasets, respectively. Checkmarks and crosses indicate correct and incorrect predictions, respectively.
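
Eigen Grad-CAM heatmaps like these can be produced with the pytorch-grad-cam package, as sketched below; the choice of target layer, the (B, N, C) token layout assumed in reshape_transform, and the 7x7 grid size are illustrative assumptions that depend on the specific Swin Transformer implementation.

    # Sketch of Eigen Grad-CAM on a transformer backbone. The reshape step
    # maps token outputs back to a 2D grid; its assumed layout and grid size
    # must be adapted to the actual model.
    from pytorch_grad_cam import EigenGradCAM
    from pytorch_grad_cam.utils.model_targets import ClassifierOutputTarget

    def reshape_transform(tokens, height=7, width=7):
        """Map (B, N, C) transformer tokens to a (B, C, H, W) feature map."""
        b, n, c = tokens.shape
        return tokens.reshape(b, height, width, c).permute(0, 3, 1, 2)

    def eigen_grad_cam(model, target_layer, input_tensor, class_idx):
        """Return a (H, W) heatmap for class_idx on a single input image."""
        cam = EigenGradCAM(model=model,
                           target_layers=[target_layer],
                           reshape_transform=reshape_transform)
        heatmaps = cam(input_tensor=input_tensor,
                       targets=[ClassifierOutputTarget(class_idx)])
        return heatmaps[0]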
