Nat Biomed Eng. 2025 Apr;9(4):539-551.
doi: 10.1038/s41551-025-01365-0. Epub 2025 Mar 5.

A data-efficient strategy for building high-performing medical foundation models


Yuqi Sun et al. Nat Biomed Eng. 2025 Apr.

Abstract

Foundation models are pretrained on massive datasets. However, collecting medical datasets is expensive and time-consuming, and raises privacy concerns. Here we show that synthetic data generated via conditioning with disease labels can be leveraged to build high-performing medical foundation models. We pretrained a retinal foundation model first on approximately one million synthetic retinal images whose physiological structures and feature distribution are consistent with those of real images, and then on only 16.7% of the 904,170 real-world colour fundus photography images used by a recently reported retinal foundation model (RETFound). The data-efficient model performed as well as or better than RETFound across nine public datasets and four diagnostic tasks, and for diabetic-retinopathy grading it used only 40% of the expert-annotated training data used by RETFound. We further support the generalizability of the data-efficient strategy by building a classifier for detecting tuberculosis on chest X-ray images. The text-conditioned generation of synthetic data may enhance the performance and generalization of medical foundation models.
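The pretraining data budget described in the abstract can be tallied with a small sketch. It assumes the 16.7% figure is exact (the abstract may round it) and treats "approximately one million" synthetic images as exactly 1,000,000. The `PretrainStage` class and the helper functions are hypothetical illustrations of the two-stage ordering (synthetic first, then real), not the authors' code or training pipeline.

```python
from dataclasses import dataclass

# Figures quoted in the abstract. Assumption: 16.7% is exact and the
# synthetic count ("approximately one million") is rounded to 1,000,000.
RETFOUND_REAL_IMAGES = 904_170
REAL_FRACTION = 0.167
SYNTHETIC_IMAGES = 1_000_000


@dataclass(frozen=True)
class PretrainStage:
    """One stage of a hypothetical two-stage pretraining schedule."""
    name: str
    source: str
    num_images: int


def real_image_budget(reference_total: int, fraction: float) -> int:
    """Real images needed when using `fraction` of a reference corpus (rounded)."""
    return round(reference_total * fraction)


def build_schedule() -> list[PretrainStage]:
    """Synthetic pretraining first, then a smaller pass over real images."""
    real = real_image_budget(RETFOUND_REAL_IMAGES, REAL_FRACTION)
    return [
        PretrainStage("stage1", "synthetic (label-conditioned)", SYNTHETIC_IMAGES),
        PretrainStage("stage2", "real colour fundus photography", real),
    ]


if __name__ == "__main__":
    schedule = build_schedule()
    for stage in schedule:
        print(f"{stage.name}: {stage.num_images:,} {stage.source} images")
    real_used = schedule[1].num_images
    print(f"Real images not needed vs. RETFound: {RETFOUND_REAL_IMAGES - real_used:,}")
```

Under these assumptions, the second stage uses about 151,000 real images, roughly 753,000 fewer than RETFound's 904,170.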


Conflict of interest statement

Competing interests: The authors declare no competing interests.

References

    1. Zhou, Y. et al. A foundation model for generalizable disease detection from retinal images. Nature 622, 156–163 (2023).
    2. Theodoris, C. V. et al. Transfer learning enables predictions in network biology. Nature 618, 616–624 (2023).
    3. Huang, Z. et al. A visual-language foundation model for pathology image analysis using medical Twitter. Nat. Med. 29, 2307–2316 (2023).
    4. Zhang, X. et al. Knowledge-enhanced visual-language pre-training on chest radiology images. Nat. Commun. 14, 4542 (2023).
    5. Moor, M. et al. Foundation models for generalist medical artificial intelligence. Nature 616, 259–265 (2023).
