Review
Lancet Digit Health. 2021 Jan;3(1):e51-e66.
doi: 10.1016/S2589-7500(20)30240-5. Epub 2020 Oct 1.

A global review of publicly available datasets for ophthalmological imaging: barriers to access, usability, and generalisability

Saad M Khan et al. Lancet Digit Health. 2021 Jan.

Abstract

Health data that are publicly available are valuable resources for digital health research. Several public datasets containing ophthalmological imaging have been frequently used in machine learning research; however, the total number of datasets containing ophthalmological health information and their respective content is unclear. This Review aimed to identify all publicly available ophthalmological imaging datasets, detail their accessibility, describe which diseases and populations are represented, and report on the completeness of the associated metadata. With the use of MEDLINE, Google's search engine, and Google Dataset Search, we identified 94 open access datasets containing 507 724 images and 125 videos from 122 364 patients. Most datasets originated from Asia, North America, and Europe. Disease populations were unevenly represented, with glaucoma, diabetic retinopathy, and age-related macular degeneration disproportionately overrepresented in comparison with other eye diseases. The reporting of basic demographic characteristics such as age, sex, and ethnicity was poor, even at the aggregate level. This Review provides greater visibility for ophthalmological datasets that are publicly available as powerful resources for research. Our paper also exposes an increasing divide in the representation of different population and disease groups in health data repositories. The improved reporting of metadata would enable researchers to access the most appropriate datasets for their needs and maximise the potential of such resources.

Conflict of interest statement

Declaration of interests

XL received a proportion of her funding from the Wellcome Trust, through a Health Improvement Challenge grant (200141/Z/15/Z). LF reports an award from Bayer; personal fees from Allergan; and non-financial support from Allergan, outside the submitted work. PAK reports personal fees from DeepMind, Roche, Novartis, Apellis, Heidelberg Engineering, Topcon, Allergan, Bayer, and Big Picture Medical, outside the submitted work; and is supported by the Moorfields Eye Charity Career Development Award (R190028A) and a UK Research & Innovation Future Leaders Fellowship (MR/T019050/1). MJB is supported by the Wellcome Trust (207472/Z/17/Z). AKD received a proportion of his funding from the Department of Health’s National Institute for Health Research Biomedical Research Centre for Ophthalmology at Moorfields Eye Hospital, University College London Institute of Ophthalmology, Health Data Research UK (London, UK), and the Wellcome Trust, through a Health Improvement Challenge grant (200141/Z/15/Z). All other authors declare no competing interests.

Figures

Figure 1. Dataset identification through MEDLINE articles, Google’s Datasets Search, and the Google search engine; and dataset selection and accessibility

Figure 2. Information associated with the publication date (A), geographical distribution (B), represented diseases (C), and image types (D) of the study datasets
AO-SLO=adaptive optics-scanning laser ophthalmoscopy. OCT=optical coherence tomography. OCT-A=optical coherence tomography-angiography. SLO=scanning laser ophthalmoscopy. *Only diseases represented in ≥5 datasets have been included. Where datasets included multiple diseases, they are counted multiple times.

Figure 3. Percentage completion of reporting of metadata items across all 94 datasets
*Reporting at the aggregate level was accepted.
