Ophthalmology. 2022 Jul;129(7):e69-e76.
doi: 10.1016/j.ophtha.2022.02.008. Epub 2022 Feb 12.

Artificial Intelligence for Retinopathy of Prematurity: Validation of a Vascular Severity Scale against International Expert Diagnosis

J Peter Campbell et al. Ophthalmology. 2022 Jul.

Abstract

Purpose: To validate a vascular severity score as an appropriate output for artificial intelligence (AI) Software as a Medical Device (SaMD) for retinopathy of prematurity (ROP) through comparison with ordinal disease severity labels for stage and plus disease assigned by the International Classification of Retinopathy of Prematurity, Third Edition (ICROP3), committee.

Design: Validation study of an AI-based ROP vascular severity score.

Participants: A total of 34 ROP experts from the ICROP3 committee.

Methods: Two separate datasets of 30 fundus photographs each for stage (0-5) and plus disease (plus, preplus, neither) were labeled by members of the ICROP3 committee using an open-source platform. Averaging these results produced a continuous label for plus (1-9) and stage (1-3) for each image. Experts were also asked to compare each image with the others in terms of relative severity for plus disease. Each image was also labeled with a vascular severity score from the Imaging and Informatics in ROP deep learning system, which was assessed for correlation with each grader's diagnostic labels and with the ophthalmoscopic diagnosis of stage.
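The averaging step can be illustrated with a minimal sketch. It assumes a hypothetical integer coding of the three plus disease labels (the abstract does not state how labels were coded or rescaled to the reported ranges), and the grader labels are made up for illustration.

```python
from collections import Counter
from statistics import mean

# Hypothetical integer coding of the plus disease labels (an assumption;
# the abstract does not specify the coding used before averaging).
PLUS_CODE = {"neither": 1, "preplus": 2, "plus": 3}


def summarize_labels(labels):
    """Return (mode_label, mean_score) for one image's expert labels."""
    codes = [PLUS_CODE[label] for label in labels]
    # most_common(1) gives the most frequent label; ties resolve to the
    # label that appears first in the input.
    mode_label = Counter(labels).most_common(1)[0][0]
    return mode_label, mean(codes)


# Made-up labels from five hypothetical graders for two images.
image_a = ["neither", "preplus", "preplus", "preplus", "plus"]
image_b = ["preplus", "preplus", "preplus", "plus", "plus"]

summary_a = summarize_labels(image_a)
summary_b = summarize_labels(image_b)
```

In this toy example both images share the same mode label, but the averaged score separates them, which is the sense in which an averaged label is continuous.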

Main outcome measures: Weighted kappa and Pearson correlation coefficients (CCs) were calculated between each pair of grader classification labels for stage and plus disease. The Elo algorithm was also used to convert pairwise comparisons for each expert into an ordered set of images from least to most severe.
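As a rough illustration of how Elo ratings turn pairwise comparisons into an ordering, the sketch below uses the conventional Elo update. The K-factor of 32, the starting rating of 1500, the 400-point scale, and the image identifiers are assumptions for illustration; the abstract does not report the settings used in the study.

```python
def elo_rank(image_ids, comparisons, k=32.0, start=1500.0):
    """Order images from least to most severe using Elo ratings.

    comparisons is a sequence of (more_severe, less_severe) pairs, i.e.
    the image judged more severe "wins" each comparison.
    k and start are conventional defaults, not values from the study.
    """
    ratings = {img: start for img in image_ids}
    for more_severe, less_severe in comparisons:
        r_win, r_lose = ratings[more_severe], ratings[less_severe]
        # Expected probability that the winner would be judged more severe.
        expected_win = 1.0 / (1.0 + 10 ** ((r_lose - r_win) / 400.0))
        delta = k * (1.0 - expected_win)
        ratings[more_severe] = r_win + delta
        ratings[less_severe] = r_lose - delta
    # A higher rating means the image was judged more severe more often.
    return sorted(image_ids, key=lambda img: ratings[img])


# Made-up comparisons from one hypothetical expert.
comparisons = [("img_c", "img_a"), ("img_c", "img_b"), ("img_b", "img_a")]
ranking = elo_rank(["img_a", "img_b", "img_c"], comparisons)
```

Running the update once per expert yields one ordered set of images per expert, and those orderings can then be compared between experts.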

Results: The mean weighted kappa and CC for all interobserver pairs for plus disease image comparison were 0.67 and 0.88, respectively. The vascular severity score was found to be highly correlated with both the average plus disease classification (CC = 0.90, P < 0.001) and the ophthalmoscopic diagnosis of stage (P < 0.001 by analysis of variance) among all experts.
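The mean pairwise agreement statistics reported above can be sketched as follows. This is a minimal illustration, assuming linear disagreement weights for the weighted kappa (the abstract does not say whether linear or quadratic weights were used) and made-up integer-coded labels.

```python
from itertools import combinations
from math import sqrt


def weighted_kappa(a, b, n_categories, power=1):
    """Weighted Cohen's kappa for two raters with labels 0..n_categories-1.

    power=1 gives linear weights (an assumption), power=2 quadratic.
    Undefined (division by zero) if expected weighted disagreement is 0.
    """
    n = len(a)
    observed = [[0.0] * n_categories for _ in range(n_categories)]
    for x, y in zip(a, b):
        observed[x][y] += 1.0 / n
    row = [sum(observed[i]) for i in range(n_categories)]
    col = [sum(observed[i][j] for i in range(n_categories))
           for j in range(n_categories)]
    num = den = 0.0
    for i in range(n_categories):
        for j in range(n_categories):
            w = (abs(i - j) / (n_categories - 1)) ** power
            num += w * observed[i][j]   # observed weighted disagreement
            den += w * row[i] * col[j]  # disagreement expected by chance
    return 1.0 - num / den


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    var_x = sum((xi - mx) ** 2 for xi in x)
    var_y = sum((yi - my) ** 2 for yi in y)
    return cov / sqrt(var_x * var_y)


def mean_pairwise(raters, metric):
    """Average a symmetric metric over every pair of raters."""
    pairs = list(combinations(raters, 2))
    return sum(metric(a, b) for a, b in pairs) / len(pairs)


# Made-up labels (0 = neither, 1 = preplus, 2 = plus) for six images.
grader_1 = [0, 0, 1, 1, 2, 2]
grader_2 = [0, 1, 1, 1, 2, 2]
grader_3 = [0, 0, 1, 1, 2, 2]

kappa_12 = weighted_kappa(grader_1, grader_2, 3)
mean_kappa = mean_pairwise([grader_1, grader_2, grader_3],
                           lambda a, b: weighted_kappa(a, b, 3))
```

The same `mean_pairwise` helper can be applied with `pearson` to per-expert severity rankings to obtain a mean CC across all pairs.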

Conclusions: The ROP vascular severity score correlates well with the International Classification of Retinopathy of Prematurity committee members' labels for plus disease and stage, which had significant intergrader variability. Generation of a consensus for a validated scoring system for ROP SaMD can facilitate global innovation and regulatory authorization of these technologies.

Keywords: Artificial intelligence; Deep learning; Disease classification; Interobserver agreement; Retinopathy of prematurity; Severity score.

Figures

Figure 1. Example images for determination of plus disease and stage by members of the International Classification of Retinopathy of Prematurity (ICROP) committee.
The two images at the top are from the plus disease database, and the bottom two images are from the stage dataset. The white numbers represent the number of committee experts who assigned each label for plus or stage to each image, respectively. The colored numbers in the upper right of each image represent the averaged expert classification on a scale of 1–3 for stage, and 1–9 for plus. Both plus and stage appear to present on a continuum, which can be measured by comparing expert labels. Note that the lower right quadrant image is one of the standard images of “Stage 1” disease published in the 2005 ICROP revisited paper, which may be an example of temporal diagnostic drift.
Figure 2. Spectrum of disease severity for plus and stage in retinopathy of prematurity.
In each case, the middle portion of the figure represents the individual expert labels for each image in the dataset for plus (N=30) and stage (N=28). Each row represents one image, and the columns in the “Expert” section depict individual expert classifications. Experts were ranked in order of least aggressive diagnosis to most aggressive diagnosis from left to right. Images were ranked from least severe to most severe by average expert classification. The color code represents the underlying class label from green to red in order of increasing severity (no plus, pre-plus, plus or stage 0, 1, 2, 3 or 4). The Ordinal column represents the mode classification, reflecting the current ICROP classification schema, and the Average column represents the average disease classification severity from the individual ICROP experts. Average disease severity better reflects expert diagnosis than an ordinal classification system does.
Figure 3. Classification versus comparison agreement.
A) Interexpert agreement on plus disease label for 34 experts. Inset legend reports the weighted kappa color scale for pairwise agreement of each expert relative to every other expert. Mean weighted kappa for all interobserver pairs was 0.67. B) Interexpert agreement for overall rankings of relative disease severity for 34 experts, as measured by correlation coefficient (CC). Mean CC for all interobserver pairs was 0.88. C) Correlation between average disease severity according to ordinal labels of 34 experts versus rank-ordered severity using relative rankings (CC 0.96).
Figure 4. Relationship between deep learning derived vascular severity score (VSS) and the mode plus classification, average plus classification, and associated ophthalmoscopic diagnosis of stage in plus disease dataset.
A) Box plot of VSS vs mode plus disease classification (P<0.001). B) Scatter plot of VSS vs average disease severity classification (correlation coefficient 0.90). C) Box plot of VSS from plus disease images compared with ophthalmoscopic diagnosis of stage in the same eyes (P<0.001). The VSS corresponds to the current mode classification of plus disease, a continuous spectrum of plus disease as determined by expert classifications, and with the ophthalmoscopic diagnosis of stage in the same eyes (not shown on images).
