Ophthalmol Sci. 2025 Jun 9;5(6):100847.

doi: 10.1016/j.xops.2025.100847. eCollection 2025 Nov-Dec.

A Practical Guide to Evaluating Artificial Intelligence Imaging Models in Scientific Literature

Angela McCarthy¹, Ives Valenzuela¹, Royce W S Chen¹, Lora R Dagi Glass¹, Kaveri Thakoor¹ ² ³ ⁴

Affiliations

¹ Department of Ophthalmology, Columbia University Irving Medical Center, New York, New York.
² Department of Biomedical Engineering, Columbia University, New York, New York.
³ Department of Computer Science, Columbia University, New York, New York.
⁴ Data Science Institute, Columbia University, New York, New York.

PMID: 40778360
PMCID: PMC12329112
DOI: 10.1016/j.xops.2025.100847

Angela McCarthy et al. Ophthalmol Sci. 2025.

Abstract

Objective: Recent advances in artificial intelligence (AI) are revolutionizing ophthalmology by enhancing diagnostic accuracy, treatment planning, and patient management. However, a significant gap remains in practical guidance for ophthalmologists who lack AI expertise to effectively analyze these technologies and assess their readiness for integration into clinical practice. This paper aims to bridge this gap by demystifying AI model design and providing practical recommendations for evaluating AI imaging models in research publications.

Design: Educational review synthesizing key considerations for evaluating AI papers in ophthalmology.

Participants: This paper draws on insights from an interdisciplinary team of ophthalmologists and AI experts with experience in developing and evaluating AI models for clinical applications.

Methods: A structured framework was developed based on expert discussions and a review of key methodological considerations in AI research.

Main outcome measures: A stepwise approach to evaluating AI models in ophthalmology, providing clinicians with practical strategies for assessing AI research.

Results: This guide offers broad recommendations applicable across ophthalmology and medicine.

Conclusions: As the landscape of health care continues to evolve, proactive engagement with AI will empower clinicians to lead the way in innovation while concurrently prioritizing patient safety and quality of care.

Financial disclosures: Proprietary or commercial disclosure may be found in the Footnotes and Disclosures at the end of this article.

Keywords: Artificial intelligence; Glaucoma detection; Machine learning; Ophthalmology.

Figures

**Figure 1**
Three variations of the same OCT image using data augmentation techniques. The original image (left) is modified to create new computer-generated versions by adding Gaussian noise (center) and applying a horizontal flip (right). These transformations simulate variability in the dataset, enhancing the AI model's ability to generalize. AI = artificial intelligence.
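The two augmentations named in this caption can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' pipeline: the function names, the noise level `sigma`, and the tiny synthetic array standing in for an OCT scan are all assumptions made for the example.

```python
import numpy as np

def add_gaussian_noise(image, sigma=0.05, seed=0):
    """Return a copy of `image` (floats in [0, 1]) with zero-mean Gaussian noise.

    `sigma` is an assumed noise level; real pipelines tune it per dataset.
    """
    rng = np.random.default_rng(seed)
    noisy = image + rng.normal(0.0, sigma, size=image.shape)
    return np.clip(noisy, 0.0, 1.0)  # keep pixel values in the valid range

def horizontal_flip(image):
    """Mirror the image left to right by reversing the column axis."""
    return image[:, ::-1]

# Hypothetical stand-in for a grayscale OCT image (rows x columns).
scan = np.linspace(0.0, 1.0, 12).reshape(3, 4)

# Original, noisy, and flipped versions, mirroring the three panels.
augmented = [scan, add_gaussian_noise(scan), horizontal_flip(scan)]
```

Each augmented copy keeps the original label, so the training set grows without new patient data.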

**Figure 2**
Grad-CAM highlights the regions most important to the AI model's decision-making process, enhancing interpretability. In this case, the Grad-CAM (right) illustrates that the model relies on the arcuate region of the RNFL probability map to assess glaucoma, as indicated by the yellow highlights. This provides a visual explanation of the AI's focus during analysis. The color scale (yellow to blue) indicates the relevance of different regions, with yellow showing the highest relevance. AI = artificial intelligence; Grad-CAM = Gradient-Weighted Class Activation Maps; GCL = Ganglion Cell Layer; RNFL = retinal nerve fiber layer; VF = visual fields.
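The core Grad-CAM computation can be sketched with NumPy, assuming the feature maps of the last convolutional layer and the gradients of the class score with respect to them have already been extracted from a trained network. The function name `grad_cam` and the toy arrays are hypothetical; in practice a deep learning framework supplies both inputs.

```python
import numpy as np

def grad_cam(activations, gradients):
    """Compute a Grad-CAM heatmap.

    activations: array (K, H, W), feature maps of the last conv layer.
    gradients:   array (K, H, W), d(class score) / d(activations).
    Returns an (H, W) heatmap scaled to [0, 1].
    """
    # One importance weight per feature map: the spatially averaged gradient.
    weights = gradients.mean(axis=(1, 2))
    # Weighted sum of the feature maps over the channel axis.
    cam = np.tensordot(weights, activations, axes=1)
    # ReLU keeps only regions that push the class score up.
    cam = np.maximum(cam, 0.0)
    peak = cam.max()
    return cam / peak if peak > 0 else cam

# Toy inputs (assumed values): two 2x2 feature maps.
activations = np.array([[[1.0, 0.0], [0.0, 0.0]],
                        [[0.0, 0.0], [0.0, 1.0]]])
gradients = np.array([np.ones((2, 2)), -np.ones((2, 2))])

heatmap = grad_cam(activations, gradients)
```

The heatmap is usually upsampled to the input resolution and overlaid on the image with a color scale like the one in the figure.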

**Figure 3**
Precision-recall curve. AUC = area under the curve; PR = precision-recall.

**Figure 4**
Receiver operating characteristic curve. ROC = receiver operating characteristic.
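Both the precision-recall and ROC curves come from sweeping a decision threshold over the model's output scores. The sketch below shows one way to compute the curve points and the area under the ROC curve; the function names, labels, and scores are invented for illustration, and the code assumes binary labels, no tied scores, and at least one case of each class.

```python
import numpy as np

def curve_points(y_true, y_score):
    """Return (fpr, tpr, precision) as the threshold drops past each score.

    tpr is also the recall, so (tpr, precision) traces the PR curve
    and (fpr, tpr) traces the ROC curve.
    """
    y_true = np.asarray(y_true)
    order = np.argsort(-np.asarray(y_score), kind="stable")
    y_sorted = y_true[order]
    tp = np.cumsum(y_sorted)        # true positives at each threshold
    fp = np.cumsum(1 - y_sorted)    # false positives at each threshold
    n_pos = y_true.sum()
    n_neg = len(y_true) - n_pos
    return fp / n_neg, tp / n_pos, tp / (tp + fp)

def roc_auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule, starting at (0, 0)."""
    x = np.concatenate(([0.0], fpr))
    y = np.concatenate(([0.0], tpr))
    return float(np.sum(np.diff(x) * (y[1:] + y[:-1]) / 2.0))

# Hypothetical labels (1 = disease) and model scores for six eyes.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

fpr, tpr, precision = curve_points(labels, scores)
auc = roc_auc(fpr, tpr)
```

Unlike the ROC curve, the precision values shift with disease prevalence, which is why PR curves are often reported alongside ROC curves for imbalanced datasets.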
