PNAS Nexus. 2023 Sep 6;2(9):pgad290. doi: 10.1093/pnasnexus/pgad290. eCollection 2023 Sep.

Artificial intelligence, explainability, and the scientific method: A proof-of-concept study on novel retinal biomarker discovery

Parsa Delavari et al. PNAS Nexus. 2023.

Abstract

We present a structured approach to combining the explainability of artificial intelligence (AI) with the scientific method for scientific discovery. We demonstrate the utility of this approach in a proof-of-concept study in which we uncover biomarkers from a convolutional neural network (CNN) model trained to classify patient sex in retinal images. This is a trait that is not currently recognized by diagnosticians in retinal images, yet one that CNNs classify successfully. Our methodology consists of four phases. In Phase 1, CNN development, we train a visual geometry group (VGG) model to recognize patient sex in retinal images. In Phase 2, Inspiration, we review visualizations obtained from post hoc interpretability tools to make observations and articulate exploratory hypotheses; here, we listed 14 hypotheses about retinal sex differences. In Phase 3, Exploration, we test all exploratory hypotheses on an independent dataset. Out of 14 exploratory hypotheses, nine revealed significant differences. In Phase 4, Verification, we re-tested the nine flagged hypotheses on a new dataset. Five were verified, revealing (i) significantly greater length, (ii) more nodes, and (iii) more branches of the retinal vasculature, (iv) greater retinal area covered by the vessels in the superior temporal quadrant, and (v) a darker peripapillary region in male eyes. Finally, we trained a group of ophthalmologists (N=26) to recognize the novel retinal features for sex classification. While their pretraining performance did not differ from chance level or from the performance of a nonexpert group (N=31), after training their performance increased significantly (p<0.001, d=2.63). These findings showcase the potential for retinal biomarker discovery through CNN applications, with the added utility of empowering medical practitioners with new diagnostic capabilities to enhance their clinical toolkit.

Keywords: artificial intelligence; convolutional neural networks; medical image perception; retinal biomarkers; retinal fundus images.
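
As a rough illustration of Phase 1 (CNN development), the sketch below fine-tunes a torchvision VGG-16 backbone as a binary sex classifier on fundus images. The directory layout ("fundus_train" with male/female subfolders), image size, and training hyperparameters are illustrative assumptions rather than details reported in the paper.

    # Minimal Phase 1 sketch: fine-tune VGG-16 for male/female classification of
    # fundus images. Paths and hyperparameters are placeholders, not the authors' settings.
    import torch
    import torch.nn as nn
    from torch.utils.data import DataLoader
    from torchvision import datasets, models, transforms

    transform = transforms.Compose([
        transforms.Resize((224, 224)),          # VGG-16 expects 224x224 RGB input
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225]),
    ])

    # Hypothetical layout: fundus_train/male/*.png and fundus_train/female/*.png
    train_set = datasets.ImageFolder("fundus_train", transform=transform)
    train_loader = DataLoader(train_set, batch_size=32, shuffle=True)

    model = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)
    model.classifier[6] = nn.Linear(4096, 2)    # replace the 1000-way ImageNet head

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model.to(device)
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

    model.train()
    for epoch in range(10):
        for images, labels in train_loader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()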


Figures

Fig. 1.
Overview of our methodology.
Fig. 2.
A sample fundus image A) along with its binary optic disc mask B), fovea C), and vessel mask D). An illustration of the vessel graph extracted from the binary vessel mask is depicted in panel E). Lines and dots represent the edges and nodes of the obtained vessel graph, respectively.
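
In principle, a vessel graph of the kind shown in panel E) can be derived from the binary vessel mask by skeletonizing it and treating skeleton endpoints and branch points as nodes. The sketch below follows that generic recipe and is not the authors' exact pipeline; vessel length, node count, branch count, and covered area are approximated by pixel counts.

    # Generic vessel-graph statistics from a binary vessel mask
    # (an illustrative recipe, not the paper's exact method).
    import numpy as np
    from scipy.ndimage import convolve
    from skimage.morphology import skeletonize

    def vessel_graph_stats(vessel_mask: np.ndarray) -> dict:
        """vessel_mask: 2D boolean array, True on vessel pixels."""
        skeleton = skeletonize(vessel_mask)

        # Count the 8-connected neighbors of every skeleton pixel.
        kernel = np.array([[1, 1, 1],
                           [1, 0, 1],
                           [1, 1, 1]])
        neighbors = convolve(skeleton.astype(int), kernel, mode="constant")

        endpoints = skeleton & (neighbors == 1)      # graph nodes of degree 1
        branch_points = skeleton & (neighbors >= 3)  # bifurcations and crossings

        return {
            "vessel_length_px": int(skeleton.sum()),   # skeleton length as a pixel count
            "n_nodes": int(endpoints.sum() + branch_points.sum()),
            "n_branch_points": int(branch_points.sum()),
            "vessel_area_px": int(vessel_mask.sum()),  # retinal area covered by vessels
        }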
Fig. 3.
Saliency map results of sample fundus images from two male and two female patients. In each panel, the original fundus image, the Guided Grad-CAM output (3-channel image), and its color-coded amplitude (single-channel image) are shown from left to right.
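
The Grad-CAM part of such a saliency map can be sketched as follows for the VGG-16 classifier; the paper's Guided Grad-CAM additionally multiplies this class-discriminative map by guided-backpropagation gradients, which is omitted here. Taking gradients at the output of model.features (torchvision's VGG-16 convolutional stack) is a simplifying assumption.

    # Hedged Grad-CAM sketch (the guided-backpropagation factor of Guided Grad-CAM is omitted).
    import torch
    import torch.nn.functional as F

    def grad_cam(model, image, target_class):
        """image: (1, 3, H, W) tensor; returns an (H, W) map scaled to [0, 1]."""
        model.eval()
        feats = model.features(image)          # final convolutional feature maps
        feats.retain_grad()                    # keep the gradient of this non-leaf tensor
        pooled = model.avgpool(feats)
        logits = model.classifier(torch.flatten(pooled, 1))

        model.zero_grad()
        logits[0, target_class].backward()

        # Channel weights = spatially averaged gradients; weighted sum, then ReLU.
        weights = feats.grad.mean(dim=(2, 3), keepdim=True)
        cam = F.relu((weights * feats).sum(dim=1, keepdim=True))
        cam = F.interpolate(cam, size=image.shape[2:], mode="bilinear", align_corners=False)
        cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)
        return cam[0, 0].detach()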
Fig. 4.
Feature visualization results for sample fundus images. The top two rows of the middle column show original male images and the bottom two rows show original female images. The left and right columns represent feature visualizations for male and female classes, respectively.
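
Class-conditional feature visualizations of this kind are commonly produced by activation maximization, i.e. gradient ascent on the target class logit with respect to the input image. The sketch below shows that generic procedure; the step size, iteration count, weight-decay regularization, and the assumption of un-normalized pixel values in [0, 1] are illustrative choices rather than the authors' settings.

    # Generic activation-maximization sketch (illustrative hyperparameters).
    import torch

    def visualize_class(model, image, target_class, steps=200, lr=0.05, weight_decay=1e-4):
        """image: (1, 3, H, W) tensor with values in [0, 1]."""
        model.eval()
        for p in model.parameters():           # only the input image is optimized
            p.requires_grad_(False)

        x = image.clone().detach().requires_grad_(True)
        optimizer = torch.optim.Adam([x], lr=lr, weight_decay=weight_decay)

        for _ in range(steps):
            optimizer.zero_grad()
            loss = -model(x)[0, target_class]  # gradient ascent on the class logit
            loss.backward()
            optimizer.step()
            with torch.no_grad():
                x.clamp_(0.0, 1.0)             # keep pixels in a displayable range

        return x.detach()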
Fig. 5.
Accuracy in the 2-AFC sex-recognition task is shown for the pretraining and post-training blocks for the expert ophthalmologist group A) and the nonexpert group B).
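
The pre- versus post-training comparison behind this figure can be illustrated with a paired t-test on per-participant 2-AFC accuracies and a paired-samples Cohen's d (mean difference divided by the standard deviation of the differences); the abstract does not state the exact test or effect-size definition used, and the accuracy values below are placeholders.

    # Illustrative pre- vs. post-training comparison with placeholder data.
    import numpy as np
    from scipy import stats

    pre_acc = np.array([0.52, 0.48, 0.55, 0.50, 0.49, 0.53])   # placeholder per-participant accuracies
    post_acc = np.array([0.71, 0.68, 0.74, 0.66, 0.70, 0.73])

    t_stat, p_value = stats.ttest_rel(post_acc, pre_acc)        # paired t-test
    diff = post_acc - pre_acc
    cohens_d = diff.mean() / diff.std(ddof=1)                   # paired-samples effect size

    # One-sample check of the pretraining block against chance (0.5):
    t_chance, p_chance = stats.ttest_1samp(pre_acc, 0.5)

    print(f"t = {t_stat:.2f}, p = {p_value:.4f}, d = {cohens_d:.2f}")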
Fig. 6.
Accuracy in the training block is shown as a function of NOMT performance for the expert ophthalmologist group A) and the nonexpert group B).
