Transl Vis Sci Technol. 2020 Feb 12;9(2):7.
doi: 10.1167/tvst.9.2.7.

A Clinician's Guide to Artificial Intelligence: How to Critically Appraise Machine Learning Studies

Livia Faes et al. Transl Vis Sci Technol.

Erratum in

  • Erratum.
    [No authors listed] Transl Vis Sci Technol. 2020 Aug 21;9(9):33. doi: 10.1167/tvst.9.9.33. eCollection 2020 Aug. PMID: 32908798. Free PMC article.

Abstract

In recent years, there has been considerable interest in the prospect of machine learning models demonstrating expert-level diagnosis in multiple disease contexts. However, there is concern that the excitement around this field may be associated with inadequate scrutiny of methodology and insufficient adoption of good scientific practice in studies involving artificial intelligence in health care. This article aims to empower clinicians and researchers to critically appraise studies of clinical applications of machine learning by (1) introducing basic machine learning concepts and nomenclature; (2) outlining key applicable principles of evidence-based medicine; and (3) highlighting some of the potential pitfalls in the design and reporting of these studies.

Keywords: artificial intelligence; critical appraisal; machine learning.

Conflict of interest statement

Disclosure: L. Faes, Allergan (F), Bayer (F), Novartis (F); X. Liu, None; S.K. Wagner, None; D.J. Fu, None; K. Balaskas, Alimera (F), Allergan (F), Bayer (F), Heidelberg Engineering (F), Novartis (F), Topcon (F); D.A. Sim, Haag-Streit (F), Allergan (F, S), Novartis (F), Bayer (F, S), Big Picture Eye Health (C); L.M. Bachmann, Oculocare (E); P.A. Keane, Heidelberg Engineering (F), Topcon (F), Carl Zeiss Meditec (F), Haag-Streit (F), Allergan (F), Novartis (F, S), Bayer (F, S), DeepMind (C), Optos (C); A.K. Denniston, None.

Figures

Figure 1.
Overview of datasets involved in a machine learning diagnostic algorithm: model development and evaluation.
Figure 2.
Overview of the confusion matrix (contingency table). Nomenclature differs between machine learning (boldface type) and classical statistics (italic type); terms shared by both fields are shown in boldface italic type.
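
The figure itself is not reproduced here, so the following is only a minimal sketch of how the two vocabularies map onto the same 2x2 table. It relies on the standard pairings (sensitivity is recall, positive predictive value is precision). The class name `ConfusionMatrix` and the counts in the usage lines are hypothetical illustrations, not values or code from the article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Counts of a 2x2 contingency table for a binary diagnostic test.

    Hypothetical helper for illustration; not taken from the article.
    """

    tp: int  # disease present, model predicts positive
    fp: int  # disease absent, model predicts positive
    fn: int  # disease present, model predicts negative
    tn: int  # disease absent, model predicts negative

    def sensitivity(self) -> float:
        """Classical statistics: sensitivity. Machine learning: recall."""
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        """Proportion of disease-free cases correctly predicted negative."""
        return self.tn / (self.tn + self.fp)

    def ppv(self) -> float:
        """Classical statistics: positive predictive value. Machine learning: precision."""
        return self.tp / (self.tp + self.fp)

    def npv(self) -> float:
        """Negative predictive value."""
        return self.tn / (self.tn + self.fn)

    def accuracy(self) -> float:
        """Proportion of all cases classified correctly."""
        total = self.tp + self.fp + self.fn + self.tn
        return (self.tp + self.tn) / total

    def f1(self) -> float:
        """Harmonic mean of precision and recall (a machine learning term)."""
        p, r = self.ppv(), self.sensitivity()
        return 2 * p * r / (p + r)

    # Machine learning aliases for the same quantities.
    recall = sensitivity
    precision = ppv


# Made-up counts for a test set of 1000 cases (assumption, for illustration only).
cm = ConfusionMatrix(tp=90, fp=30, fn=10, tn=870)
print(f"sensitivity / recall: {cm.recall():.3f}")
print(f"PPV / precision:      {cm.precision():.3f}")
print(f"specificity:          {cm.specificity():.3f}")
print(f"accuracy:             {cm.accuracy():.3f}")
```

With these illustrative counts, accuracy is high largely because disease-free cases dominate the test set, while the positive predictive value is noticeably lower. This is one reason to read sensitivity, specificity, and predictive values together instead of relying on a single summary figure.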
