Transl Vis Sci Technol. 2020 Feb 12;9(2):7. doi: 10.1167/tvst.9.2.7.

A Clinician's Guide to Artificial Intelligence: How to Critically Appraise Machine Learning Studies

Livia Faes^{1,2}, Xiaoxuan Liu^{1,3,4,5}, Siegfried K Wagner⁶, Dun Jack Fu¹, Konstantinos Balaskas^{1,6}, Dawn A Sim^{1,6}, Lucas M Bachmann⁷, Pearse A Keane⁶, Alastair K Denniston^{3,4,5,6,8}

Affiliations

¹ Medical Retina Department, Moorfields Eye Hospital NHS Foundation Trust, London, UK.
² Eye Clinic, Cantonal Hospital of Lucerne, Lucerne, Switzerland.
³ Department of Ophthalmology, University Hospitals Birmingham NHS Foundation Trust, Birmingham, UK.
⁴ Academic Unit of Ophthalmology, Institute of Inflammation & Ageing, College of Medical and Dental Sciences, University of Birmingham, Birmingham, UK.
⁵ Health Data Research UK, London, UK.
⁶ NIHR Biomedical Research Centre for Ophthalmology, Moorfields Eye Hospital NHS Foundation Trust and UCL Institute of Ophthalmology, London, UK.
⁷ Medignition Inc, Research Consultants, Zurich, Switzerland.
⁸ Centre for Patient Reported Outcome Research, Institute of Applied Health Research, University of Birmingham, Birmingham, UK.

PMID: 32704413
PMCID: PMC7346877
DOI: 10.1167/tvst.9.2.7

Livia Faes et al. Transl Vis Sci Technol. 2020.


Erratum in

Erratum. [No authors listed]. Transl Vis Sci Technol. 2020 Aug 21;9(9):33. doi: 10.1167/tvst.9.9.33. eCollection 2020 Aug. PMID: 32908798.

Abstract

In recent years, there has been considerable interest in the prospect of machine learning models demonstrating expert-level diagnosis in multiple disease contexts. However, there is concern that the excitement around this field may be associated with inadequate scrutiny of methodology and insufficient adoption of scientific good practice in the studies involving artificial intelligence in health care. This article aims to empower clinicians and researchers to critically appraise studies of clinical applications of machine learning, through: (1) introducing basic machine learning concepts and nomenclature; (2) outlining key applicable principles of evidence-based medicine; and (3) highlighting some of the potential pitfalls in the design and reporting of these studies.

Keywords: artificial intelligence; critical appraisal; machine learning.


Conflict of interest statement

Disclosure: L. Faes, Allergan (F), Bayer (F), Novartis (F); X. Liu, None; S.K. Wagner, None; D.J. Fu, None; K. Balaskas, Alimera (F), Allergan (F), Bayer (F), Heidelberg Engineering (F), Novartis (F), Topcon (F); D.A. Sim, Haag-Streit (F), Allergan (F), Novartis (F), Bayer (F), Allergan (S), Bayer (S), Big Picture Eye Health (C); L.M. Bachmann, Oculocare (E); P.A. Keane, Heidelberg Engineering (F), Topcon (F), Carl Zeiss Meditec (F), Haag-Streit (F), Allergan (F), Novartis (F, S), Bayer (F, S), DeepMind (C), Optos (C); A.K. Denniston, None.

Figures

**Figure 1.**
Overview of datasets involved in a machine learning diagnostic algorithm: model development and evaluation.
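
The caption does not enumerate the datasets, so the following is only a minimal sketch under an assumption: that model development uses a training set and a tuning set, and that evaluation uses a held-out test set, as is common practice. The function name `split_patients`, the 70/15/15 fractions, and the patient identifiers are all hypothetical. The split is made at the patient level so that images from one patient cannot appear in both development and evaluation data.

```python
import random
from typing import List, Sequence, Tuple


def split_patients(
    patient_ids: Sequence[str],
    train_frac: float = 0.70,   # assumed fraction, not taken from the article
    tune_frac: float = 0.15,    # assumed fraction, not taken from the article
    seed: int = 0,
) -> Tuple[List[str], List[str], List[str]]:
    """Split unique patient IDs into training, tuning, and test sets."""
    if not 0 < train_frac + tune_frac < 1:
        raise ValueError("train_frac + tune_frac must lie strictly between 0 and 1")

    # Deduplicate and sort first so the shuffle is reproducible for a given seed.
    ids = sorted(set(patient_ids))
    random.Random(seed).shuffle(ids)

    n_train = round(len(ids) * train_frac)
    n_tune = round(len(ids) * tune_frac)

    train = ids[:n_train]                    # used to fit model parameters
    tune = ids[n_train:n_train + n_tune]     # used to choose hyperparameters
    test = ids[n_train + n_tune:]            # touched only for final evaluation
    return train, tune, test


# Hypothetical cohort of 100 patients.
patients = [f"P{i:03d}" for i in range(100)]
train, tune, test = split_patients(patients)
print(len(train), len(tune), len(test))
```

When appraising a study, the point to check is that the three sets are disjoint and that the test set played no part in model development.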

**Figure 2.**
Overview of confusion matrix/contingency table. Differences in nomenclature for machine learning (boldface type) and classical statistics (italic type) and where overlapping (boldface and italic) are highlighted.
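
The overlap in nomenclature can be illustrated with a short sketch. The class `ConfusionMatrix` and the counts below are hypothetical and are not taken from the article. The sketch computes the standard metrics from the four cells of a 2x2 table and exposes the machine learning names (recall, precision) as aliases of the classical ones (sensitivity, positive predictive value).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Cells of a 2x2 contingency table for a binary diagnostic test."""
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives
    tn: int  # true negatives

    def sensitivity(self) -> float:
        """Proportion of diseased cases correctly flagged (ML: recall)."""
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        """Proportion of non-diseased cases correctly cleared."""
        return self.tn / (self.tn + self.fp)

    def ppv(self) -> float:
        """Positive predictive value (ML: precision)."""
        return self.tp / (self.tp + self.fp)

    def npv(self) -> float:
        """Negative predictive value."""
        return self.tn / (self.tn + self.fn)

    def accuracy(self) -> float:
        """Proportion of all cases classified correctly."""
        return (self.tp + self.tn) / (self.tp + self.fp + self.fn + self.tn)

    def f1(self) -> float:
        """Harmonic mean of precision and recall."""
        p, r = self.ppv(), self.sensitivity()
        return 2 * p * r / (p + r)

    # Machine learning names for the same quantities.
    recall = sensitivity
    precision = ppv


# Hypothetical counts for illustration only.
cm = ConfusionMatrix(tp=80, fp=10, fn=20, tn=90)
print(f"sensitivity/recall = {cm.recall():.3f}")
print(f"specificity        = {cm.specificity():.3f}")
print(f"PPV/precision      = {cm.precision():.3f}")
print(f"NPV                = {cm.npv():.3f}")
print(f"accuracy           = {cm.accuracy():.3f}")
print(f"F1 score           = {cm.f1():.3f}")
```

Predictive values and accuracy depend on disease prevalence in the evaluated sample, whereas sensitivity and specificity do not, so the same model can report different precision in a screening population than in a referral clinic.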


