Nat Med. 2018 Oct;24(10):1559-1567.

doi: 10.1038/s41591-018-0177-5. Epub 2018 Sep 17.

Classification and mutation prediction from non-small cell lung cancer histopathology images using deep learning

Nicolas Coudray^#^{1,2}, Paolo Santiago Ocampo^#³, Theodore Sakellaropoulos⁴, Navneet Narula³, Matija Snuderl³, David Fenyö^{5,6}, Andre L Moreira^{3,7}, Narges Razavian⁸, Aristotelis Tsirigos^{9,10}

Affiliations

¹ Applied Bioinformatics Laboratories, New York University School of Medicine, New York, NY, USA.
² Skirball Institute, Department of Cell Biology, New York University School of Medicine, New York, NY, USA.
³ Department of Pathology, New York University School of Medicine, New York, NY, USA.
⁴ School of Mechanical Engineering, National Technical University of Athens, Zografou, Greece.
⁵ Institute for Systems Genetics, New York University School of Medicine, New York, NY, USA.
⁶ Department of Biochemistry and Molecular Pharmacology, New York University School of Medicine, New York, NY, USA.
⁷ Center for Biospecimen Research and Development, New York University, New York, NY, USA.
⁸ Department of Population Health and the Center for Healthcare Innovation and Delivery Science, New York University School of Medicine, New York, NY, USA. narges.razavian@nyumc.org.
⁹ Applied Bioinformatics Laboratories, New York University School of Medicine, New York, NY, USA. aristotelis.tsirigos@nyumc.org.
¹⁰ Department of Pathology, New York University School of Medicine, New York, NY, USA. aristotelis.tsirigos@nyumc.org.

^# Contributed equally.

PMID: 30224757
PMCID: PMC9847512
DOI: 10.1038/s41591-018-0177-5

Nicolas Coudray et al. Nat Med. 2018 Oct.

Abstract

Visual inspection of histopathology slides is one of the main methods used by pathologists to assess the stage, type and subtype of lung tumors. Adenocarcinoma (LUAD) and squamous cell carcinoma (LUSC) are the most prevalent subtypes of lung cancer, and their distinction requires visual inspection by an experienced pathologist. In this study, we trained a deep convolutional neural network (Inception v3) on whole-slide images obtained from The Cancer Genome Atlas to accurately and automatically classify them into LUAD, LUSC or normal lung tissue. The performance of our method is comparable to that of pathologists, with an average area under the curve (AUC) of 0.97. Our model was validated on independent datasets of frozen tissues, formalin-fixed paraffin-embedded tissues and biopsies. Furthermore, we trained the network to predict the ten most commonly mutated genes in LUAD. We found that six of them (STK11, EGFR, FAT1, SETBP1, KRAS and TP53) can be predicted from pathology images, with AUCs from 0.733 to 0.856 as measured on a held-out population. These findings suggest that deep-learning models can assist pathologists in the detection of cancer subtype or gene mutations. Our approach can be applied to any cancer type, and the code is available at https://github.com/ncoudray/DeepPATH.

Conflict of interest statement

Competing Financial Interests Statement

The authors declare no competing interests.

Figures

**Figure 1. Data and strategy:**
**(a)** Number of whole-slide images per class. **(b)** Strategy: **(b1)** Images of lung cancer tissues were first downloaded from the Genomic Data Commons database; **(b2)** slides were then separated into a training set (70%), a validation set (15%) and a test set (15%); **(b3)** slides were tiled into non-overlapping 512×512-pixel windows, omitting those with over 50% background; **(b4)** the Inception v3 architecture was used and partially or fully re-trained using the training and validation tiles; **(b5)** classifications were performed on tiles from an independent test set and the results were finally aggregated per slide to extract the heatmaps and the AUC statistics. **(c)** Size distribution of the image widths (gray) and heights (black). **(d)** Distribution of the number of tiles per slide.
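
Steps (b2) and (b3) of the caption can be illustrated with a minimal standard-library sketch. It is not the DeepPATH implementation. The function names, the random slide-level shuffle, the grayscale `BACKGROUND_LEVEL` cut-off and the decision to drop partial tiles at the image edges are all assumptions made for illustration; only the 70/15/15 proportions, the 512-pixel tile size and the 50% background limit come from the caption.

```python
import random
from typing import Iterator, List, Sequence, Tuple

TILE = 512              # tile edge in pixels (from the caption)
MAX_BACKGROUND = 0.5    # tiles with over 50% background are omitted (from the caption)
BACKGROUND_LEVEL = 220  # ASSUMPTION: gray value at or above which a pixel counts as background


def split_slides(slide_ids: Sequence[str], seed: int = 0) -> Tuple[List[str], List[str], List[str]]:
    """Split slide IDs into 70% training, 15% validation and the rest for testing.

    ASSUMPTION: a seeded random shuffle at the slide level, done before tiling so
    that tiles from one slide never end up in two different sets.
    """
    ids = sorted(slide_ids)
    random.Random(seed).shuffle(ids)
    n_train = round(0.70 * len(ids))
    n_valid = round(0.15 * len(ids))
    return ids[:n_train], ids[n_train:n_train + n_valid], ids[n_train + n_valid:]


def tile_origins(width: int, height: int, tile: int = TILE) -> Iterator[Tuple[int, int]]:
    """Yield the top-left (x, y) corner of every full, non-overlapping tile."""
    for y in range(0, height - tile + 1, tile):
        for x in range(0, width - tile + 1, tile):
            yield x, y


def background_fraction(gray: Sequence[Sequence[int]], x: int, y: int, tile: int = TILE) -> float:
    """Fraction of pixels in the tile at (x, y) that are at least BACKGROUND_LEVEL bright."""
    bright = sum(
        1
        for row in gray[y:y + tile]
        for value in row[x:x + tile]
        if value >= BACKGROUND_LEVEL
    )
    return bright / (tile * tile)


def kept_tiles(gray: Sequence[Sequence[int]], tile: int = TILE) -> List[Tuple[int, int]]:
    """Return the origins of tiles whose background fraction does not exceed the limit."""
    height = len(gray)
    width = len(gray[0]) if height else 0
    return [
        (x, y)
        for x, y in tile_origins(width, height, tile)
        if background_fraction(gray, x, y, tile) <= MAX_BACKGROUND
    ]
```

Here `gray` is a row-major grid of 0-255 gray values. A real whole-slide image would be read region by region with a dedicated slide-reading library rather than held in memory as nested lists.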

**Figure 2. Classification of presence and type of tumor on alternative cohorts:**
Receiver Operating Characteristic (ROC) curves (left) from tests on **(a)** frozen sections (n=98 biologically independent slides), **(b)** formalin-fixed paraffin-embedded (FFPE) sections (n=140 biologically independent slides) and **(c)** biopsies (n=102 biologically independent slides) from NYU Langone Medical Center. On the right of each plot, we show examples of raw images with an overlay in light grey of the mask generated by a pathologist and the corresponding heatmaps obtained with the three-way classifier. Scale bars are 1 mm.

**Figure 3. Gene mutation prediction from histopathology slides gives promising results for at least 6 genes:**
**(a)** Mutation probability distribution for slides where each mutation is present or absent (tile aggregation by averaging output probability). **(b)** ROC curves associated with the top four predictions in (a). **(c)** Allele frequency as a function of slides classified by the deep learning network as having a certain gene mutation (P≥0.5), or the wild-type (P<0.5). p-values estimated with the two-tailed Mann-Whitney U-test are shown as ns (p>0.05), * (p≤0.05), ** (p≤0.01) or *** (p≤0.001). For a, b and c, n=62 slides from 59 patients. For the two box plots, whiskers represent the minima and maxima. The middle line within the box represents the median.
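
The tile aggregation by averaging mentioned in panel (a), the P≥0.5 call used in panel (c) and the AUC summarized by the ROC curves can be sketched as follows. This is a hedged illustration, not the authors' code. The function names and the `(slide_id, probability)` input layout are assumptions, and the AUC is computed with the standard pairwise (Mann-Whitney) formulation instead of any specific library routine.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def slide_scores(tile_probs: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Average per-tile mutation probabilities into one score per slide."""
    by_slide: Dict[str, List[float]] = defaultdict(list)
    for slide_id, prob in tile_probs:
        by_slide[slide_id].append(prob)
    return {slide_id: mean(probs) for slide_id, probs in by_slide.items()}


def predicted_mutated(scores: Dict[str, float], threshold: float = 0.5) -> Dict[str, bool]:
    """Call a slide mutated when its averaged probability reaches the threshold."""
    return {slide_id: score >= threshold for slide_id, score in scores.items()}


def auc(scores: Dict[str, float], mutated: Dict[str, bool]) -> float:
    """Probability that a mutated slide outscores a wild-type slide (ties count 0.5)."""
    pos = [score for slide_id, score in scores.items() if mutated[slide_id]]
    neg = [score for slide_id, score in scores.items() if not mutated[slide_id]]
    if not pos or not neg:
        raise ValueError("need at least one mutated and one wild-type slide")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

The pairwise loop is quadratic in the number of slides, which is acceptable for a test set of a few dozen slides. A rank-based routine from a statistics library would be the usual choice for larger cohorts.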

**Figure 4. Spatial heterogeneity of predicted mutations:**
**(a)** Probability distribution on LUAD tiles for the 6 predictable mutations with average values in dotted lines (n=327 non-overlapping tiles). The allele frequency is 0.33 for TP53, 0.25 for STK11 and 0 for the 4 other mutations. **(b)** Heatmap of TP53 and **(c)** STK11 when only tiles classified as LUAD are selected, and in **(d)** and **(e)** when all the tiles are considered. Scale bars are 1 mm.
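
The difference between panels (b, c) and (d, e) is whether tiles are first filtered by the class assigned by the three-way classifier. A minimal sketch of that selection is given here. The `Tile` layout, the function names and the class labels used as dictionary keys are hypothetical, and taking the highest class probability as the assigned class is an assumption.

```python
from statistics import mean
from typing import Dict, List, Optional, Tuple

# Each tile: (class probabilities from the three-way classifier, mutation probability).
Tile = Tuple[Dict[str, float], float]


def predicted_class(class_probs: Dict[str, float]) -> str:
    """Return the label with the highest probability (ASSUMPTION: argmax assignment)."""
    return max(class_probs, key=class_probs.get)


def mean_mutation_probability(tiles: List[Tile], only_class: Optional[str] = None) -> Optional[float]:
    """Average mutation probability over all tiles, or only over tiles predicted as `only_class`.

    Returns None when no tile is selected.
    """
    selected = [
        mutation_prob
        for class_probs, mutation_prob in tiles
        if only_class is None or predicted_class(class_probs) == only_class
    ]
    return mean(selected) if selected else None
```

Calling the function with `only_class="LUAD"` corresponds to the LUAD-only selection, and calling it without `only_class` corresponds to considering all tiles.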

Comment in

AI to assess images.
Romero D. Nat Rev Clin Oncol. 2018 Dec;15(12):724. doi: 10.1038/s41571-018-0107-y. PMID: 30266916.
The promise and challenges of deep learning models for automated histopathologic classification and mutation prediction in lung cancer.
Patil PD, Hobbs B, Pennell NA. J Thorac Dis. 2019 Feb;11(2):369-372. doi: 10.21037/jtd.2018.12.55. PMID: 30962976.

Grants and funding

P30 CA016087/CA/NCI NIH HHS/United States
