A deep learning system accurately classifies primary and metastatic cancers using passenger mutation patterns

Wei Jiao^{#,1}, Gurnit Atwal^{#,1,2,3}, Paz Polak^{#,4,5,6,7}, Rosa Karlic^{8}, Edwin Cuppen^{9,10}; PCAWG Tumor Subtypes and Clinical Translation Working Group; Alexandra Danyi^{11}, Jeroen de Ridder^{11}, Carla van Herpen^{12}, Martijn P Lolkema^{13}, Neeltje Steeghs^{14}, Gad Getz^{4,5,6,15}, Quaid D Morris^{3,16}, Lincoln D Stein^{17,18}; PCAWG Consortium


PMID: 32024849
PMCID: PMC7002586
DOI: 10.1038/s41467-019-13825-8

Nat Commun. 2020 Feb 5;11(1):728.

Erratum in

Author Correction: A deep learning system accurately classifies primary and metastatic cancers using passenger mutation patterns.
Jiao W, Atwal G, Polak P, Karlic R, Cuppen E; PCAWG Tumor Subtypes and Clinical Translation Working Group; Danyi A, de Ridder J, van Herpen C, Lolkema MP, Steeghs N, Getz G, Morris QD, Stein LD; PCAWG Consortium. Nat Commun. 2022 Dec 8;13(1):7573. doi: 10.1038/s41467-022-32329-6. PMID: 36481665.

Abstract

In cancer, the primary tumour's organ of origin and histopathology are the strongest determinants of its clinical behaviour, but in 3% of cases a patient presents with a metastatic tumour and no obvious primary. Here, as part of the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium, we train a deep learning classifier to predict cancer type based on patterns of somatic passenger mutations detected in whole genome sequencing (WGS) of 2606 tumours representing 24 common cancer types produced by the PCAWG Consortium. Our classifier achieves an accuracy of 91% on held-out tumour samples and 88% and 83%, respectively, on independent primary and metastatic samples, roughly double the accuracy of trained pathologists when presented with a metastatic tumour without knowledge of the primary. Surprisingly, adding information on driver mutations reduced accuracy. Our results have clinical applicability, underscore how patterns of somatic passenger mutations encode the state of the cell of origin, and can inform future strategies to detect the source of circulating tumour DNA.


Conflict of interest statement

The authors declare no competing interests.

Figures

**Fig. 1. Comparison of tumour-type classifiers using single and multiple feature types.**
a Radar plots describing the cross-validation-derived accuracy (F1 score) of Random Forest classifiers trained on each of 7 individual feature categories, across six representative tumour types. b Summary of Random Forest classifier accuracy (F1 score) trained on individual feature categories across all 24 tumour types. c Accuracy of classifiers trained on multiple feature categories. *RF Best Models* corresponds to the cross-validation F1 scores of Random Forest classifiers trained on the three best single-feature categories for all 24 tumour types. *DNN Model* shows the distribution of F1 scores for held-out samples for a multi-class neural network trained using passenger mutation distribution and type. *DNN Model* + *Drivers* shows F1 scores for the neural net when driver genes and pathways are added to the training features. The centre line in the boxplot represents the median of the F1 scores. The lower and upper bounds of the box represent the first and third quartiles. The whiskers extend to the third quartile plus 1.5 × IQR and to the first quartile minus 1.5 × IQR.
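The F1 score and the box-plot statistics named in this caption can be illustrated with a minimal Python sketch. The function names and the example scores are hypothetical and are not taken from the paper; the quartile convention (`method="inclusive"`) is also an assumption, since the caption does not say which one was used.

```python
from statistics import quantiles


def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def boxplot_summary(values):
    """Quartiles and the 1.5 x IQR whisker limits for a list of scores."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "lower_limit": q1 - 1.5 * iqr,  # first quartile minus 1.5 x IQR
        "upper_limit": q3 + 1.5 * iqr,  # third quartile plus 1.5 x IQR
    }


# Made-up per-tumour-type F1 scores, for illustration only.
example_f1_scores = [0.5, 0.7, 0.8, 0.9, 1.0]
summary = boxplot_summary(example_f1_scores)
```

Plotting libraries commonly draw each whisker at the most extreme data point that still lies inside these limits, so the drawn whisker can be shorter than the limit itself.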

**Fig. 2. Heatmap displaying the accuracy of the merged classifier using a held-out portion of the PCAWG data set for evaluation.**
Each row corresponds to the true tumour type; columns correspond to the class predictions emitted by the DNN. Cells are labelled with the percentage of tumours of a particular type that were classified by the DNN as a particular type. The recall and precision of each classifier are shown in the colour bars at the top and left sides of the matrix. All values represent the mean of 10 runs using selected data set partitions. Due to rounding of values, some rows add up to slightly more or less than 100%.
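As a rough illustration of how such a heatmap is derived, the sketch below computes row percentages, recall and precision from paired true and predicted labels. The function name and the toy labels are hypothetical; this is not the authors' evaluation code.

```python
from collections import Counter


def confusion_summary(true_labels, predicted_labels):
    """Return row percentages, per-class recall and per-class precision."""
    classes = sorted(set(true_labels) | set(predicted_labels))
    counts = Counter(zip(true_labels, predicted_labels))
    row_totals = Counter(true_labels)       # tumours per true type
    col_totals = Counter(predicted_labels)  # tumours per predicted type

    # Each row is the true type; each cell is the share predicted as a type.
    row_pct = {
        t: {p: 100.0 * counts[(t, p)] / row_totals[t] for p in classes}
        for t in classes
        if row_totals[t]
    }
    recall = {c: counts[(c, c)] / row_totals[c] for c in classes if row_totals[c]}
    # Precision is undefined for a class that was never predicted.
    precision = {c: counts[(c, c)] / col_totals[c] for c in classes if col_totals[c]}
    return row_pct, recall, precision


# Toy labels, for illustration only.
true_labels = ["Liver", "Liver", "Liver", "Kidney", "Kidney", "Lung"]
predicted_labels = ["Liver", "Liver", "Kidney", "Kidney", "Kidney", "Liver"]
row_pct, recall, precision = confusion_summary(true_labels, predicted_labels)
```

Rounding each cell of `row_pct` to a whole percentage before display is what can make a row sum to slightly more or less than 100%.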

**Fig. 3. Performance of the DNN on held-out PCAWG data.**
a The relationship between training set size and prediction accuracy of the DNN is shown for each tumour type. The blue line represents a regression line fit using LOESS regression, while the grey area represents a 95% confidence interval for the regression function. b Accuracy of the classifier when it is asked to identify the correct tumour type among its top N-ranked predictions. The blue dashed line is the median true-positive rate among all 24 tumour classes. The green and red dashed lines correspond to the true-positive rate for the best- and worst-performing tumour classes.
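The top-N evaluation in panel b can be sketched as follows, assuming each sample comes with a list of predicted classes ordered from most to least likely. The function name and the toy data are hypothetical and do not come from the paper.

```python
from collections import defaultdict
from statistics import median


def top_n_rate_per_class(ranked_predictions, true_labels, n):
    """Per-class fraction of samples whose true type is in the top n predictions."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for ranked, truth in zip(ranked_predictions, true_labels):
        totals[truth] += 1
        if truth in ranked[:n]:
            hits[truth] += 1
    return {label: hits[label] / totals[label] for label in totals}


# Toy ranked predictions (most likely first), for illustration only.
true_labels = ["Liver", "Liver", "Kidney", "Lung"]
ranked_predictions = [
    ["Liver", "Kidney", "Lung"],
    ["Kidney", "Liver", "Lung"],
    ["Lung", "Liver", "Kidney"],
    ["Lung", "Kidney", "Liver"],
]

top1 = top_n_rate_per_class(ranked_predictions, true_labels, 1)
top2 = top_n_rate_per_class(ranked_predictions, true_labels, 2)
median_top1 = median(top1.values())  # analogue of the median line
best_top1 = max(top1.values())       # analogue of the best-performing class
worst_top1 = min(top1.values())      # analogue of the worst-performing class
```

Increasing `n` can only raise or preserve each class's rate, which is why curves of this kind rise monotonically with N.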

**Fig. 4. Prediction accuracy for the DNN against two independent validation data sets.**
a Primary tumours. b Metastatic tumours. Each row corresponds to the true tumour type; columns correspond to the class predictions emitted by the DNN. Cells are labelled with the percentage of tumours of a particular type that were classified by the DNN as a particular type. The recall and precision of each classifier are shown in the colour bars at the top and left sides of the matrix. Due to rounding of values, some rows add up to slightly more or less than 100%.


