Comparative Study

Nat Genet. 2016 Oct;48(10):1288-94.

doi: 10.1038/ng.3658. Epub 2016 Sep 12.

Unsupervised detection of cancer driver mutations with parsimony-guided learning

Runjun D Kumar¹ ² ³, S Joshua Swamidass² ⁴, Ron Bose¹

Affiliations

¹ Division of Oncology, Department of Medicine, Washington University School of Medicine, St. Louis, Missouri, USA.
² Computational and Systems Biology Program, Washington University in St. Louis, St. Louis, Missouri, USA.
³ Medical Scientist Training Program, Washington University School of Medicine, St. Louis, Missouri, USA.
⁴ Department of Pathology and Immunology, Washington University School of Medicine, St. Louis, Missouri, USA.

PMID: 27618449
PMCID: PMC5328615
DOI: 10.1038/ng.3658

Runjun D Kumar et al. Nat Genet. 2016 Oct.

Abstract

Methods are needed to reliably prioritize biologically active driver mutations over inactive passengers in high-throughput sequencing cancer data sets. We present ParsSNP, an unsupervised functional impact predictor that is guided by parsimony. ParsSNP uses an expectation-maximization framework to find mutations that explain tumor incidence broadly, without using predefined training labels that can introduce biases. We compare ParsSNP to five existing tools (CanDrA, CHASM, FATHMM Cancer, TransFIC, and Condel) across five distinct benchmarks. ParsSNP outperformed the existing tools in 24 of 25 comparisons. To investigate the real-world benefit of these improvements, we applied ParsSNP to an independent data set of 30 patients with diffuse-type gastric cancer. ParsSNP identified many known and likely driver mutations that other methods did not detect, including truncation mutations in known tumor suppressors and the recurrent driver substitution RHOA p.Tyr42Cys. In conclusion, ParsSNP uses an innovative, parsimony-based approach to prioritize cancer driver mutations and provides dramatic improvements over existing methods.
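As a rough illustration of the expectation-maximization idea in the abstract, the following Python sketch alternates between spreading putative driver labels across samples and refitting those labels from mutation descriptors. It is not the ParsSNP implementation: the ridge-regression M-step, the rule that each sample carries one unit of driver weight in the E-step, and the names `toy_label_learning`, `X`, and `sample_ids` are all assumptions made for illustration.

```python
import numpy as np

def toy_label_learning(X, sample_ids, n_iter=20, ridge=1e-3, seed=0):
    """Toy EM-style label learning. Hypothetical sketch, not the published method.

    X          : (n_mutations, n_descriptors) array of descriptors
    sample_ids : integer sample index per mutation, covering 0..S-1
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    n_samples = int(sample_ids.max()) + 1
    labels = rng.random(n)                    # start from random labels
    Xb = np.hstack([X, np.ones((n, 1))])      # descriptors plus intercept
    for _ in range(n_iter):
        # M-step (assumed form): ridge regression of labels on descriptors.
        A = Xb.T @ Xb + ridge * np.eye(d + 1)
        w = np.linalg.solve(A, Xb.T @ labels)
        pred = np.clip(Xb @ w, 1e-6, None)    # keep predicted scores positive
        # E-step (assumed form): rescale so the labels in each sample sum to 1,
        # which spreads putative drivers across samples.
        totals = np.bincount(sample_ids, weights=pred, minlength=n_samples)
        labels = pred / totals[sample_ids]
    return labels, w
```

The per-sample normalization is only one simple way to encode a parsimony constraint; the paper's actual constraint and model may differ.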

Conflict of interest statement

Competing Financial Interests

The authors declare no competing financial interests.

Figures

**Figure 1. Overview of ParsSNP and label learning**
A) 1. Label learning begins with a training set of mutations, each belonging to a sample. 2. Descriptors are assigned, and random labels generated (portrayed numbers are illustrative). 3. EM updates labels iteratively such that putative drivers are distributed among samples (E-step) and defined in terms of descriptors (M-step). 4. The final labels and descriptors are used to train a neural network model. 5. The ParsSNP model produces ParsSNP scores when applied to new mutations. B) Distribution of ParsSNP labels after averaging 50 runs (N=566,223). C) Percent contribution of descriptors to ParsSNP scores, using Garson’s algorithm for neural network weights (see text). D) The ParsSNP model was applied to the training and hypermutator pan-cancer sets to produce ParsSNP scores. The fraction of mutations identified as drivers is displayed at various sample mutation burdens and ParsSNP thresholds.
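The descriptor contributions in panel C rely on Garson's algorithm. A minimal sketch of one common formulation is given below, assuming a network with a single hidden layer and a single output; the record does not state the architecture, so that assumption, along with the names `garson_importance`, `W_in`, and `w_out`, is illustrative only.

```python
import numpy as np

def garson_importance(W_in, w_out):
    """Relative importance (%) of each input under one common form of Garson's algorithm.

    W_in  : (n_inputs, n_hidden) input-to-hidden weights (assumed layout)
    w_out : (n_hidden,) hidden-to-output weights
    Assumes every hidden unit has at least one nonzero incoming weight.
    """
    W = np.abs(np.asarray(W_in, dtype=float))
    v = np.abs(np.asarray(w_out, dtype=float))
    # Share of each input in the incoming weights of each hidden unit.
    share = W / W.sum(axis=0, keepdims=True)
    # Weight each share by the magnitude of the hidden unit's output weight.
    contrib = (share * v).sum(axis=1)
    return 100.0 * contrib / contrib.sum()
```

Because only absolute weights are used, the result describes the magnitude of each descriptor's contribution, not its direction.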

**Figure 2. ParsSNP detects recurrent mutations and mutations in known cancer genes in the pan-cancer test set**
A) ParsSNP scores plotted against mutation recurrence for missense mutations (N=182,483). Points are jittered to aid visualization. B) The association of ParsSNP and the independent tools with log(mutation recurrence) for missense mutations, measured by R-square. C) ParsSNP identifies 9,434 recurrent missense mutations better than the independent tools (all DeLong tests p<2.2e-16, AUROCs are depicted). D) The ability of ParsSNP to detect recurrent missense mutations in the test set is assessed on a gene-by-gene basis. Portrayed genes must be members of the CGC, have at least 25 missense mutations, and have at least 10 mutations in each class. Mutation counts (non-recurrent:recurrent) and 95% confidence intervals are included for each gene. E) Out of 173,049 non-recurrent missense mutations, ParsSNP identifies the 3,760 which occur in the CGC significantly better than the independent tools (all DeLong tests p<2.2e-16, AUROCs are depicted). F) CGC genes were divided into putative oncogenes and putative tumor suppressor genes (TSG) based on the molecular genetic annotation from the CGC dataset (dominant or recessive, respectively). The distribution of ParsSNP scores in the test set is displayed by mutation and gene type, with the number of genes and mutations in each category displayed. ‘Truncation’ events include frameshift, premature stop, nonstop and splice-site changes. ‘Missense’ mutations include missense substitutions as well as inframe insertions/deletions. ‘Silent’ changes include synonymous nucleotide substitutions as well as non-coding variants.
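The AUROC values in panels C to E measure how often a tool scores a positive mutation (for example, a recurrent one) above a negative one. The sketch below computes that quantity directly from its pairwise definition; the function name `auroc` and its arguments are hypothetical, and the DeLong significance test used in the figure is not reproduced here.

```python
import numpy as np

def auroc(scores, is_positive):
    """AUROC as the probability that a positive outscores a negative.

    Ties count as half a win. Requires at least one positive and one negative.
    """
    scores = np.asarray(scores, dtype=float)
    pos_mask = np.asarray(is_positive, dtype=bool)
    pos, neg = scores[pos_mask], scores[~pos_mask]
    # Compare every positive score with every negative score.
    diff = pos[:, None] - neg[None, :]
    return float(((diff > 0) + 0.5 * (diff == 0)).mean())
```

This all-pairs form is quadratic in memory, so for counts as large as those in the figure a rank-based calculation would likely be more practical.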

**Figure 3. ParsSNP identifies experimentally validated mutations in external datasets**
A) ParsSNP separates 1,138 driver mutations from 49,880 common SNPs in the driver-dbSNP dataset slightly better than FATHMM Cancer (DeLong test p=0.205) and significantly better than the other independent tools (all DeLong tests p<1e-4, AUROCs are depicted). B) Plot of ParsSNP scores and P53 transactivation activity change for 2,314 mutations in the IARC dataset. C) The association of ParsSNP and the independent tools with log(P53 activity change) is displayed as measured by R-square values. D) ParsSNP identifies 475 disruptive P53 mutations (mutation P53 activity < 25% of wild type) among 2,314 mutations with similar performance to CHASM (DeLong test p=0.39) and Condel (DeLong test p=0.59), while CanDrA, FATHMM Cancer and TransFIC perform worse (all DeLong tests p<0.05, AUROCs are depicted). E) ParsSNP separates 45 experimentally defined functional variants from 26 experimentally defined neutral variants better than existing methods, with DeLong test p<0.05 for all except CHASM (DeLong test p=0.085).

**Figure 4. Comparison of candidate driver mutations in an independent dataset reveals known and likely drivers which are only identified by ParsSNP**
ParsSNP, CanDrA and CHASM were applied to the Kakiuchi *et al* dataset, which consists of 2,988 protein-coding somatic mutations from 30 diffuse-type gastric carcinoma patients. For each tool, the top 30 predicted drivers (equivalent to 1% of the dataset) were extracted. The overlap between the candidate driver lists from each tool is diagramed (top left), and the candidate drivers themselves are listed according to the tools they were identified by.
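The comparison in this figure amounts to taking each tool's top 1% of mutations (30 of 2,988) and grouping candidates by the set of tools that selected them. A small sketch of that bookkeeping follows; the helper names `top_k` and `overlap_regions` and the score dictionaries are hypothetical, and no scores from the study are reproduced.

```python
def top_k(scores, k):
    """Return the IDs of the k highest-scoring mutations.

    scores: dict mapping mutation ID -> score for one tool (hypothetical input).
    """
    return set(sorted(scores, key=scores.get, reverse=True)[:k])

def overlap_regions(top_sets):
    """Group mutations by the exact set of tools that selected them.

    top_sets: dict mapping tool name -> set of selected mutation IDs.
    Returns a dict mapping frozenset of tool names -> set of mutation IDs,
    which corresponds to the regions of a Venn diagram.
    """
    tools_by_mutation = {}
    for tool, mutations in top_sets.items():
        for m in mutations:
            tools_by_mutation.setdefault(m, set()).add(tool)
    regions = {}
    for m, tools in tools_by_mutation.items():
        regions.setdefault(frozenset(tools), set()).add(m)
    return regions

# Cutoff used in the figure: 1% of 2,988 mutations, rounded.
K = round(0.01 * 2988)
```

With real data, `top_sets` would hold one entry per tool, each built by calling `top_k` with `K`.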
