A site specific model and analysis of the neutral somatic mutation rate in whole-genome cancer data
- PMID: 29673314
- PMCID: PMC5909259
- DOI: 10.1186/s12859-018-2141-2
Abstract
Background: Detailed modelling of the neutral mutational process in cancer cells is crucial for identifying driver mutations and understanding the mutational mechanisms that act during cancer development. The neutral mutational process is very complex: whole-genome analyses have revealed that the mutation rate differs between cancer types, between patients and along the genome depending on the genetic and epigenetic context. Therefore, methods that predict the number of different types of mutations in regions or specific genomic elements must consider local genomic explanatory variables. A major drawback of most methods is the need to average the explanatory variables across the entire region or genomic element. This procedure is particularly problematic if the explanatory variable varies dramatically in the element under consideration.
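The averaging problem can be illustrated with a small numerical sketch. It assumes a single explanatory variable and a binary logistic rate function with made-up coefficients (`intercept`, `beta`); none of the numbers come from the paper. Because the rate is a nonlinear function of the variable, the rate evaluated at the region-averaged value generally differs from the average of the per-site rates.

```python
import math

def mutation_rate(x, intercept=-12.0, beta=1.0):
    """Illustrative logistic rate for one site; coefficients are made up."""
    return 1.0 / (1.0 + math.exp(-(intercept + beta * x)))

# Hypothetical region: 90 sites with a low covariate value, 10 sites with a high one.
covariate = [0.0] * 90 + [6.0] * 10

# Site-specific approach: evaluate the rate at every site, then average.
site_specific_rate = sum(mutation_rate(x) for x in covariate) / len(covariate)

# Regional approach: average the covariate first, then evaluate the rate once.
region_mean = sum(covariate) / len(covariate)
regional_rate = mutation_rate(region_mean)

print(f"site-specific mean rate:      {site_specific_rate:.3e}")
print(f"rate at the averaged variable: {regional_rate:.3e}")
```

In this toy setting the few high-valued sites dominate the site-specific average, while the regional calculation smooths them away.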
Results: To take into account the fine scale of the explanatory variables, we model the probabilities of different types of mutations for each position in the genome by multinomial logistic regression. We analyse 505 cancer genomes from 14 different cancer types and compare the performance of region-based models and site-specific models in predicting the mutation rate. We show that for 1000 randomly selected genomic positions, the site-specific model predicts the mutation rate much better than region-based models. We use a forward selection procedure to identify the most important explanatory variables. The procedure identifies site-specific conservation (phyloP), replication timing, and expression level as the best predictors for the mutation rate. Finally, our model confirms and quantifies certain well-known mutational signatures.
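A minimal sketch of how a multinomial logistic model turns site-level explanatory variables into mutation-type probabilities is given here. The outcome categories, the variable names (`phyloP`, `rep_timing`, `expression`) and all coefficient values are assumptions chosen for illustration; they are not the categories or fitted parameters reported by the authors.

```python
import math

# Illustrative outcome categories; "none" (no mutation) is the reference class.
OUTCOMES = ["none", "transition", "transversion"]

def site_probabilities(x, coefs):
    """Multinomial logistic probabilities for one genomic position.

    x     : dict of explanatory variables at the site
    coefs : dict mapping each non-reference outcome to (intercept, {variable: beta})
    """
    # The linear predictor of the reference class is fixed at 0.
    scores = {"none": 0.0}
    for outcome, (intercept, betas) in coefs.items():
        scores[outcome] = intercept + sum(b * x[name] for name, b in betas.items())
    # Softmax, shifted by the maximum score for numerical stability.
    top = max(scores.values())
    weights = {k: math.exp(v - top) for k, v in scores.items()}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}

# Made-up coefficients (not fitted values).
coefs = {
    "transition": (-13.0, {"phyloP": -0.10, "rep_timing": 0.50, "expression": -0.05}),
    "transversion": (-13.5, {"phyloP": -0.08, "rep_timing": 0.40, "expression": -0.03}),
}

site = {"phyloP": 1.2, "rep_timing": 0.8, "expression": 3.0}
probs = site_probabilities(site, coefs)
for outcome in OUTCOMES:
    print(f"{outcome:>12}: {probs[outcome]:.3e}")
```

Because every site gets its own vector of explanatory variables, no averaging over a region is needed before the probabilities are computed.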
Conclusion: We find that our site-specific multinomial regression model outperforms the region-based models. The possibility of including genomic variables on different scales and patient-specific variables makes it a versatile framework for studying different mutational mechanisms. Our model can serve as the neutral null model for the mutational process; regions that deviate from the null model are candidates for elements that drive cancer development.
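One way to picture the use of such a model as a neutral null is sketched below. It assumes that per-site mutation probabilities are already available, sums them into an expected count for a region, and compares the observed count with a Poisson upper tail. The Poisson approximation, the function names and the numbers are assumptions made for this example; the abstract does not specify the test used by the authors.

```python
import math

def expected_mutations(site_probs, n_samples):
    """Expected mutation count in a region: per-site probabilities summed over
    sites and multiplied by the number of samples (same probabilities assumed
    for every sample, a simplification)."""
    return n_samples * sum(site_probs)

def poisson_upper_tail(observed, mean):
    """P(X >= observed) for X ~ Poisson(mean)."""
    if observed <= 0:
        return 1.0
    term = math.exp(-mean)      # P(X = 0)
    cdf = term
    for k in range(1, observed):
        term *= mean / k        # P(X = k) from P(X = k - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)

# Hypothetical region of 1000 sites with a made-up per-site probability,
# evaluated across 505 genomes (the cohort size given in the abstract).
site_probs = [2e-6] * 1000
expected = expected_mutations(site_probs, n_samples=505)

observed = 8  # made-up observed count
p_value = poisson_upper_tail(observed, expected)
print(f"expected {expected:.2f} mutations, observed {observed}, p = {p_value:.2e}")
```

A small tail probability would flag the region as deviating from the neutral expectation, which is the sense in which it becomes a candidate driver element.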
Keywords: Multinomial logistic regression; Site-specific model; Somatic cancer mutations.
Conflict of interest statement
Competing interests
The authors declare that they have no competing interests.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.