A new disease-specific machine learning approach for the prediction of cancer-causing missense variants
- PMID: 21763417
- PMCID: PMC3371640
- DOI: 10.1016/j.ygeno.2011.06.010
Abstract
High-throughput genotyping and sequencing techniques are rapidly and inexpensively providing large amounts of human genetic variation data. Single Nucleotide Polymorphisms (SNPs) are an important source of human genome variability and have been implicated in several human diseases, including cancer. Amino acid mutations resulting from non-synonymous SNPs in coding regions may cause changes in protein function that affect cell proliferation. In this study, we developed a machine learning approach to predict cancer-causing missense variants. We present a Support Vector Machine (SVM) classifier trained on a set of 3163 cancer-causing variants and an equal number of neutral polymorphisms. The method achieves 93% overall accuracy, a correlation coefficient of 0.86, and an area under the ROC curve of 0.98. Compared with previously developed algorithms such as SIFT and CHASM, our method achieves higher prediction accuracy and a higher correlation coefficient in identifying cancer-causing variants.
Copyright © 2011 Elsevier Inc. All rights reserved.
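The three figures of merit quoted in the abstract (overall accuracy, correlation coefficient, and area under the ROC curve) can be illustrated with a minimal sketch. This is not the authors' code: the labels and scores below are made-up toy values, the 0.5 decision threshold is an arbitrary choice, and the "correlation coefficient" is assumed here to be the Matthews correlation coefficient, which the abstract does not state explicitly.

```python
import math

def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives (1 = cancer-causing, 0 = neutral)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)

def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient (assumed meaning of 'correlation coefficient')."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random negative
    (ties count one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy, balanced example: four "cancer-causing" and four "neutral" variants.
# Scores stand in for a classifier's output; they are invented for illustration.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # arbitrary 0.5 threshold

counts = confusion_counts(y_true, y_pred)
acc = accuracy(*counts)
mcc = matthews_corrcoef(*counts)
auc = roc_auc(y_true, scores)
print(f"accuracy={acc:.2f} MCC={mcc:.2f} AUC={auc:.4f}")
```

Accuracy and the correlation coefficient depend on the chosen threshold, whereas the AUC summarizes ranking quality over all thresholds, which is why the abstract reports all three. On a balanced set such as the one described (equal numbers of cancer-causing and neutral variants), accuracy is not inflated by class imbalance.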