Use of estimated evolutionary strength at the codon level improves the prediction of disease-related protein mutations in humans
- PMID: 17935148
- DOI: 10.1002/humu.20628
Abstract
Predicting the functional impact of protein variation is one of the most challenging problems in bioinformatics. A rapidly growing number of genome-scale studies provide large amounts of experimental data, allowing the application of rigorous statistical approaches for predicting whether a given single point mutation has an impact on human health. Until now, existing methods have limited their source data to either protein or gene information. The novelty of this work is that we take advantage of both, focusing on protein evolutionary information by using estimated selective pressures at the codon level. Here we introduce a new method (SeqProfCod) to predict the likelihood that a given protein variant is associated with human disease. Our method relies on a support vector machine (SVM) classifier trained using three sources of information: protein sequence, multiple protein sequence alignments, and the estimation of selective pressure at the codon level. SeqProfCod has been benchmarked with a large dataset of 8,987 single point mutations from 1,434 human proteins from SWISS-PROT. It achieves 82% overall accuracy and a correlation coefficient of 0.59, indicating that the estimation of the selective pressure helps in predicting the functional impact of single point mutations. Moreover, this study demonstrates the synergistic effect of combining two sources of information for predicting the functional effects of protein variants: protein sequence/profile-based information and the evolutionary estimation of the selective pressures at the codon level. The results of a large-scale application of SeqProfCod to all annotated point mutations in SWISS-PROT (available for download at http://sgu.bioinfo.cipf.es/services/Omidios/; last accessed: 24 August 2007) could be used to support clinical studies.
(c) 2007 Wiley-Liss, Inc.
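The abstract reports two benchmark figures for the binary disease/neutral classifier: overall accuracy and a correlation coefficient. As a minimal sketch of how such figures are typically derived from a confusion matrix, the Python below computes both, assuming the correlation coefficient is the Matthews correlation coefficient (the abstract does not name it). The function names and the counts are hypothetical and are not taken from the paper.

```python
import math


def overall_accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions (disease or neutral) that are correct."""
    total = tp + tn + fp + fn
    return (tp + tn) / total


def matthews_cc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient for a binary classifier.

    Ranges from -1 (total disagreement) to +1 (perfect prediction).
    Returns 0.0 when any marginal count is zero, a common convention.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Hypothetical confusion counts, for illustration only:
    # tp = disease variants predicted as disease, tn = neutral predicted neutral.
    tp, tn, fp, fn = 40, 45, 5, 10
    print(f"accuracy = {overall_accuracy(tp, tn, fp, fn):.2f}")
    print(f"MCC      = {matthews_cc(tp, tn, fp, fn):.2f}")
```

Unlike accuracy, a correlation measure of this kind stays informative when the disease and neutral classes are unbalanced, which may be why both figures are quoted together.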