Supervised multivariate analysis of sequence groups to identify specificity determining residues
- PMID: 17451607
- PMCID: PMC1878507
- DOI: 10.1186/1471-2105-8-135
Abstract
Background: Proteins that evolve from a common ancestor can change functionality over time, and it is important to be able to identify the residues that cause this change. In this paper we show how a supervised multivariate statistical method, Between Group Analysis (BGA), can be used to identify these residues from families of proteins with different substrate specificities using multiple sequence alignments.
Results: We demonstrate the usefulness of this method on three different test cases. Two of these test cases, the Lactate/Malate dehydrogenase family and Nucleotidyl Cyclases, consist of two functional groups. The other family, Serine Proteases, consists of three groups. BGA was used to analyse and visualise these three families using two different encoding schemes for the amino acids.
Conclusion: The combination of methods in this paper is powerful and flexible while being computationally very fast and simple. BGA is especially useful because it can be used to analyse any number of functional classes. In the examples in this paper we used only 2 or 3 classes for demonstration purposes, but any number can be used and visualised.
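To make the idea concrete, the following is a minimal sketch of a between-group ordination on a toy alignment. It assumes a binary indicator encoding (one column per position and residue) and a PCA-style decomposition of the group means; the abstract does not specify either detail, so this should not be read as the authors' implementation. The alignment, the group labels, and the function names `one_hot_encode`, `between_group_analysis`, and `rank_positions` are all hypothetical.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot_encode(alignment):
    """Encode aligned sequences as a binary matrix, one column per (position, residue)."""
    n_seq, n_pos = len(alignment), len(alignment[0])
    X = np.zeros((n_seq, n_pos * len(AMINO_ACIDS)))
    for i, seq in enumerate(alignment):
        for j, aa in enumerate(seq):
            k = AMINO_ACIDS.find(aa)
            if k >= 0:  # gaps and unknown symbols are left as all-zero
                X[i, j * len(AMINO_ACIDS) + k] = 1.0
    return X


def between_group_analysis(X, groups):
    """Ordinate the group means (PCA-style) and project every sequence onto the axes."""
    groups = np.asarray(groups)
    labels = np.unique(groups)
    Xc = X - X.mean(axis=0)  # centre on the overall mean
    means = np.vstack([Xc[groups == g].mean(axis=0) for g in labels])
    weights = np.array([(groups == g).sum() for g in labels]) / len(groups)
    # SVD of the group means, each weighted by the square root of its group share
    _, _, Vt = np.linalg.svd(means * np.sqrt(weights)[:, None], full_matrices=False)
    axes = Vt[: len(labels) - 1].T  # at most (number of groups - 1) axes
    scores = Xc @ axes              # coordinates of each sequence on those axes
    return axes, scores


def rank_positions(axes, n_pos):
    """Rank alignment positions by their total squared loading on the axes."""
    load = (axes ** 2).sum(axis=1).reshape(n_pos, len(AMINO_ACIDS)).sum(axis=1)
    return np.argsort(load)[::-1], load


# Toy alignment (assumed data): position 2 is D in group A and E in group B,
# while positions 1 and 3 vary in the same way inside both groups.
alignment = ["ACDK", "AGDR", "ACDR", "ACEK", "AGER", "ACER"]
groups = ["A", "A", "A", "B", "B", "B"]

X = one_hot_encode(alignment)
axes, scores = between_group_analysis(X, groups)
order, load = rank_positions(axes, n_pos=len(alignment[0]))
print("positions ranked by discriminating power:", order)
print("scores on the first axis:", scores[:, 0])
```

In this sketch the highest-ranked position is the candidate specificity-determining residue, and with k groups the sequences can be plotted on up to k - 1 axes, which is why the same procedure extends to any number of functional classes.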