Optimizing the size of the sequence profiles to increase the accuracy of protein sequence alignments generated by profile-profile algorithms
- PMID: 18337259
- DOI: 10.1093/bioinformatics/btn097
Abstract
Motivation: Profile-based protein homology detection algorithms are valuable tools in genome annotation and protein classification. By utilizing information present in the sequences of homologous proteins, profile-based methods are often able to detect extremely weak relationships between protein sequences, as evidenced by large-scale benchmarking experiments such as CASP and LiveBench.
Results: We study the relationship between the sensitivity of a profile-profile method and the size of the sequence profile, which is defined as the average number of different residue types observed at the profile's positions. We also demonstrate that improvements in the sensitivity of a profile-profile method can be made by incorporating a profile-dependent scoring scheme, such as position-specific background frequencies. The techniques presented in this article are implemented in the alignment algorithm UNI-FOLD. When tested against other well-established methods for fold recognition, UNI-FOLD shows increased sensitivity and specificity in detecting remote relationships between protein sequences.
Availability: The UNI-FOLD web server can be accessed at http://blackhawk.cs.uni.edu
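The profile size defined in the Results (the average number of different residue types observed at the profile's positions) can be illustrated with a small sketch. This is not the UNI-FOLD implementation: the function name `profile_size`, the toy alignment, and the choices to ignore gap characters and to count raw observed residues without sequence weighting or pseudocounts are all assumptions made for illustration only.

```python
from typing import List

# Characters treated as gaps (an assumption; they are not counted as residue types).
GAP_CHARS = {"-", "."}


def profile_size(alignment: List[str]) -> float:
    """Return the average number of distinct residue types per alignment column.

    `alignment` is a list of equal-length aligned sequences (one string per row).
    """
    if not alignment:
        raise ValueError("alignment is empty")
    length = len(alignment[0])
    if length == 0:
        raise ValueError("alignment has no columns")
    if any(len(row) != length for row in alignment):
        raise ValueError("all rows must have the same length")

    total_distinct = 0
    for col in range(length):
        # Collect the residue types seen in this column, ignoring gaps.
        residues = {row[col].upper() for row in alignment} - GAP_CHARS
        total_distinct += len(residues)
    return total_distinct / length


# Hypothetical four-sequence alignment with five columns.
toy_alignment = ["MKV-L", "MRVAL", "MKIAL", "MKV-I"]
print(profile_size(toy_alignment))  # columns contribute 1, 2, 2, 1, 2 -> 1.6
```

Under this definition, a profile built from a single sequence has size 1.0, and the value grows as more divergent homologs are added to the alignment, which is the quantity the study relates to sensitivity.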