A word-oriented approach to alignment validation
- PMID: 15728118
- DOI: 10.1093/bioinformatics/bti335
Abstract
Motivation: Multiple sequence alignment at the level of whole proteomes requires a high degree of automation, precluding the use of traditional validation methods such as manual curation. Since evolutionary models are too general to describe the history of each residue in a protein family, there is no single algorithm/model combination that can yield a biologically or evolutionarily optimal alignment. We propose a 'shotgun' strategy where many different algorithms are used to align the same family, and the best of these alignments is then chosen with a reliable objective function. We present WOOF, a novel 'word-oriented' objective function that relies on the identification and scoring of conserved amino acid patterns (words) between pairs of sequences.
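The abstract does not say how WOOF defines or scores words. The following is therefore only a minimal Python sketch of the general idea, not the published scoring scheme. It makes three hypothetical assumptions. First, a "word" is a maximal run of at least `min_len` identical residues shared by two aligned rows. Second, columns where both rows are gapped are skipped, because they vanish in the pairwise projection. Third, an alignment's score is the total length of all such words summed over every sequence pair. `pick_best` then illustrates the "shotgun" step: score every candidate alignment and keep the highest-scoring one. The function names, the `min_len` default, and the toy candidates are all illustrative.

```python
from itertools import combinations


def pair_words(row_a: str, row_b: str, min_len: int = 3) -> list[str]:
    """Hypothetical 'words': maximal runs of >= min_len identical, ungapped aligned residues."""
    words: list[str] = []
    run: list[str] = []
    for a, b in zip(row_a, row_b):
        if a == "-" and b == "-":
            continue  # gap-gap column is absent from this pairwise view; keep the current run
        if a == b:
            run.append(a)  # identical residue (both-gap case was handled above)
        else:
            if len(run) >= min_len:
                words.append("".join(run))
            run = []  # mismatch or residue-vs-gap ends the run
    if len(run) >= min_len:
        words.append("".join(run))
    return words


def word_score(alignment: list[str], min_len: int = 3) -> int:
    """Hypothetical objective: total length of shared words, summed over all sequence pairs."""
    return sum(
        len(word)
        for row_a, row_b in combinations(alignment, 2)
        for word in pair_words(row_a, row_b, min_len)
    )


def pick_best(candidates: dict[str, list[str]]) -> str:
    """'Shotgun' selection: name of the candidate alignment with the highest word score."""
    return max(candidates, key=lambda name: word_score(candidates[name]))


if __name__ == "__main__":
    # Two toy alignments of the same pair of sequences, as if produced by different aligners.
    candidates = {
        "aligner_A": ["MKV-LLA", "MKVALLA"],
        "aligner_B": ["MKVLLA-", "MKVALLA"],
    }
    print({name: word_score(aln) for name, aln in candidates.items()})  # {'aligner_A': 6, 'aligner_B': 3}
    print(pick_best(candidates))  # aligner_A
```

In this toy case, `aligner_A` keeps both conserved stretches (`MKV` and `LLA`) intact, while `aligner_B` breaks the second one, so the word-based score prefers `aligner_A`.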
Results: Tests on a subset of reference protein alignments from BAliBASE showed that WOOF tended to rank the (manually curated) reference alignment highest among 1060 alternative (automatically generated) alignments for a majority of protein families. Among the automated alignments, there was a strong positive relationship between the WOOF score and similarity to the reference alignment. The speed of WOOF and its independence from explicit considerations of three-dimensional structure make it an excellent tool for analyzing large numbers of protein families.
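The abstract does not name the measure used for "similarity to the reference alignment." One common choice with BAliBASE-style benchmarks is a sum-of-pairs (SP) score: the fraction of residue pairs aligned in the reference that the test alignment also aligns. The Python sketch below computes that quantity. It assumes SP is a reasonable stand-in and is not necessarily the measure used in the study. The function names and toy alignments are illustrative.

```python
from itertools import combinations


def residue_pairs(alignment: list[str]) -> set[tuple[int, int, int, int]]:
    """Aligned residue pairs as (seq_i, residue_index_i, seq_j, residue_index_j), with i < j.

    Residue indices count ungapped positions, so pairs taken from different
    alignments of the same sequences can be compared directly.
    """
    pairs: set[tuple[int, int, int, int]] = set()
    if not alignment:
        return pairs
    next_index = [0] * len(alignment)  # next ungapped residue index for each sequence
    for col in range(len(alignment[0])):
        present = []  # (sequence, residue index) for every residue in this column
        for s, row in enumerate(alignment):
            if row[col] != "-":
                present.append((s, next_index[s]))
                next_index[s] += 1
        for (s1, p1), (s2, p2) in combinations(present, 2):
            pairs.add((s1, p1, s2, p2))
    return pairs


def sp_score(test: list[str], reference: list[str]) -> float:
    """Fraction of the reference's aligned residue pairs that the test alignment reproduces."""
    ref_pairs = residue_pairs(reference)
    if not ref_pairs:
        return 0.0
    return len(ref_pairs & residue_pairs(test)) / len(ref_pairs)


if __name__ == "__main__":
    reference = ["MKV-LLA", "MKVALLA"]
    print(sp_score(reference, reference))               # 1.0
    print(sp_score(["MKVLLA-", "MKVALLA"], reference))  # 0.5
```

Computing this score for each automated alignment, alongside an objective function's score, is one way to examine the kind of score-versus-accuracy relationship the Results describe.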
Availability: On request from the authors.
Similar articles
- HMM-ModE--improved classification using profile hidden Markov models by optimising the discrimination threshold and modifying emission probabilities with negative training sequences. BMC Bioinformatics. 2007 Mar 27;8:104. doi: 10.1186/1471-2105-8-104. PMID: 17389042. Free PMC article.
- An iterative refinement algorithm for consistency based multiple structural alignment methods. Bioinformatics. 2006 Sep 1;22(17):2087-93. doi: 10.1093/bioinformatics/btl351. Epub 2006 Jun 29. PMID: 16809393.
- Detecting protein dissimilarities in multiple alignments using Bayesian variable selection. Bioinformatics. 2007 Jan 15;23(2):245-6. doi: 10.1093/bioinformatics/btl566. Epub 2006 Nov 14. PMID: 17105719.
- Where did the BLOSUM62 alignment score matrix come from? Nat Biotechnol. 2004 Aug;22(8):1035-6. doi: 10.1038/nbt0804-1035. PMID: 15286655. Review.
- Scoring residue conservation. Proteins. 2002 Aug 1;48(2):227-41. doi: 10.1002/prot.10146. PMID: 12112692. Review.
Cited by
- Recurrent horizontal transfer of arsenite methyltransferase genes facilitated adaptation of life to arsenic. Sci Rep. 2017 Aug 10;7(1):7741. doi: 10.1038/s41598-017-08313-2. PMID: 28798375. Free PMC article.
- Lateral transfer of genes and gene fragments in Staphylococcus extends beyond mobile elements. J Bacteriol. 2011 Aug;193(15):3964-77. doi: 10.1128/JB.01524-10. Epub 2011 May 27. PMID: 21622749. Free PMC article.
- LMAP_S: Lightweight Multigene Alignment and Phylogeny eStimation. BMC Bioinformatics. 2019 Dec 30;20(1):739. doi: 10.1186/s12859-019-3292-5. PMID: 31888452. Free PMC article.
- Are protein domains modules of lateral genetic transfer? PLoS One. 2009;4(2):e4524. doi: 10.1371/journal.pone.0004524. Epub 2009 Feb 20. PMID: 19229333. Free PMC article.
- Is multiple-sequence alignment required for accurate inference of phylogeny? Syst Biol. 2007 Apr;56(2):206-21. doi: 10.1080/10635150701294741. PMID: 17454975. Free PMC article.