Is multiple-sequence alignment required for accurate inference of phylogeny?
- PMID: 17454975
- PMCID: PMC7107264
- DOI: 10.1080/10635150701294741
Abstract
The process of inferring phylogenetic trees from molecular sequences almost always starts with a multiple alignment of these sequences but can also be based on methods that do not involve multiple sequence alignment. Very little is known about the accuracy with which such alignment-free methods recover the correct phylogeny or about the potential for increasing their accuracy. We conducted a large-scale comparison of ten alignment-free methods, among them one new approach that does not calculate distances and a faster variant of our pattern-based approach; all distance-based alignment-free methods are freely available from http://www.bioinformatics.org.au (as Python package decaf+py). We show that most methods exhibit a higher overall reconstruction accuracy in the presence of high among-site rate variation. Under all conditions that we considered, variants of the pattern-based approach were significantly better than the other alignment-free methods. The new pattern-based variant achieved a speed-up of an order of magnitude in the distance calculation step, accompanied by a small loss of tree reconstruction accuracy. A method of Bayesian inference from k-mers did not improve on classical alignment-free (and distance-based) methods but may still offer other advantages due to its Bayesian nature. We found the optimal word length k of word-based methods to be stable across various data sets, and we provide parameter ranges for two different alphabets. The influence of these alphabets was analyzed to reveal a trade-off in reconstruction accuracy between long and short branches. We have mapped the phylogenetic accuracy for many alignment-free methods, among them several recently introduced ones, and increased our understanding of their behavior in response to biologically important parameters. In all experiments, the pattern-based approach emerged as superior, at the expense of higher resource consumption. 
Nonetheless, no alignment-free method that we examined recovers the correct phylogeny as accurately as does an approach based on maximum-likelihood distance estimates of multiply aligned sequences.
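To make the idea of a word-based, distance-based alignment-free method concrete, the following is a minimal sketch of one classical variant: each sequence is reduced to a profile of overlapping word (k-mer) counts, and pairs of profiles are compared by squared Euclidean distance. This is an illustration only. It is not the decaf+py API, nor the pattern-based or Bayesian methods evaluated in the paper; the function names (`kmer_counts`, `squared_euclidean`, `distance_matrix`), the toy sequences, and the choice k = 2 are all assumptions made for the example.

```python
from collections import Counter
from itertools import combinations


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping word of length k in seq (hypothetical helper)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def squared_euclidean(a: Counter, b: Counter) -> int:
    """Squared Euclidean distance between two word-count profiles.

    Words absent from a profile count as zero (Counter returns 0 for
    missing keys without inserting them).
    """
    return sum((a[w] - b[w]) ** 2 for w in a.keys() | b.keys())


def distance_matrix(seqs: dict[str, str], k: int) -> dict[tuple[str, str], int]:
    """Pairwise distances for all unordered pairs of named sequences."""
    profiles = {name: kmer_counts(s, k) for name, s in seqs.items()}
    return {
        (x, y): squared_euclidean(profiles[x], profiles[y])
        for x, y in combinations(sorted(profiles), 2)
    }


if __name__ == "__main__":
    # Toy sequences, invented for illustration; not data from the study.
    toy = {"A": "ACGTACGT", "B": "ACGTTCGT", "C": "TTTTGGGG"}
    for pair, d in distance_matrix(toy, k=2).items():
        print(pair, d)
```

A matrix of this kind would then be handed to a distance-based tree-building method such as neighbor joining. No multiple alignment is computed at any step, which is what distinguishes this family of methods from the maximum-likelihood distance approach used as the reference in the comparison above.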