Performance, accuracy, and Web server for evolutionary placement of short sequence reads under maximum likelihood
- PMID: 21436105
- PMCID: PMC3078422
- DOI: 10.1093/sysbio/syr010
Abstract
We present an evolutionary placement algorithm (EPA) and a Web server for the rapid assignment of sequence fragments (short reads) to edges of a given phylogenetic tree under the maximum-likelihood model. The accuracy of the algorithm is evaluated on several real-world data sets and compared with placement by pair-wise sequence comparison, using edit distances and BLAST. We introduce a slow and accurate as well as a fast and less accurate placement algorithm. For the slow algorithm, we develop additional heuristic techniques that yield almost the same run times as the fast version with only a small loss of accuracy. When those additional heuristics are employed, the run time of the more accurate algorithm is comparable with that of a simple BLAST search for data sets with a high number of short query sequences. Moreover, the accuracy of the EPA is significantly higher, in particular when the sample of taxa in the reference topology is sparse or inadequate. Our algorithm, which has been integrated into RAxML, therefore provides an equally fast but more accurate alternative to BLAST for tree-based inference of the evolutionary origin and composition of short sequence reads. We are also actively developing a Web server that offers a freely available service for computing read placements on trees using the EPA.
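The pairwise baseline the EPA is compared against can be illustrated with a minimal sketch: each short read is assigned to the reference sequence with the smallest edit distance. This is not the authors' implementation. The function names, the toy sequences, and the choice of a fragment-to-substring (semi-global) edit distance are assumptions made for illustration only.

```python
from typing import Dict


def fragment_edit_distance(read: str, reference: str) -> int:
    """Smallest edit distance between `read` and any substring of `reference`.

    Assumption: because reads are much shorter than the reference sequences,
    unaligned reference characters before and after the read are free.
    """
    # Row 0 is all zeros: the read may start at any reference position.
    prev = [0] * (len(reference) + 1)
    for i, r in enumerate(read, start=1):
        # Column 0: the first i read characters align to nothing.
        curr = [i] + [0] * len(reference)
        for j, c in enumerate(reference, start=1):
            curr[j] = min(
                prev[j] + 1,             # read character left unmatched
                curr[j - 1] + 1,         # reference character skipped inside the read
                prev[j - 1] + (r != c),  # match (cost 0) or substitution (cost 1)
            )
        prev = curr
    # The read may end at any reference position.
    return min(prev)


def place_by_edit_distance(read: str, references: Dict[str, str]) -> str:
    """Return the name of the closest reference; ties are broken by name."""
    return min(
        references,
        key=lambda name: (fragment_edit_distance(read, references[name]), name),
    )


if __name__ == "__main__":
    # Hypothetical reference sequences, not data from the paper.
    refs = {
        "taxon_a": "ACGTACGTTAGC",
        "taxon_b": "ACGTTTTTTAGC",
        "taxon_c": "GGGGCCCCAAAA",
    }
    print(place_by_edit_distance("TACGTTA", refs))
```

Such a baseline can only assign a read to a sequence already in the reference set. The EPA instead assigns reads to edges of the reference tree, which the abstract reports is more accurate when the taxon sample is sparse.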