BSCAMPP: Batch-Scaled Phylogenetic Placement on Large Trees
- PMID: 40811324
- DOI: 10.1109/TCBBIO.2025.3562281
Abstract
Phylogenetic placement is the problem of placing sequences into a given phylogenetic tree, called a "backbone tree". EPA-ng and pplacer are the two most accurate phylogenetic placement methods, but both can fail to complete when the backbone tree is very large. Our recently designed SCAMPP framework has been shown to scale both pplacer and EPA-ng to larger backbone trees of up to 180,000 sequences by building a small placement subtree for each query sequence and then using the phylogenetic placement method to place that query sequence into that subtree. However, the technique in SCAMPP produces many placement subtrees (potentially a different one for each query sequence), making it computationally expensive when placing many query sequences. Here we present BSCAMPP (Batch-SCAMPP), a new technique that overcomes this barrier by using the query sequences to select a much smaller number of placement subtrees. We show that BSCAMPP used with EPA-ng is much faster than SCAMPP used with EPA-ng, and scales to ultra-large backbone trees. We also show that BSCAMPP used with pplacer is much faster than SCAMPP used with pplacer and that, compared with BSCAMPP used with EPA-ng, it is somewhat more accurate but slower.
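The contrast between one placement job per query and a smaller number of shared jobs can be illustrated with a toy sketch. This is not the published BSCAMPP algorithm: the use of Hamming distance, the rule of grouping queries by their nearest backbone leaf, and all function names and sequences here are assumptions made for illustration. Subtree extraction and the actual call to pplacer or EPA-ng are omitted.

```python
from collections import defaultdict


def hamming(a, b):
    """Count mismatched positions between two aligned, equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def nearest_leaf(query, backbone):
    """Return the backbone leaf name closest to the query (ties broken by name)."""
    return min(sorted(backbone), key=lambda name: hamming(query, backbone[name]))


def per_query_jobs(queries, backbone):
    """Per-query style: every query gets its own placement job (hypothetical helper)."""
    return [(nearest_leaf(seq, backbone), [qname]) for qname, seq in queries.items()]


def batched_jobs(queries, backbone):
    """Batched style: queries that share a seed leaf share one job (hypothetical helper)."""
    groups = defaultdict(list)
    for qname, seq in queries.items():
        groups[nearest_leaf(seq, backbone)].append(qname)
    return sorted(groups.items())


# Made-up aligned sequences, for illustration only.
backbone = {"A": "AAAA", "B": "CCCC", "C": "GGGG"}
queries = {"q1": "AAAT", "q2": "AATA", "q3": "CCCG", "q4": "ACAA"}

print(len(per_query_jobs(queries, backbone)), "jobs without batching")
print(len(batched_jobs(queries, backbone)), "jobs with batching")
```

In this toy input, four queries collapse into two jobs because three of them share the same nearest leaf. The saving comes from invoking the placement method once per group rather than once per query.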