Bioinformatics. 2022 Jun 24;38(Suppl 1):i413-i421.
doi: 10.1093/bioinformatics/btac265.

QuCo: quartet-based co-estimation of species trees and gene trees

Maryam Rabiee et al. Bioinformatics. 2022.

Abstract

Motivation: Phylogenomics faces a dilemma: on the one hand, the most accurate species and gene tree estimation methods are those that co-estimate them; on the other hand, these co-estimation methods do not scale to even moderately large numbers of species. Summary-based methods, which first infer gene trees independently and then combine them, are much more scalable but are prone to gene tree estimation error, which is inevitable when inferring trees from limited-length data. Gene tree estimation error is not just random noise; it can create biases such as long-branch attraction.

Results: We introduce a scalable likelihood-based approach to co-estimation under the multi-species coalescent model. The method, called quartet co-estimation (QuCo), takes as input independently inferred distributions over gene trees and computes the most likely species tree topology and internal branch length for each quartet, marginalizing over gene tree topologies and ignoring branch lengths by making several simplifying assumptions. It then updates the gene tree posterior probabilities based on the species tree. The focus on gene tree topologies and the heuristic division into quartets enable fast likelihood calculations. We benchmark our method with extensive simulations for quartet trees in zones known to produce biased species trees and further with larger trees. We also run QuCo on a biological dataset of bees. Our results show better accuracy than the summary-based approach ASTRAL run on estimated gene trees.
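
To make the per-quartet step concrete, the following is a minimal sketch of the idea and not the QuCo implementation. It relies on the standard multi-species coalescent result that, for a quartet whose internal branch has length d in coalescent units, the gene tree topology matching the species tree has probability 1 - (2/3)e^(-d) and each of the two alternatives has probability (1/3)e^(-d). Everything else is an assumption made for illustration: the function names, the grid search over d, and the use of per-gene posterior probabilities as stand-ins for topology likelihoods (which presumes a uniform prior over gene tree topologies).

```python
import math

# The three possible unrooted topologies on four taxa A, B, C, D.
TOPOLOGIES = ("AB|CD", "AC|BD", "AD|BC")


def msc_quartet_probs(species_topology, d):
    """Gene tree topology probabilities under the multi-species coalescent
    for a quartet with internal branch length d (coalescent units)."""
    minor = math.exp(-d) / 3.0
    return {t: (1.0 - 2.0 * minor if t == species_topology else minor)
            for t in TOPOLOGIES}


def log_likelihood(gene_posteriors, species_topology, d):
    """Log-likelihood of (topology, d), marginalizing each gene over its
    three topologies. Posteriors stand in for topology likelihoods, which
    is an assumption of this sketch (uniform topology prior)."""
    probs = msc_quartet_probs(species_topology, d)
    return sum(math.log(sum(probs[t] * post[t] for t in TOPOLOGIES))
               for post in gene_posteriors)


def estimate_quartet(gene_posteriors, grid=None):
    """Pick the (topology, d) pair with the highest log-likelihood using a
    simple grid search over d (hypothetical choice; any optimizer works)."""
    if grid is None:
        grid = [0.01 * i for i in range(1, 501)]
    best = max((log_likelihood(gene_posteriors, t, d), t, d)
               for t in TOPOLOGIES for d in grid)
    return best[1], best[2]


def update_posteriors(gene_posteriors, species_topology, d):
    """Reweight each gene's topology distribution by the coalescent
    probabilities implied by the estimated species tree, then renormalize."""
    probs = msc_quartet_probs(species_topology, d)
    updated = []
    for post in gene_posteriors:
        weights = {t: probs[t] * post[t] for t in TOPOLOGIES}
        total = sum(weights.values())
        updated.append({t: w / total for t, w in weights.items()})
    return updated


# Made-up input: six genes weakly favor AB|CD, two favor each alternative.
genes = (
    [{"AB|CD": 0.6, "AC|BD": 0.2, "AD|BC": 0.2}] * 6
    + [{"AB|CD": 0.2, "AC|BD": 0.6, "AD|BC": 0.2}] * 2
    + [{"AB|CD": 0.2, "AC|BD": 0.2, "AD|BC": 0.6}] * 2
)
topology, d_hat = estimate_quartet(genes)
new_posteriors = update_posteriors(genes, topology, d_hat)
```

In this toy input the genes that initially favored a minority topology are pulled toward the estimated species tree topology after the update, which is the qualitative behavior the abstract describes. Handling more than four taxa requires combining quartets, which this sketch does not attempt.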

Availability and implementation: QuCo is available at https://github.com/maryamrabiee/quco.

Supplementary information: Supplementary data are available at Bioinformatics online.

Figures

Fig. 1.
Felsenstein’s zone simulation. (a) Each gene tree branch length is scaled by μ_s and/or μ_l; for example, the length of the terminal branch of C becomes μ_l τ_1 + μ_s τ_2. Rates μ_s and μ_l are selected such that terminal branches of A and C in the unrooted gene tree have expected length l, and other branches have expected length s, as shown. (b) MAP gene trees estimated using MrBayes with simulations in Felsenstein’s zone can have large estimation error, especially when l/s is high and sequence lengths (boxes) are short.
Fig. 2.
Felsenstein’s zone quartet simulations comparing QuCo to ASTRAL (a) or BUCKy-Quartet (b). Each box shows a combination of long (l, rows) and short (s, columns) branch lengths, and colors delineate the ILS level controlled by d. Each ribbon shows the improvement of QuCo over ASTRAL or BUCKy, all run on MrBayes gene trees. When the ribbon is patterned, BUCKy is better than QuCo.
Fig. 3.
Branch length accuracy on Felsenstein’s zone simulations, showing the distribution of estimated branch length divided by true branch length for correctly estimated species trees (the number of such trees is shown in each case). Lines show the quartiles and the dot shows the mean. Each box corresponds to a value of s, combining all l values. See Supplementary Figure S5 for better resolution.
Fig. 4.
Gene tree estimation error on Felsenstein’s zone simulations. Each dot corresponds to one model condition, with the x-axis showing the improvement in species tree accuracy by QuCo compared to ASTRAL and the y-axis showing the improvement in the average gene tree accuracy for all genes. The size of dots corresponds to the accuracy of ASTRAL species trees
Fig. 5.
Species tree topological error (mean, standard error) under anomaly zone simulations versus the number of genes
Fig. 6.
30-taxon dataset. Left: Comparison of the error rate of the species trees generated by running ASTRAL on IQ-Tree ML gene trees and QuCo on IQ-Tree quartet likelihoods, with 200 and 500 genes of the 30-taxon dataset. The x-axis shows deviation from the clock, represented by the parameter α (inverse of the variance of rate multipliers). Each box is over 50 replicates.
