Proc Natl Acad Sci U S A. 2025 May 13;122(19):e2500553122.
doi: 10.1073/pnas.2500553122. Epub 2025 May 2.

Accurate, scalable, and fully automated inference of species trees from raw genome assemblies using ROADIES

Anshu Gupta et al. Proc Natl Acad Sci U S A.

Abstract

Current genome sequencing initiatives across a wide range of life forms offer significant potential to enhance our understanding of evolutionary relationships and support transformative biological and medical applications. Species trees play a central role in many of these applications; however, despite the widespread availability of genome assemblies, accurate inference of species trees remains challenging due to the limited automation, substantial domain expertise, and computational resources required by conventional methods. To address this limitation, we present ROADIES, a fully automated pipeline to infer species trees starting from raw genome assemblies. In contrast to the prominent approach, ROADIES incorporates a unique strategy of randomly sampling segments of the input genomes to generate gene trees. This eliminates the need for predefining a set of loci, limiting the analyses to a fixed number of genes, and performing the cumbersome gene annotation and/or whole genome alignment steps. ROADIES also eliminates the need to infer orthology by leveraging existing discordance-aware methods that allow multicopy genes. Using the genomic datasets from large-scale sequencing efforts across four diverse life forms (placental mammals, pomace flies, birds, and budding yeasts), we show that ROADIES infers species trees that are comparable in quality to the state-of-the-art studies but in a fraction of the time and effort, including on challenging datasets with rampant gene tree discordance and complex polyploidy. With its speed, accuracy, and automation, ROADIES has the potential to vastly simplify species tree inference, making it accessible to a broader range of scientists and applications.
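The random-sampling idea described in the abstract can be illustrated with a minimal sketch. This is not the ROADIES implementation: the function name `sample_segments`, its parameters, and the simplification that each genome is a single string (real assemblies contain many contigs or scaffolds) are all assumptions made for illustration only.

```python
import random


def sample_segments(genomes, num_segments, segment_length, seed=0):
    """Draw random fixed-length segments from {species: sequence}.

    Hypothetical helper for illustration; not part of ROADIES.
    Returns a list of (species, start, subsequence) tuples.
    """
    rng = random.Random(seed)
    # Only genomes at least as long as the segment can be sampled from.
    eligible = sorted(
        name for name, seq in genomes.items() if len(seq) >= segment_length
    )
    if not eligible:
        raise ValueError("no genome is long enough for the requested length")

    segments = []
    for _ in range(num_segments):
        # Pick a species uniformly at random, then a uniform start offset.
        species = rng.choice(eligible)
        seq = genomes[species]
        start = rng.randrange(len(seq) - segment_length + 1)
        segments.append((species, start, seq[start:start + segment_length]))
    return segments


if __name__ == "__main__":
    toy = {"sp1": "ACGT" * 10, "sp2": "TTGCA" * 8, "sp3": "AC"}
    for species, start, segment in sample_segments(toy, 3, 10, seed=1):
        print(species, start, segment)
```

In a pipeline of the kind the abstract describes, each sampled segment would then be aligned against all input genomes and the resulting alignments used to build gene trees; those steps are outside the scope of this sketch.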

Keywords: bioinformatics; phylogenetics; species tree inference.

Conflict of interest statement

Competing interests statement: The authors declare no competing interest.

Figures

Fig. 1.
An overview of the ROADIES pipeline. (A) ROADIES input and output. (B) A comparison of the steps involved in species tree inference in conventional approaches and in ROADIES. (C) A detailed view of the various stages of the ROADIES pipeline and its convergence mechanism.
Fig. 2.
ROADIES results evaluated on the dataset of 240 placental mammals (in the accurate mode). (A) The species-level phylogenetic tree of 240 placental mammals estimated by ROADIES. The number of genes aligned to each species (blue) and the count of genes sampled from each species (green) (8) are also shown. (B) Order-level trees of 240 placental mammals estimated by ROADIES (on the Right) and the reference tree from the Zoonomia consortium (8) (on the Left). Dashed branches show the differences between the two trees. (C) ROADIES convergence in accurate mode. As the number of gene trees increases, we show (i) the percentage of highly supported species tree nodes with localPP ≥ 0.95, (ii) the linear increase in runtime, and (iii) the normRF of the final species tree to the reference tree. (D) Quartet scores (i) and localPP branch support (ii) of all three topologies around three branches (marked in B), which had low support in the final tree.
Fig. 3.
ROADIES results evaluated on the datasets of (A and B) 100 drosophilid species, (C and D) 363 avian species, and (E and F) 332 budding yeasts (in the accurate mode). (A, C, and E) The species-level phylogenetic trees of (A) 100 drosophilid species, (C) 363 avian species, and (E) 332 budding yeasts estimated by ROADIES. All trees were estimated in the accurate mode of ROADIES, with the budding yeast tree estimated with the deep setting additionally enabled. The number of genes aligned to each species (blue) and the count of genes sampled from each species (green) are also shown. (B, D, and F) Cophylogenetic plots comparing the reference tree (on the Left) with the tree estimated by ROADIES (on the Right), shown at (B) group level for 100 drosophilid species, (D) order level for 363 avian species, and (F) clade level for 332 budding yeasts. The reference and ROADIES trees match exactly at the group level for drosophilid species. Dashed branches in the ROADIES trees show the differences from the reference trees in the remaining two cases (D and F).
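The normRF values mentioned in the Fig. 2 caption refer to a normalized Robinson-Foulds distance between two trees. The sketch below shows one common way to compute such a distance on unrooted trees written as nested tuples of leaf names. The function names and the tuple representation are illustrative assumptions, and the normalization used here (symmetric difference divided by the total number of non-trivial bipartitions in both trees) is an assumption as well, since the captions do not state which normalization the authors applied.

```python
def leaves(tree):
    """Return the set of leaf names in a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def bipartitions(tree):
    """Collect the non-trivial bipartitions induced by internal edges.

    Each bipartition is stored as the side that excludes a fixed anchor
    leaf, so the result does not depend on where the tree is rooted.
    """
    all_leaves = leaves(tree)
    anchor = min(all_leaves)
    splits = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(visit(child) for child in node))
        side = all_leaves - clade if anchor in clade else clade
        # Skip trivial splits (a single leaf, or everything, on one side).
        if 2 <= len(side) <= len(all_leaves) - 2:
            splits.add(side)
        return clade

    visit(tree)
    return splits


def norm_rf(tree_a, tree_b):
    """Normalized Robinson-Foulds distance in [0, 1] (assumed normalization)."""
    if leaves(tree_a) != leaves(tree_b):
        raise ValueError("trees must share the same leaf set")
    a, b = bipartitions(tree_a), bipartitions(tree_b)
    total = len(a) + len(b)
    return len(a ^ b) / total if total else 0.0


if __name__ == "__main__":
    reference = ((("A", "B"), "C"), ("D", "E"))
    estimate = ((("A", "C"), "B"), ("D", "E"))
    print(norm_rf(reference, estimate))
```

For two fully resolved trees on n leaves, the denominator equals 2(n - 3), so this choice matches the usual normalization in that case.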

Cited by

  • Poplar: a phylogenomics pipeline.
    Koning E, Subedi A, Krishnakumar R. Bioinform Adv. 2025 May 6;5(1):vbaf104. doi: 10.1093/bioadv/vbaf104. eCollection 2025. PMID: 40510372. Free PMC article.

References

    1. Cheng S., et al., 10KP: A phylodiverse genome sequencing plan. GigaScience 7, giy013 (2018).
    2. Genome 10K Community of Scientists, Genome 10K: A proposal to obtain whole-genome sequence for 10000 vertebrate species. J. Hered. 100, 659–674 (2009).
    3. Rhie A., et al., Towards complete and error-free genome assemblies of all vertebrate species. Nature 592, 737–746 (2021).
    4. Foley N. M., et al., A genomic timescale for placental mammal evolution. Science 380, eabl8189 (2023).
    5. Stiller J., et al., Complexity of avian evolution revealed by family-level genomes. Nature 629, 851–860 (2024), 10.1038/s41586-024-07323-1.
