Biostatistics. 2024 Jul 1;25(3):786-800.
doi: 10.1093/biostatistics/kxad025.

Analyzing microbial evolution through gene and genome phylogenies

Sarah Teichman et al. Biostatistics.

Erratum in

  • Correction.
    [No authors listed]. Biostatistics. 2024 Dec 31;26(1):kxae029. doi: 10.1093/biostatistics/kxae029. PMID: 39186534.

Abstract

Microbiome scientists critically need modern tools to explore and analyze microbial evolution. Often this involves studying the evolution of microbial genomes as a whole. However, different genes in a single genome can be subject to different evolutionary pressures, which can result in distinct gene-level evolutionary histories. To address this challenge, we propose to treat estimated gene-level phylogenies as data objects, and present an interactive method for the analysis of a collection of gene phylogenies. We use a local linear approximation of phylogenetic tree space to visualize estimated gene trees as points in low-dimensional Euclidean space, and address important practical limitations of existing related approaches, allowing an intuitive visualization of complex data objects. We demonstrate the utility of our proposed approach through microbial data analyses, including by identifying outlying gene histories in strains of Prevotella, and by contrasting Streptococcus phylogenies estimated using different gene sets. Our method is available as an open-source R package, and assists with estimating, visualizing, and interacting with a collection of bacterial gene phylogenies.

Keywords: Dimension reduction; Microbiome; Non-Euclidean; Statistical genetics; Visualization.

Conflict of interest statement

None declared.

Figures

Fig. 1
A screenshot of the interactive tool. A scatterplot showing the relationships between a collection of trees (left) can be shown alongside a selected individual gene tree (or collection of individual gene trees) (right). Additional gene-level variables such as functional annotation can also be visualized.
Fig. 2
(Top panel) The geodesic path between T1 and T4 passes through T2 and T3. (Bottom left panel) A representation of the path between T1 and T4 in tree space 𝒯₆ and (bottom right panel) a mapping of 𝒯₆ around T1 in ℝ³ via the modified log map. Each binary tree topology corresponds to a single non-negative orthant in ℝ³, and the orthants are joined along axes corresponding to common branches.
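The orthant picture in this caption can be illustrated for the simplest case only: two trees that share the same topology and therefore sit in the same orthant. In that case the geodesic between them is a straight line, so the distance is Euclidean and the log map at the base tree is a plain vector difference. The sketch below is not the paper's modified log map and does not handle trees in different orthants. The function names and example branch lengths are hypothetical, and it assumes each tree is represented only by its three interior branch lengths.

```python
import math


def _check_orthant_point(lengths):
    """Branch lengths must be non-negative to lie in an orthant."""
    if any(x < 0 for x in lengths):
        raise ValueError("branch lengths must be non-negative")


def same_orthant_distance(base, other):
    """Distance between two trees with the SAME topology (assumption).

    Each tree is a list of interior branch lengths in a fixed branch order.
    Within one orthant the geodesic is a straight segment, so the distance
    is the ordinary Euclidean distance.
    """
    if len(base) != len(other):
        raise ValueError("trees must have the same number of branches")
    _check_orthant_point(base)
    _check_orthant_point(other)
    return math.sqrt(sum((y - x) ** 2 for x, y in zip(base, other)))


def same_orthant_log_map(base, other):
    """Tangent vector at `base` pointing to `other` (same topology only)."""
    if len(base) != len(other):
        raise ValueError("trees must have the same number of branches")
    _check_orthant_point(base)
    _check_orthant_point(other)
    return [y - x for x, y in zip(base, other)]


# Hypothetical interior branch lengths for two six-tip trees with one topology.
base_tree = [1.0, 2.0, 2.0]
other_tree = [4.0, 6.0, 2.0]

distance = same_orthant_distance(base_tree, other_tree)
tangent = same_orthant_log_map(base_tree, other_tree)
print(distance, tangent)
```

For trees in different orthants the geodesic may bend at orthant boundaries, as in the path through T2 and T3, so this straight-line shortcut no longer applies.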
Fig. 3
The proposed visualization of 63 gene trees constructed from 78 genomes from the Prevotella genus, depicted by a two-dimensional scatterplot. Three visibly outlying gene trees are labeled. A phylogenomic tree constructed from the full gene set is shown in red and a phylogenomic tree constructed from a reduced gene set after removing the three outlying genes is shown in green.
Fig. 4
The estimated gene trees for the three outlying Prevotella genes identified in Figure 3, as well as the estimated phylogenomic tree (d). All trees are rooted at their mid-point. The DMRL_synthase (a) and GTP_cyclohydroI (b) gene trees both include one long branch leading to tip 51. The BacA (c) gene tree includes one long branch separating two clades of tips. No strikingly long branches are present in the phylogenomic tree.
Fig. 5
The proposed visualization of 196 gene trees estimated from 106 genomes in the Streptococcus genus shown as a two-dimensional scatterplot (a). The same visualization is shown after rescaling the trees (b), where the rescaling is performed by dividing all branch lengths on each tree by the sum of the branch lengths for that tree. A phylogenomic tree constructed from the full gene set is shown as a black triangle and a phylogenomic tree constructed from a subset of ribosomal genes is shown as a black square. Gene trees are colored by whether or not they are ribosomal genes.
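The rescaling described for panel (b) is simple arithmetic: each branch length is divided by the total branch length of its tree. A minimal sketch follows, assuming each tree is represented as a flat list of branch lengths. The function name and the example values are hypothetical and are not taken from the authors' R package.

```python
def rescale_branch_lengths(branch_lengths):
    """Divide every branch length by the tree's total branch length.

    After rescaling, the branch lengths of the tree sum to 1, so trees are
    compared on relative rather than absolute branch lengths.
    """
    total = sum(branch_lengths)
    if total <= 0:
        raise ValueError("total branch length must be positive")
    return [length / total for length in branch_lengths]


# Hypothetical branch lengths for a single gene tree.
gene_tree = [1.0, 3.0, 4.0]
rescaled = rescale_branch_lengths(gene_tree)
print(rescaled)
```

Because every rescaled tree has total length 1, differences between points in the rescaled plot reflect topology and relative branch lengths rather than overall tree size.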
