Sci Rep. 2020 Feb 3;10(1):1723.
doi: 10.1038/s41598-020-58356-1.

Standardized phylogenetic and molecular evolutionary analysis applied to species across the microbial tree of life

Migun Shakya et al. Sci Rep.

Abstract

There is growing interest in reconstructing phylogenies from the copious amounts of genome sequencing projects that target related viral, bacterial or eukaryotic organisms. To facilitate the construction of standardized and robust phylogenies for disparate types of projects, we have developed a complete bioinformatic workflow, with a web-based component to perform phylogenetic and molecular evolutionary (PhaME) analysis from sequencing reads, draft assemblies or completed genomes of closely related organisms. Furthermore, the ability to incorporate raw data, including some metagenomic samples containing a target organism (e.g. from clinical samples with suspected infectious agents), shows promise for the rapid phylogenetic characterization of organisms within complex samples without the need for prior assembly.

Conflict of interest statement

The authors declare no competing interests.

Figures

Figure 1
PhaME analysis workflow. The PhaME analysis workflow identifies SNPs at orthologous positions in complete genomes, assembled contigs, and read datasets. First, nucmer is used to identify and mask repeats, and to perform pairwise alignments among all complete genomes. A reference genome is selected based on user criteria (see Methods). Contigs are then compared with the reference genome using nucmer, and reads are mapped to the reference using Bowtie 2 or BWA. The SNP and gap coordinates are used to generate a whole-genome core alignment. If an annotation file is provided, a separate alignment consisting of conserved positions found only in the CDS regions is also reported. RAxML, FastTree or IQ-TREE phylogenies are constructed using these alignments. If specified, the PAML or HyPhy packages are used to test for selective pressure on genes with SNPs.
Figure 2
SNP-based phylogeny of 35 Escherichia and Shigella genomes. All nodes have bipartition bootstrap support of 60% or greater. Clades are labeled with their corresponding E. coli phylogroups on the right. The tree was rooted with E. fergusonii ATCC 35469 as an outgroup, which was removed from the figure. The scale bar indicates the number of substitutions per site.
Figure 3
Inter-genus phylogeny using 676 Escherichia, Shigella, Salmonella, Shimwellia, and Atlantibacter datasets. Branches containing genomes from clades representing E. coli phylotypes and species with multiple strains are collapsed and labeled on the right with their corresponding phylotype or species name. Genomes that did not form clades with any phylotype are labeled with their full name. Genomes of cryptic Escherichia clades have their groups labeled in parentheses, from CI to CV. Two forward slashes on a branch indicate that the branch was trimmed; the accompanying number gives the actual branch length. The tree was rooted with Shimwellia spp. as the outgroup. The scale bar indicates the number of substitutions per site. A detailed tree that displays the names of all genomes and support values is shown in Fig. S1.
Figure 4
Phylogeny of Burkholderia, Paraburkholderia, Caballeronia, and Ralstonia using reads, contigs, and finished genomes. Maximum likelihood phylogeny from 213 samples (genomes, assemblies, and reads). Clades of the same species were collapsed and only the name of that species is shown. Ralstonia solanacearum PSI07 was used as an outgroup. The scale bar indicates the number of substitutions per site. A fully expanded and detailed tree can be found in Fig. S2. Detailed trees showing relationships among genomes of only the Bcc or within the B. pseudomallei/mallei group can be found in Figs. S3 and S4, respectively.
Figure 5
Read-based PhaME phylogenetic analysis of two human fecal metagenomic samples. Maximum likelihood tree showing 53 E. coli and Shigella genomes and the placement within the tree of the dominant E. coli present in the two metagenomes. The tree was rooted using E. fergusonii ATCC 35469 as the outgroup. Nodes with bipartition bootstrap ≥60% are labeled with circles. The scale bar indicates the number of substitutions per site. The bar graph on the right shows the percentage of reads from the two metagenomic samples that mapped to each genome. Names of genomes are colored by phylotype association, as in Fig. 2.
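
The core-alignment step described in the Figure 1 caption (SNP and gap coordinates combined into a whole-genome core alignment) can be illustrated with a minimal sketch. This is not PhaME's code: PhaME derives its coordinates from nucmer and Bowtie 2/BWA output, whereas this example assumes every genome has already been placed in reference coordinates as an equal-length string, with "-" or "N" marking positions that are missing. The function name core_snp_alignment and the sample sequences are hypothetical.

```python
from typing import Dict, List, Tuple

# Characters treated as "no data" at a position (an assumption of this sketch).
MISSING = {"-", "N"}


def core_snp_alignment(aligned: Dict[str, str]) -> Tuple[List[int], Dict[str, str]]:
    """Return the variable core positions and the SNP-only alignment.

    A position is "core" when no genome has missing data there, and it is
    kept as a SNP when at least two different bases are observed.
    """
    names = list(aligned)
    if not names:
        return [], {}

    length = len(aligned[names[0]])
    if any(len(seq) != length for seq in aligned.values()):
        raise ValueError("all sequences must have the same aligned length")

    positions: List[int] = []
    for i in range(length):
        column = [aligned[name][i].upper() for name in names]
        if any(base in MISSING for base in column):
            continue  # missing in at least one genome, so not a core position
        if len(set(column)) > 1:
            positions.append(i)  # variable core position

    snps = {name: "".join(aligned[name][i] for i in positions) for name in names}
    return positions, snps


# Hypothetical toy input: a reference, a contig-derived and a read-derived sequence.
example = {
    "ref":     "ACGTACGTAC",
    "contigA": "ACGTTCGTAC",
    "readsB":  "ACG-ACGTAG",
}
positions, snps = core_snp_alignment(example)
print(positions)  # [4, 9]
print(snps)       # {'ref': 'AC', 'contigA': 'TC', 'readsB': 'AG'}
```

Position 3 is dropped because it is missing in one dataset, which is why gap coordinates matter as much as SNP coordinates when the alignment is restricted to the core genome.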
