Microorganisms. 2020 Feb 24;8(2):312.
doi: 10.3390/microorganisms8020312.

A Systematic Approach to Bacterial Phylogeny Using Order Level Sampling and Identification of HGT Using Network Science

Ehdieh Khaledian et al. Microorganisms.

Abstract

Reconstructing and visualizing phylogenetic relationships among living organisms is a fundamental challenge because not all organisms share the same genes. As a result, the first phylogenetic visualizations employed a single gene, e.g., rRNA genes, sufficiently conserved to be present in all organisms but divergent enough to provide discrimination between groups. As more genome data became available, researchers began concatenating different combinations of genes or proteins to construct phylogenetic trees believed to be more robust because they incorporated more information. However, the genes or proteins chosen were based on ad hoc approaches. The large number of complete genome sequences available today allows the use of whole genomes to analyze relationships among organisms rather than using an ad hoc set of genes. We present a systematic approach for constructing a phylogenetic tree based on simultaneously clustering the complete proteomes of 360 bacterial species. From the homologous clusters, we identify 49 protein sequences shared by 99% of the organisms to build a tree. Of the 49 sequences, 47 have homologous sequences in both archaea and eukarya. The clusters are also used to create a network from which bacterial species with horizontally-transferred genes from other phyla are identified.
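
The selection step described in the abstract (keeping only protein clusters present in at least 99% of the organisms) can be illustrated with a minimal sketch. This is not the authors' code: the function name, the input layout (a mapping from cluster ID to (organism, protein) pairs), and the toy data are assumptions made for illustration, and the clustering itself is taken as already done.

```python
def near_universal_clusters(cluster_members, n_organisms, min_fraction=0.99):
    """Return the IDs of clusters found in at least min_fraction of organisms.

    cluster_members: dict mapping a cluster ID to an iterable of
    (organism, protein_id) pairs. This layout is hypothetical.
    """
    selected = []
    for cluster_id, members in cluster_members.items():
        # Count each organism once, even if it contributes several paralogs.
        organisms = {organism for organism, _ in members}
        if len(organisms) / n_organisms >= min_fraction:
            selected.append(cluster_id)
    return sorted(selected)


# Toy example with 4 organisms and a relaxed 75% threshold.
toy_clusters = {
    "c1": [("A", "a1"), ("B", "b1"), ("C", "c1"), ("D", "d1")],
    "c2": [("A", "a2"), ("B", "b2")],
    "c3": [("A", "a3"), ("A", "a4"), ("B", "b3"), ("C", "c3")],
}
core = near_universal_clusters(toy_clusters, n_organisms=4, min_fraction=0.75)
print(core)
```

With 360 organisms and the default threshold of 0.99, a cluster would need members from at least 357 organisms to be kept.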

Keywords: horizontal gene transfer; network of bacteria; network science; phylogeny; tree of bacterial phyla.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1
A heatmap of protein sequence cluster membership together with hierarchical clustering results for the 360 organisms. The heatmap assists in understanding the cluster results. Each row indicates a cluster, and each column represents an organism. Cyan represents membership in a protein cluster while blue indicates absence. The legend of phyla at the top of the figure shows the largest clades represented by colored bands immediately above the heatmap. On the left side of the dendrogram we find Actinobacteria, with distinct blocks of clusters indicated by cyan (A and B in the figure). Bacteroidetes (black with yellow lines) also has blocks of protein clusters (C in the figure). The yellow arrow indicates the band of protein clusters broadly conserved across the phyla.
Figure 2
A network of organisms assembled using homologous clusters obtained from the proteomes deduced from 360 complete bacterial genomes and retaining only the top 20% of the links between organisms (shared protein sequences). Phyla are assigned arbitrary colors. Nodes/Organisms are labeled using the first four letters of their particular phylum followed by their order. The network is created from all proteins shared by non-singleton (two or more sequences from different organisms) clusters. Therefore, it reflects the relationships between organisms both vertically and horizontally. In the network, organisms belonging to the same phylum often group together, e.g., Chloroflexi (seafoam green), Actinobacteria (red), Bacteroidetes/Chlorobi (light blue), and Cyanobacteria (light purple). The central section of the network has many organisms that are strongly linked to each other indicating that they share many homologous protein sequences. The Firmicutes (yellow) have a grouping that extends from the middle of the network, but are also distributed throughout this section. The nodes with larger circles indicate organisms isolated from their respective phyla; as discussed in the text, these organisms have numerous horizontally-transferred genes.
Figure 3
Distance tree for genes horizontally transferred to M. infera. The protein MESINF_1317 was searched against the NCBI NR database using BLASTp, and the resulting distance tree indicates that it is a beta-glucosidase-like glycosyl hydrolase potentially transferred from an organism of the order Bacillales, phylum Firmicutes. For the majority of the horizontally transferred genes (55%), protein function is unknown, and inferring the origin of a gene can help in predicting protein function.
Figure 4
Distance tree for genes horizontally transferred to M. infera. The distance tree for MESINF_0680 indicates that it may be a glycosyltransferase family protein transferred from Paenibacillus sp. FJAT-27812.
Figure 5
A treemap of the distribution of donor phyla for the horizontally transferred genes of M. infera. The genes are mainly from the orders Bacillales and Clostridiales in the phylum Firmicutes, followed by Gammaproteobacteria, Bacteroidetes, Chloroflexi, Alphaproteobacteria, and Archaea.
Figure 6
A high-resolution tree of bacterial phyla created from 49 proteins shared by ≥99% of the 360 bacterial organisms in our study. Each phylum is specified using a distinct color. The major bacterial lineages based on orders are Actinobacteria and Gammaproteobacteria, followed by Firmicutes, Alphaproteobacteria, and Bacteroidetes. Proteobacteria classes, excluding the delta/epsilon subdivision, appear on the same branch. A dendrogram of the tree including the complete organism names and phyla is available in Supplementary Figure S5.
Figure 7
A Venn diagram depicting how genes from multiple organisms cluster: Genes that are shared by all organisms, genes that are shared by more than one organism, and genes that are specific to one organism.
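
Two operations mentioned in the captions lend themselves to a short sketch: splitting clusters into the three categories of Figure 7, and building the organism network of Figure 2 by keeping only the strongest 20% of links. The code below is illustrative only. It assumes that a link's weight is the number of clusters two organisms share and that "top 20%" means ranking links by that weight; the captions do not spell out either detail, and all function names and data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def classify_clusters(cluster_orgs, all_organisms):
    """Sort clusters into: shared by all, shared by some, specific to one."""
    n = len(set(all_organisms))
    groups = {"all": [], "shared": [], "specific": []}
    for cluster_id, orgs in cluster_orgs.items():
        k = len(set(orgs))
        if k == n:
            groups["all"].append(cluster_id)
        elif k > 1:
            groups["shared"].append(cluster_id)
        else:
            groups["specific"].append(cluster_id)
    return groups


def top_links(cluster_orgs, keep_fraction=0.2):
    """Weight organism pairs by shared clusters and keep the strongest links.

    Assumption: weight = number of clusters containing both organisms.
    Ties are broken alphabetically so the result is deterministic.
    """
    weights = Counter()
    for orgs in cluster_orgs.values():
        # Singleton clusters yield no pairs, so they add no links.
        for pair in combinations(sorted(set(orgs)), 2):
            weights[pair] += 1
    ranked = sorted(weights.items(), key=lambda item: (-item[1], item[0]))
    n_keep = max(1, int(keep_fraction * len(ranked))) if ranked else 0
    return ranked[:n_keep]


# Toy data: cluster ID -> organisms with at least one member in the cluster.
toy = {
    "c1": {"A", "B", "C", "D"},
    "c2": {"A", "B"},
    "c3": {"A", "B", "C"},
    "c4": {"D"},
    "c5": {"A", "B"},
}
groups = classify_clusters(toy, ["A", "B", "C", "D"])
links = top_links(toy, keep_fraction=0.2)
print(groups)
print(links)
```

Because the network uses every non-singleton cluster rather than only the near-universal ones, a strong link between organisms from different phyla is a candidate signal of horizontal transfer, which is the idea behind the enlarged nodes in Figure 2.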
