PLoS One. 2013 Apr 25;8(4):e62510.
doi: 10.1371/journal.pone.0062510. Print 2013.

Phylogeny of bacterial and archaeal genomes using conserved genes: supertrees and supermatrices

Jenna Morgan Lang et al. PLoS One. 2013.

Abstract

Over 3000 microbial (bacterial and archaeal) genomes have been made publicly available to date, providing an unprecedented opportunity to examine evolutionary genomic trends and offering valuable reference data for other studies such as metagenomics. The utility of these genome sequences is greatly enhanced when we understand how they are phylogenetically related to each other. Here we therefore describe our efforts to reconstruct the phylogeny of all available bacterial and archaeal genomes. We identified 24 single-copy, ubiquitous genes suitable for this phylogenetic analysis and used two approaches to combine their data. First, we concatenated the alignments of all genes into a single alignment, from which a Maximum Likelihood (ML) tree was inferred using RAxML. Second, we used a relatively new approach to combining gene data, Bayesian Concordance Analysis (BCA), as implemented in the BUCKy software, in which the results of the 24 single-gene phylogenetic analyses are used to generate a "primary concordance" tree. A comparison of the concatenated ML tree and the primary concordance (BUCKy) tree reveals that the two approaches give similar results, relative to a phylogenetic tree inferred from the 16S rRNA gene. After comparing the results and the methods used, we conclude that the current best approach for generating a single phylogenetic tree, suitable for use as a reference phylogeny for comparative analyses, is to perform a maximum likelihood analysis of a concatenated alignment of conserved, single-copy genes.
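The concatenation step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function name `concatenate_alignments`, the input layout (a list of gene name and taxon-to-sequence pairs), the toy sequences, and the choice to pad taxa missing from a gene with gap characters are all assumptions made for illustration. The partition ranges it returns are the kind of per-gene boundaries a partitioned analysis needs.

```python
from typing import Dict, List, Tuple


def concatenate_alignments(
    gene_alignments: List[Tuple[str, Dict[str, str]]],
) -> Tuple[Dict[str, str], List[Tuple[str, int, int]]]:
    """Join per-gene alignments into one supermatrix (hypothetical helper).

    Returns the concatenated sequence for every taxon and a list of
    (gene name, start, end) partitions, 1-based and inclusive.
    Assumption: a taxon absent from a gene is padded with '-' characters.
    """
    taxa = sorted({taxon for _, aln in gene_alignments for taxon in aln})
    pieces: Dict[str, List[str]] = {taxon: [] for taxon in taxa}
    partitions: List[Tuple[str, int, int]] = []
    offset = 0
    for name, aln in gene_alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"{name}: sequences are not aligned to one length")
        length = lengths.pop()
        for taxon in taxa:
            pieces[taxon].append(aln.get(taxon, "-" * length))
        partitions.append((name, offset + 1, offset + length))
        offset += length
    supermatrix = {taxon: "".join(parts) for taxon, parts in pieces.items()}
    return supermatrix, partitions


# Toy input: two invented genes, three invented taxa.
genes = [
    ("geneA", {"tax1": "MKV", "tax2": "MRV"}),
    ("geneB", {"tax1": "LL-A", "tax2": "LLGA", "tax3": "LIGA"}),
]
matrix, parts = concatenate_alignments(genes)
```

Keeping the partition boundaries alongside the supermatrix lets each gene retain its own substitution model parameters during tree inference, rather than forcing one model on the whole alignment.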

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1. Concatenated Maximum Likelihood tree.
Phylogenetic tree inferred from a concatenated, partitioned alignment of 24 genes using RAxML. The branches of phyla with at least 5 representatives are colored; all other lineages are drawn with black lines. Support values are calculated from 100 rapid bootstrap replicates. This representation is a radial cladogram, in which branch length is not proportional to time, and some branches may be elongated so that the names of the taxa appear on the circumference of the circle. The original version of this figure is available in the Supporting Information: Figure S7.
Figure 2. 16S rRNA Maximum Likelihood tree.
Phylogenetic tree inferred from an alignment of the 16S rRNA gene using RAxML. The branches of phyla with at least 5 representatives are colored; all other lineages are drawn with black lines. Support values are calculated from 100 bootstrap replicates. This representation is a radial cladogram, in which branch length is not proportional to time, and some branches may be elongated so that the names of the taxa appear on the circumference of the circle. The original version of this figure is available in the Supporting Information: Figure S8.
Figure 3. Frequencies of support values observed in phylogenetic trees.
Histograms showing the frequency of support (bootstrap or concordance factor) values in the (A) best ML tree inferred from a concatenated alignment of 24 genes, (B) best ML trees for all of the 24 individual genes, (C) best ML tree inferred from the 16S rRNA gene, and (D) primary concordance (“BUCKy”) tree.
Figure 4. BUCKy tree.
Primary concordance (“BUCKy”) tree constructed using Bayesian Concordance Analysis of RAxML bootstrap replicates for each of the 24 phylogenetic marker genes. Values at the nodes are concordance factors. The branches of phyla with at least 5 representatives are colored; all other lineages are drawn with black lines. This representation is a radial cladogram, in which branch length is not proportional to time, and some branches may be elongated so that the names of the taxa appear on the circumference of the circle. The original version of this figure is available in the Supporting Information: Figure S9.
Figure 5. A majority-rule consensus tree calculated from the bootstrap replicates of one of the 24 genes.
Majority-rule consensus tree computed from the bootstrap replicates for the 30S ribosomal protein S3. This gene has an alignment length of 180 sites, which is the average length for the 24 marker genes used in this study. This representation is a radial cladogram, in which branch length is not proportional to time, and some branches may be elongated so that the taxa appear on the circumference of the circle. The original version of this figure is available in the Supporting Information: Figure S10.
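As a rough illustration of the majority-rule idea behind Figure 5, the sketch below counts how often each clade appears across a set of replicate trees and keeps those found in more than half of them. It is a simplification under stated assumptions, not the software used in the study: trees are treated as rooted and written as nested Python tuples of leaf names, the helper names `clades` and `majority_rule_clades` are hypothetical, and the three toy replicates are invented.

```python
from collections import Counter
from typing import Dict, FrozenSet, List, Set


def clades(tree) -> Set[FrozenSet[str]]:
    """Collect the leaf set below every internal node of a nested-tuple tree."""
    found: Set[FrozenSet[str]] = set()

    def visit(node) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(visit(child) for child in node))
        found.add(below)
        return below

    visit(tree)
    return found


def majority_rule_clades(replicates: List, threshold: float = 0.5) -> Dict[FrozenSet[str], float]:
    """Return clades present in more than `threshold` of the replicates,
    mapped to their frequency. The root clade (all taxa) is skipped."""
    if not replicates:
        raise ValueError("need at least one replicate tree")
    counts = Counter(c for tree in replicates for c in clades(tree))
    n_trees = len(replicates)
    n_taxa = max(len(c) for c in counts)  # the root clade holds every taxon
    return {
        clade: count / n_trees
        for clade, count in counts.items()
        if len(clade) < n_taxa and count / n_trees > threshold
    }


# Toy replicates: two agree on (A,B) and (C,D); one disagrees.
reps = [
    (("A", "B"), ("C", "D")),
    (("A", "B"), ("C", "D")),
    (("A", "C"), ("B", "D")),
]
consensus = majority_rule_clades(reps)
```

Any two clades that each occur in more than half of the replicates must co-occur in at least one tree, so the retained clades are mutually compatible and can always be assembled into a single consensus tree.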
Figure 6. Robinson-Foulds distances between trees.
Violin plot depicting the distribution of Robinson-Foulds (RF) distances among all pairwise comparisons between bootstrap replicates and between the 24 best single-gene maximum likelihood trees produced by RAxML. Points are plotted on the graph to show the RF values for pairwise comparisons between the concatenated ML tree and the BUCKy tree, between the BUCKy tree and the 16S rRNA tree, and between the concatenated ML tree and the 16S rRNA tree.
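For readers unfamiliar with the Robinson-Foulds measure, a minimal Python sketch follows. It computes the unnormalized RF distance as the number of non-trivial bipartitions found in one tree but not the other. The nested-tuple tree encoding, the helper names `leaf_set`, `splits`, and `rf_distance`, and the two toy trees are assumptions for illustration; the study itself may have used a different implementation or a normalized variant.

```python
from typing import FrozenSet, Set


def leaf_set(tree) -> FrozenSet[str]:
    """All leaf names in a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def splits(tree) -> Set[FrozenSet[str]]:
    """Non-trivial bipartitions of the tree, viewed as unrooted.

    Each bipartition is stored as the side that does NOT contain a fixed
    reference taxon, so the same split always gets the same key.
    """
    taxa = leaf_set(tree)
    reference = min(taxa)
    found: Set[FrozenSet[str]] = set()

    def visit(node) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(visit(child) for child in node))
        side = taxa - below if reference in below else below
        # Skip trivial splits that separate fewer than two taxa from the rest.
        if 2 <= len(side) <= len(taxa) - 2:
            found.add(side)
        return below

    visit(tree)
    return found


def rf_distance(tree_a, tree_b) -> int:
    """Count bipartitions present in exactly one of the two trees."""
    if leaf_set(tree_a) != leaf_set(tree_b):
        raise ValueError("trees must share the same taxa")
    return len(splits(tree_a) ^ splits(tree_b))


# Toy trees on five invented taxa; they differ in where B and C attach.
t1 = ((("A", "B"), "C"), ("D", "E"))
t2 = ((("A", "C"), "B"), ("D", "E"))
distance = rf_distance(t1, t2)
```

Identical topologies give a distance of 0, and larger values mean the trees share fewer internal branches.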
Figure 7. Correlation of gene alignment length and average amino acid identity with variance among bootstrap replicates.
Scatter plot showing the negative correlation of alignment length vs. average Robinson-Foulds (RF) distance among bootstrap replicates for each of the 24 genes. The average percent identity of each alignment is not correlated with the average RF distance among bootstrap replicates for each of the 24 genes.
