tRNA signatures reveal a polyphyletic origin of SAR11 strains among alphaproteobacteria

Katherine C H Amrine et al.

PLoS Comput Biol. 2014 Feb 27;10(2):e1003454. doi: 10.1371/journal.pcbi.1003454. eCollection 2014 Feb.

Abstract

Molecular phylogenetics and phylogenomics are subject to noise from horizontal gene transfer (HGT) and bias from convergence in macromolecular compositions. Extensive variation in size, structure and base composition of alphaproteobacterial genomes has complicated their phylogenomics, sparking controversy over the origins and closest relatives of the SAR11 strains. SAR11 are highly abundant, cosmopolitan aquatic Alphaproteobacteria with streamlined, A+T-biased genomes. A dominant view holds that SAR11 are monophyletic and related to both Rickettsiales and the ancestor of mitochondria. Other studies dispute this, finding evidence of a polyphyletic origin of SAR11 with most strains distantly related to Rickettsiales. Although careful evolutionary modeling can reduce bias and noise in phylogenomic inference, entirely different approaches may be useful to extract robust phylogenetic signals from genomes. Here we develop simple phyloclassifiers from bioinformatically derived tRNA Class-Informative Features (CIFs), features predicted to target tRNAs for specific interactions within the tRNA interaction network. Our tRNA CIF-based model robustly and accurately classifies alphaproteobacterial genomes into one of seven undisputed monophyletic orders or families, despite great variability in tRNA gene complement sizes and base compositions. Our model robustly rejects monophyly of SAR11, classifying all but one strain as Rhizobiales with strong statistical support. Yet remarkably, conventional phylogenetic analysis of tRNAs classifies all SAR11 strains identically as Rickettsiales. We attribute this discrepancy to convergence of SAR11 and Rickettsiales tRNA base compositions. Thus, tRNA CIFs appear more robust to compositional convergence than tRNA sequences generally. Our results suggest that tRNA-CIF-based phyloclassification is robust to HGT of components of the tRNA interaction network, such as aminoacyl-tRNA synthetases. 
We explain why tRNAs are especially advantageous for prediction of traits governing macromolecular interactions from genomic data, and why such traits may be advantageous in the search for robust signals to address difficult problems in classification and phylogeny.


Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Figure 1
Figure 1. A universal schema for tRNA interaction networks.
tRNAs interact to varying degrees of specificity within a strongly conserved network of protein and RNA complexes. The simultaneous and conflicting requirements of “identity” and “conformity” on tRNAs create potential deleterious pleiotropic effects when components of the network mutate or are transferred to foreign cells by HGT. They also facilitate the bioinformatic prediction of Class-Informative Features (CIFs) from tRNAs that function together in the same or similar networks.
Figure 2
Figure 2. Function logos of structurally aligned tRNA data as calculated by LOGOFUN for two groups of Alphaproteobacteria and overview of tRNA-CIF-based binary phyloclassification.
Function logos generalize sequence logos. They are the sole means by which we predict tRNA Class-Informative Features (CIFs), which form the basis of the scoring schemes of the classifiers reported in this work. A full derivation of the mathematics of function logos is provided in . The tRNA-CIF-based phyloclassifier shown in Figure 3A sums differences in heights of features between two function logos for a set of genomically derived tRNAs. Complete source code and data to reproduce the function logos in this figure are in Dataset S1.
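The caption describes the binary classifier's score as a sum of feature-height differences between two function logos. Below is a minimal Python sketch of that summation. It assumes each logo has already been computed (for example by LOGOFUN) and flattened into a dictionary that maps a feature to its height in bits. The feature encoding `(tRNA class, aligned position, base)`, the helper names, and the rule that a feature missing from a logo has height 0 are illustrative assumptions, not the authors' implementation.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical feature encoding: (tRNA class label, aligned position, base).
Feature = Tuple[str, int, str]
Logo = Dict[Feature, float]  # feature -> height in bits


def trna_features(trna_class: str, aligned_seq: str) -> Iterable[Feature]:
    """Yield one feature per non-gap site of a structurally aligned tRNA."""
    for pos, base in enumerate(aligned_seq):
        if base != "-":
            yield (trna_class, pos, base)


def cif_score(genome_trnas: List[Tuple[str, str]], logo_a: Logo, logo_b: Logo) -> float:
    """Sum of feature-height differences (logo_a minus logo_b) over a genome's tRNAs.

    Features absent from a logo count as height 0 (an assumption of this sketch).
    Positive totals favor the clade behind logo_a, negative totals logo_b.
    """
    score = 0.0
    for trna_class, aligned_seq in genome_trnas:
        for feat in trna_features(trna_class, aligned_seq):
            score += logo_a.get(feat, 0.0) - logo_b.get(feat, 0.0)
    return score
```

In this sketch the sign convention is arbitrary: passing the RRCH logo as `logo_a` makes positive scores point toward RRCH and negative scores toward RSR.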
Figure 3
Figure 3. Leave-One-Out Cross-Validation (LOO-CV) scores of alphaproteobacterial genomes under two different binary phyloclassifiers.
A. Score distribution of genomes under the binary tRNA-CIF-based phyloclassifier sketched in Figure 2. The score of a genome in this classifier is the sum of the differences in heights of the features of its tRNAs between the RRCH and RSR function logos in Figure 2. B. Scores under the “zero” total tRNA sequence-based phyloclassifier, defined in Materials and Methods and run as a control. Here the score of a genome is the sum of the log-odds of its tRNA sequences under two class-specific sequence profiles from the RRCH and RSR clades. See Figure S2 for alternative treatments of missing data under other methods. Complete source code and data to reproduce these results and those in Figure S2 are in Dataset S2.
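The control classifier in panel B scores a genome by the log-odds of whole tRNA sequences. The sketch below shows one plausible reading and is not the method defined in Materials and Methods. It assumes one position-specific profile per tRNA class per clade, estimated from aligned training sequences with an additive pseudocount. It also skips tRNAs whose class has no profile in either clade. Under leave-one-out cross-validation, the profiles would be re-estimated without the genome being scored. All names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

ALPHABET = "ACGU-"  # assumed symbol set of aligned tRNA sequences (gap = "-")
Profile = List[Dict[str, float]]  # per-column symbol probabilities


def build_profile(aligned: Sequence[str], pseudocount: float = 1.0) -> Profile:
    """Estimate position-specific symbol probabilities with additive pseudocounts."""
    n_cols = len(aligned[0])
    total = len(aligned) + pseudocount * len(ALPHABET)
    profile: Profile = []
    for col in range(n_cols):
        counts = Counter(seq[col] for seq in aligned)
        profile.append({s: (counts[s] + pseudocount) / total for s in ALPHABET})
    return profile


def log_prob(seq: str, profile: Profile) -> float:
    """Log-probability of an aligned sequence under an independent-column profile."""
    return sum(math.log(profile[i][s]) for i, s in enumerate(seq))


def log_odds_score(
    genome_trnas: List[Tuple[str, str]],
    profiles_a: Dict[str, Profile],
    profiles_b: Dict[str, Profile],
) -> float:
    """Sum over a genome's tRNAs of log P(seq | clade A) - log P(seq | clade B).

    profiles_a and profiles_b map a tRNA class label to that clade's profile.
    tRNAs whose class lacks a profile in either clade are skipped; this is only
    one of several possible treatments of missing data.
    """
    score = 0.0
    for trna_class, seq in genome_trnas:
        if trna_class in profiles_a and trna_class in profiles_b:
            score += log_prob(seq, profiles_a[trna_class]) - log_prob(seq, profiles_b[trna_class])
    return score
```

Because this score uses every aligned site, it is directly exposed to clade-wide shifts in base composition. That is the kind of signal the CIF-based score is meant to filter out.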
Figure 4
Figure 4. Breakout of class contributions to scores under the tRNA CIF-based binary phyloclassifier.
Contributions of each functional variety of tRNA, or class, to the tRNA-CIF-based phyloclassifier scores in Figure 3A. Different SAR11 strain tRNAs are plotted separately by genome of origin. Complete source code and data to reproduce these results are in Dataset S3.
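Because the height-difference score is additive over features, it can be split exactly by tRNA class. The per-class breakout in this figure reflects that decomposition. Below is a minimal, self-contained sketch. It assumes the same hypothetical dictionary-of-heights logo representation and `(class, position, base)` feature encoding used for illustration elsewhere, neither of which comes from the authors' code.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Feature = Tuple[str, int, str]  # hypothetical: (tRNA class, aligned position, base)


def class_contributions(
    genome_trnas: List[Tuple[str, str]],
    logo_a: Dict[Feature, float],
    logo_b: Dict[Feature, float],
) -> Dict[str, float]:
    """Split one genome's height-difference score by tRNA class."""
    contrib: Dict[str, float] = defaultdict(float)
    for trna_class, aligned_seq in genome_trnas:
        for pos, base in enumerate(aligned_seq):
            if base == "-":  # skip alignment gaps
                continue
            feat = (trna_class, pos, base)
            # Missing features count as height 0 (assumption of this sketch).
            contrib[trna_class] += logo_a.get(feat, 0.0) - logo_b.get(feat, 0.0)
    return dict(contrib)
```

The per-class values sum to the genome's total score. Calling the function once per SAR11 genome gives the strain-by-strain breakout.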
Figure 5
Figure 5. Seven-way tRNA-CIF-based phyloclassification of alphaproteobacterial genomes by the default multilayer perceptron in WEKA.
Each classified test genome is assigned a probability of membership in each of the seven alphaproteobacterial clades indicated. Bootstrap support values (100 replicates, with model retraining) under resampling of tRNA sites are shown for (left) all tRNA CIFs and (right) CIFs filtered by a height threshold in bits. All support values correspond to the most probable clade shown, except for Stappia and Labrenzia, for which they correspond to Rhizobiales. Complete source code and data to produce this figure, including the full WEKA model for classification of other alphaproteobacterial genomes and code to produce such models from scratch, are provided in Dataset S4.
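The bootstrap procedure in this caption resamples tRNA sites with replacement, retrains the model on each replicate, and reports how often the test genome receives the same clade. The Python sketch below shows that loop under stated assumptions. Genomes are encoded as equal-length numeric vectors with one entry per tRNA site. A simple nearest-centroid rule stands in for the WEKA multilayer perceptron, which this sketch does not reproduce. Function names are hypothetical.

```python
import random
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def nearest_centroid_predict(
    train_x: Sequence[Sequence[float]], train_y: Sequence[str], test_x: Sequence[float]
) -> str:
    """Stand-in classifier: assign the label whose class centroid is closest."""
    counts = Counter(train_y)
    sums: Dict[str, List[float]] = {}
    for row, label in zip(train_x, train_y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
    best_label, best_dist = "", float("inf")
    for label, acc in sums.items():
        centroid = [v / counts[label] for v in acc]
        dist = sum((c - t) ** 2 for c, t in zip(centroid, test_x))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def bootstrap_support(
    train_x: List[List[float]],
    train_y: List[str],
    test_x: List[float],
    replicates: int = 100,
    seed: int = 0,
) -> Tuple[str, float]:
    """Resample sites (columns) with replacement, retrain, and re-classify.

    Returns the prediction on the full data and the fraction of bootstrap
    replicates that reproduce it.
    """
    rng = random.Random(seed)
    n_sites = len(test_x)
    reference = nearest_centroid_predict(train_x, train_y, test_x)
    hits = 0
    for _ in range(replicates):
        cols = [rng.randrange(n_sites) for _ in range(n_sites)]
        boot_train = [[row[c] for c in cols] for row in train_x]
        boot_test = [test_x[c] for c in cols]
        if nearest_centroid_predict(boot_train, train_y, boot_test) == reference:
            hits += 1
    return reference, hits / replicates
```

Retraining inside the loop matters. In this sketch the centroids are recomputed on every resampled site set, so the support reflects the stability of both the data and the fitted model.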
Figure 6
Figure 6. FastUniFrac-based phylogenetic tree of alphaproteobacteria using tRNA data computed according to the methods of .
The FastUniFrac algorithm was recently adapted as a phylogenomic method using tRNA genes. Like the supermatrix phylogenomic approach on tRNAs with results shown in Figures S3 and S4, this method uses unfiltered total sequence information of tRNAs. In contrast to Figure 5, both in this figure and in Figures S3 and S4, all SAR11 strains are affiliated with Rickettsiales. For reasons shown in Figure 7, we argue these results are artifacts of convergence in tRNA base contents. Complete source code and data to reproduce these results are in Dataset S5.
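UniFrac measures how much branch length on a shared tree is unique to one sample rather than shared between two. Assume each genome's tRNAs are placed as leaves on a common tRNA tree. The minimal unweighted-UniFrac sketch below then gives a distance between two genomes. It is not FastUniFrac itself, which computes many pairwise distances efficiently. The parent-pointer tree encoding and the leaf names are assumptions for illustration.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical tree encoding: node -> (parent, length of branch above node).
# The root's parent is None.
Tree = Dict[str, Tuple[Optional[str], float]]


def mark_ancestors(tree: Tree, leaves: Set[str]) -> Set[str]:
    """Return every node on a path from one of the given leaves to the root."""
    marked: Set[str] = set()
    for leaf in leaves:
        node: Optional[str] = leaf
        while node is not None and node not in marked:
            marked.add(node)
            node = tree[node][0]
    return marked


def unweighted_unifrac(tree: Tree, leaves_a: Set[str], leaves_b: Set[str]) -> float:
    """Fraction of observed branch length that leads to leaves of only one sample."""
    in_a = mark_ancestors(tree, leaves_a)
    in_b = mark_ancestors(tree, leaves_b)
    unique = observed = 0.0
    for node, (parent, length) in tree.items():
        if parent is None:
            continue  # the root has no branch above it
        a, b = node in in_a, node in in_b
        if a or b:
            observed += length
            if a != b:
                unique += length
    return unique / observed if observed else 0.0
```

Every branch contributes according to which genome's tRNAs lie beneath it, and every aligned site shapes the tRNA tree. Genomes whose tRNAs share a compositional bias can therefore cluster on the tree and appear close under this distance.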
Figure 7
Figure 7. Base compositions of alphaproteobacterial tRNAs showing convergence between Rickettsiales and SAR11.
A. Stacked bar graphs of tRNA base compositions by clade. B. UPGMA clustering of clades based on Euclidean distances of tRNA base compositions under the centered log ratio transformation . tRNA base compositions alone are sufficient to group all SAR11 strains together with Rickettsiales as a clade. Most popular molecular evolutionary models in use today do not account for base content variation as a source of bias in phylogenetic estimation. Complete source code and data to reproduce these results are in Dataset S6.
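Panel B combines three standard steps. Each clade's base composition goes through a centered log-ratio (CLR) transform. Euclidean distances are computed between the transformed vectors, and the clades are then clustered with UPGMA (average linkage). The Python sketch below implements those steps from scratch for illustration. The nested-tuple tree output is a hypothetical format, and any compositions passed to it here are toy values, not the paper's data.

```python
import math
from typing import Dict, List, Sequence, Tuple, Union

Tree = Union[str, tuple]  # nested tuples of leaf labels (hypothetical format)


def clr(composition: Sequence[float]) -> List[float]:
    """Centered log-ratio transform: log(x_i) minus the mean of log(x). Parts must be > 0."""
    logs = [math.log(x) for x in composition]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


def euclidean(u: Sequence[float], v: Sequence[float]) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def upgma(labels: Sequence[str], points: Sequence[Sequence[float]]) -> Tree:
    """Average-linkage (UPGMA) clustering on Euclidean distances; returns nested tuples."""
    clusters: Dict[int, Tuple[Tree, int]] = {i: (lab, 1) for i, lab in enumerate(labels)}
    dist: Dict[Tuple[int, int], float] = {
        (i, j): euclidean(points[i], points[j])
        for i in range(len(labels))
        for j in range(i + 1, len(labels))
    }
    next_id = len(labels)
    while len(clusters) > 1:
        i, j = min(dist, key=dist.get)  # closest pair; keys stored as (smaller, larger)
        tree_i, n_i = clusters.pop(i)
        tree_j, n_j = clusters.pop(j)
        # Size-weighted average distance from the merged cluster to each remaining one.
        merged = {
            k: (n_i * dist[(min(i, k), max(i, k))] + n_j * dist[(min(j, k), max(j, k))]) / (n_i + n_j)
            for k in clusters
        }
        dist = {key: d for key, d in dist.items() if i not in key and j not in key}
        for k, d in merged.items():
            dist[(k, next_id)] = d  # k < next_id because ids only increase
        clusters[next_id] = ((tree_i, tree_j), n_i + n_j)
        next_id += 1
    return next(iter(clusters.values()))[0]
```

As a toy example, take two near-uniform A/C/G/U compositions and one strongly skewed composition. `upgma(["A", "B", "C"], [clr(c) for c in comps])` joins the two near-uniform clades first. Composition alone drives the grouping here, which is the effect the caption attributes to Rickettsiales and SAR11.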

