Front Microbiol. 2018 Nov 27;9:2668.
doi: 10.3389/fmicb.2018.02668. eCollection 2018.

Ancestrality and Mosaicism of Giant Viruses Supporting the Definition of the Fourth TRUC of Microbes

Philippe Colson et al. Front Microbiol.

Abstract

Giant viruses of amoebae were discovered in 2003. Since then, their known diversity has greatly expanded. They have been suggested to form a fourth branch of life, collectively named 'TRUC' (for "Things Resisting Uncompleted Classifications"), alongside Bacteria, Archaea, and Eukarya. Their origin and ancestrality remain controversial. Here, we clarify the evolution and definition of giant viruses. We performed phylogenetic and phenetic analyses of the informational gene repertoires of giant viruses and of selected bacteria, archaea, and eukaryotes, including structural phylogenomics based on protein structural domains grouped into 289 universal fold superfamilies (FSFs). Hierarchical clustering was performed on a binary presence/absence matrix built from 727 informational COGs of cellular organisms. The presence/absence of 'universal' FSF domains was used to generate an unrooted maximum parsimony phylogenomic tree. We also compared the gene content of a giant virus with those of a bacterium, an archaeon, and a eukaryote with small genomes. Overall, the cladistic analyses, based both on gene sequences of very central and ancient proteins and on highly conserved protein fold structures, and the phenetic analyses were congruent in delineating a fourth branch of microbes composed of giant viruses. Giant viruses appeared as a basal group in the tree of all proteomes. The pangenome and core genome determined for Rickettsia bellii (bacterium), Methanomassiliicoccus luminyensis (archaeon), Encephalitozoon intestinalis (eukaryote), and Tupanvirus (giant virus) showed that a substantial proportion of Tupanvirus genes overlap with those of the cellular microbes. In addition, substantial genome mosaicism was observed, with 51, 11, 8, and 0.2% of Tupanvirus genes best matching viruses, eukaryotes, bacteria, and archaea, respectively. Finally, we found that genes themselves may be subject to lateral sequence transfers.
In summary, our data highlight the quantum leap between classical and giant viruses. Phylogenetic and phyletic analyses and the study of protein fold superfamilies confirm previous evidence for the existence of a fourth TRUC of life that includes giant viruses, and highlight its ancestrality and mosaicism. They also indicate that rhizomes are the best evolutionary representations for giant viruses and cellular microorganisms, and that sequence transfers, rather than gene transfers, must be considered.
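The pangenome/core-genome comparison described above reduces to set union and intersection once every gene of each organism has been assigned to a shared gene family. The sketch below assumes that assignment has already been made (the grouping method is not described here); the family labels and toy genomes are hypothetical placeholders, not data from the study.

```python
def pan_and_core(genomes):
    """Return (pangenome, core genome) for a mapping organism -> set of gene families."""
    families = list(genomes.values())
    pangenome = set().union(*families)                          # families seen in any genome
    core = set.intersection(*families) if families else set()  # families shared by all genomes
    return pangenome, core


def shared_fraction(genomes, focal):
    """Fraction of the focal organism's families also present in at least one other genome."""
    others = set().union(*(g for name, g in genomes.items() if name != focal))
    return len(genomes[focal] & others) / len(genomes[focal])


# Hypothetical toy gene-family sets (illustrative only).
genomes = {
    "Tupanvirus": {"MetRS", "TrpRS", "eIF4E", "PolB", "MCP"},
    "R. bellii": {"MetRS", "TrpRS", "RpoB", "FtsZ"},
    "M. luminyensis": {"MetRS", "TrpRS", "PolB", "McrA"},
    "E. intestinalis": {"MetRS", "TrpRS", "eIF4E", "PolB", "Actin"},
}
pan, core = pan_and_core(genomes)
print(len(pan), sorted(core))                    # 9 ['MetRS', 'TrpRS']
print(shared_fraction(genomes, "Tupanvirus"))    # 0.8
```

The "proportion of Tupanvirus genes that overlap with those of the cellular microbes" corresponds here to `shared_fraction`, which counts a family as shared if any other genome carries it.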

Keywords: TRUC; giant virus; informational genes; megavirales; mimivirus; protein structural domains.

Figures

FIGURE 1
Venn diagram displaying FSF distribution and sharing patterns among Archaea, Bacteria, Eukarya, and Megavirales. A, Archaea; B, Bacteria; E, Eukarya; FSF, fold superfamilies; V, viruses.
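The sharing patterns in a Venn diagram such as Figure 1 can be tallied by labeling each FSF with the exact combination of groups in which it occurs; the "ABEV" pattern then counts the FSFs found in all four groups. This is a minimal sketch that assumes per-group FSF assignments are already available; the FSF identifiers are hypothetical.

```python
from collections import Counter


def sharing_patterns(fsf_by_group):
    """Count FSFs by the exact combination of groups (e.g. 'ABEV') that contain them."""
    all_fsfs = set().union(*fsf_by_group.values())
    counts = Counter()
    for fsf in all_fsfs:
        # Sorted group letters give a stable label such as 'ABE' or 'EV'.
        label = "".join(sorted(g for g, fsfs in fsf_by_group.items() if fsf in fsfs))
        counts[label] += 1
    return counts


# Hypothetical FSF sets for Archaea (A), Bacteria (B), Eukarya (E) and viruses (V).
groups = {
    "A": {"f1", "f2", "f3", "f6"},
    "B": {"f1", "f2", "f4", "f6"},
    "E": {"f1", "f2", "f3", "f5", "f6"},
    "V": {"f1", "f5", "f6"},
}
print(sorted(sharing_patterns(groups).items()))
# [('ABE', 1), ('ABEV', 2), ('AE', 1), ('B', 1), ('EV', 1)]
```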
FIGURE 2
Phylogeny of proteomes describing the evolution of 182 proteomes randomly sampled from cellular organisms and viruses. The universal Tree of Life is rooted using Weston’s generality criterion. The 102 cellular proteomes are from Nasir and Caetano-Anollés (2015).
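A maximum parsimony tree of proteomes is the candidate tree that requires the fewest state changes across all characters. The sketch below only scores one fixed tree, using Fitch's algorithm for binary FSF presence/absence characters. It does not perform the tree search or the rooting by Weston's generality criterion used for Figure 2, and the proteome labels and character matrix are hypothetical.

```python
def fitch(tree, states):
    """Fitch small-parsimony pass for one character on a rooted binary tree.

    `tree` is a leaf name (str) or a pair (left, right); `states` maps each leaf
    to 0 (FSF absent) or 1 (FSF present). Returns the candidate state set at
    this node and the number of changes required below it.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1  # one extra change


def tree_length(tree, matrix):
    """Parsimony length of `tree`: changes summed over all characters."""
    return sum(fitch(tree, column)[1] for column in matrix.values())


# Hypothetical toy data: five proteomes and three FSF characters.
tree = (("V1", "V2"), ("A1", ("B1", "E1")))
matrix = {
    "FSF_x": {"V1": 1, "V2": 1, "A1": 1, "B1": 1, "E1": 1},
    "FSF_y": {"V1": 0, "V2": 0, "A1": 1, "B1": 1, "E1": 1},
    "FSF_z": {"V1": 1, "V2": 0, "A1": 0, "B1": 1, "E1": 0},
}
print(tree_length(tree, matrix))  # 3
```

A tree search would compare `tree_length` across many candidate topologies and keep the shortest.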
FIGURE 3
Evolutionary principal coordinate (evoPCO) analysis plot portraying, in its first three axes, the evolutionary distances between cellular and viral proteomes. The percentage of variability explained by each coordinate is given in parentheses on each axis. Data points of the three-dimensional scatter plot, which describe temporal clouds, are mapped onto projection planes and connected by vertical drop lines along the PCO3 axis. The complete coordinate data used to build the PCoA plot in this figure are provided in Supplementary Table S3.
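Principal coordinate analysis (classical multidimensional scaling) double-centers the squared distance matrix and uses its leading eigenvectors as axes; each axis's share of the eigenvalue total is the "percentage of variability explained" printed on the plot axes. The pure-Python sketch below finds the top axes by power iteration with deflation. It assumes Euclidean-embeddable distances (so there are no negative eigenvalues) and uses a toy distance matrix, not the evolutionary distances of Supplementary Table S3.

```python
import math
import random


def double_center(d):
    """Gower centering: B = -1/2 * J D^2 J, where J is the centering matrix."""
    n = len(d)
    a = [[-0.5 * d[i][j] ** 2 for j in range(n)] for i in range(n)]
    mean = [sum(row) / n for row in a]  # row means equal column means (a is symmetric)
    grand = sum(mean) / n
    return [[a[i][j] - mean[i] - mean[j] + grand for j in range(n)] for i in range(n)]


def matvec(m, v):
    return [sum(x * y for x, y in zip(row, v)) for row in m]


def pcoa(d, k=3, iters=1000, seed=0):
    """Coordinates on the top-k principal coordinates and % variability per axis.

    Assumes B is positive semidefinite, so its trace equals the total variability.
    """
    b = double_center(d)
    n = len(b)
    total = sum(b[i][i] for i in range(n))
    rng = random.Random(seed)
    axes, percents = [], []
    for _ in range(k):
        v = [rng.random() for _ in range(n)]
        for _ in range(iters):  # power iteration for the dominant eigenvector
            w = matvec(b, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue for the converged vector.
        lam = sum(x * y for x, y in zip(v, matvec(b, v))) / sum(x * x for x in v)
        axes.append([x * math.sqrt(max(lam, 0.0)) for x in v])
        percents.append(100.0 * lam / total)
        # Deflation: remove the axis just found before searching for the next one.
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    coords = [tuple(axis[i] for axis in axes) for i in range(n)]
    return coords, percents


points = [(0, 0), (4, 0), (0, 1), (4, 1)]  # hypothetical toy configuration
d = [[math.dist(p, q) for q in points] for p in points]
coords, pct = pcoa(d, k=2)
print([round(p, 1) for p in pct])  # [94.1, 5.9]
```

For real evolutionary distances, which may not be Euclidean, a full symmetric eigendecomposition that reports negative eigenvalues would be the safer choice.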
FIGURE 4
Plots of the indices of the phylogenetic tree of proteomes (corresponding to Figure 2), which describes the evolution of 182 proteomes randomly sampled from cellular organisms and viruses, against the age of the phylogenetic character [fold superfamily (FSF)]. Five measures of the level of lateral sequence transfer for the maximum parsimony tree reconstructed in the present study, namely the consistency index (A), retention index (B), rescaled consistency index (C), homoplasy index (D), and G-fit (E), are plotted against the age of each FSF character [measured as node distance (nd) values] for the 289 characters (FSFs) shared by archaea, bacteria, eukaryota, and viruses. High retention indices, especially at lower nd values (corresponding to older domains), indicate an excellent fit of the characters to the phylogeny.
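For a single character, the consistency, retention, rescaled consistency, and homoplasy indices follow from three counts: the minimum possible number of changes m, the number observed on the tree s, and the maximum g (the length on a star tree). The sketch below applies the standard definitions CI = m/s, RI = (g - s)/(g - m), RC = CI x RI, and HI = 1 - CI to binary presence/absence characters; G-fit is not covered, and the example inputs are hypothetical.

```python
def character_indices(tip_states, steps):
    """Per-character CI, RI, RC and HI for a binary (0/1) character.

    tip_states: observed states at the terminals; steps: changes the character
    requires on the tree (s). For a binary character, m is 1 if both states
    occur, and g is the count of the rarer state (its length on a star tree).
    """
    m = 1 if len(set(tip_states)) == 2 else 0
    g = min(tip_states.count(0), tip_states.count(1))
    if m == 0:
        return None  # invariant character: indices are undefined
    ci = m / steps
    ri = (g - steps) / (g - m) if g != m else None  # undefined when g == m
    rc = ci * ri if ri is not None else None
    return {"CI": ci, "RI": ri, "RC": rc, "HI": 1 - ci}


# Hypothetical characters scored on a five-taxon tree.
print(character_indices([0, 0, 1, 1, 1], steps=1))  # {'CI': 1.0, 'RI': 1.0, 'RC': 1.0, 'HI': 0.0}
print(character_indices([1, 0, 0, 1, 0], steps=2))  # {'CI': 0.5, 'RI': 0.0, 'RC': 0.0, 'HI': 0.5}
```

Ensemble (whole-tree) indices are obtained the same way after summing m, s, and g over all characters.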
FIGURE 5
RNAP1 phylogenetic tree. The RNAP1 tree was built using aligned protein sequences from Megavirales (red), Bacteria (green), Archaea (pink), and Eukarya (blue). Confidence values were calculated with the Shimodaira-Hasegawa (SH) test using the FastTree program (Price et al., 2010). The average sequence length was 1,336 amino acids. The scale bar represents the number of estimated changes per position.
FIGURE 6
RNAP2 phylogenetic tree. The RNAP2 tree was built using aligned protein sequences from Megavirales (red), Bacteria (green), Archaea (pink), and Eukarya (blue). Confidence values were calculated with the SH test using the FastTree program (Price et al., 2010). The average sequence length was 1,188 amino acids. The scale bar represents the number of estimated changes per position.
FIGURE 7
DNA polymerase phylogenetic tree. The DNA polymerase tree was built using aligned protein sequences from Megavirales (red), Bacteria (green), Archaea (pink), and Eukarya (blue). Confidence values were calculated with the SH test using the FastTree program (Price et al., 2010). The average sequence length was 1,134 amino acids. The scale bar represents the number of estimated changes per position.
FIGURE 8
Hierarchical clustering by phyletic pattern based on the presence/absence of informational Clusters of Orthologous Groups (COGs) of proteins. The Megavirales members are represented in red, Bacteria members in green, Archaea members in pink, and Eukarya members in blue.
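The caption does not name the distance measure or linkage method, so the sketch below assumes Jaccard distance between presence/absence profiles and average linkage (UPGMA), a common choice for phyletic patterns. It returns a Newick-style string; the COG profiles and organism choice are hypothetical.

```python
def jaccard_distance(a, b):
    """1 - |A and B| / |A or B| for two 0/1 presence/absence profiles."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return 0.0 if either == 0 else 1.0 - both / either


def upgma_newick(profiles):
    """Average-linkage (UPGMA) clustering of named profiles, returned as Newick."""
    size = {name: 1 for name in profiles}
    names = list(profiles)
    dist = {frozenset((p, q)): jaccard_distance(profiles[p], profiles[q])
            for i, p in enumerate(names) for q in names[i + 1:]}
    while len(size) > 1:
        pair = min(dist, key=dist.get)  # closest pair of current clusters
        p, q = sorted(pair)
        merged = f"({p},{q})"
        n_p, n_q = size.pop(p), size.pop(q)
        del dist[pair]
        for r in size:  # size-weighted average of the two merged clusters' distances
            d_pr = dist.pop(frozenset((p, r)))
            d_qr = dist.pop(frozenset((q, r)))
            dist[frozenset((merged, r))] = (n_p * d_pr + n_q * d_qr) / (n_p + n_q)
        size[merged] = n_p + n_q
    return next(iter(size)) + ";"


# Hypothetical presence/absence of six informational COGs.
profiles = {
    "Mimivirus": [1, 1, 1, 0, 0, 0],
    "Tupanvirus": [1, 1, 1, 1, 0, 0],
    "Rickettsia": [0, 0, 0, 1, 1, 1],
    "Escherichia": [0, 0, 1, 1, 1, 1],
}
print(upgma_newick(profiles))  # ((Escherichia,Rickettsia),(Mimivirus,Tupanvirus));
```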
FIGURE 9
Rhizomes of genomes illustrating the mosaicism of the genomes of representatives of the four TRUCs of microbes: Tupanvirus soda lake (a mimivirus) (A); Encephalitozoon intestinalis (a microbial eukaryote) (B); Methanomassiliicoccus luminyensis (an archaeon) (C); and Rickettsia bellii (a bacterium) (D). Each gene of these four microorganisms was linked to its most similar sequence in the NCBI GenBank protein sequence database according to the BLAST program (https://blast.ncbi.nlm.nih.gov/Blast.cgi), classified as belonging to viruses, eukaryotes, bacteria, or archaea, and integrated into a circular gene data visualization. The figures were generated using the CIRCOS online tool (http://mkweb.bcgsc.ca/tableviewer/visualize/). The circular representations in A and C are the same as those produced for figures in Abrahao et al. (2018) and Levasseur et al. (2017), respectively, as they originate from the same data. These representations are licensed under CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/) and CC-BY-NC (https://creativecommons.org/licenses/by-nc/4.0/), respectively.
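Once each gene's best BLAST hit has been assigned to viruses, eukaryotes, bacteria, or archaea, the genome composition drawn in such a rhizome (for Tupanvirus, the 51, 11, 8, and 0.2% quoted in the abstract) is a simple tally. This sketch assumes that mapping already exists and counts genes with no hit as ORFans, the category colored orange in Figure 11; the gene identifiers and labels are hypothetical, and the BLAST search itself is not shown.

```python
from collections import Counter

CATEGORIES = ("viruses", "eukaryotes", "bacteria", "archaea", "ORFans")


def best_hit_composition(best_hits):
    """Percentage of genes per category; a best hit of None means no match (ORFan)."""
    labels = ["ORFans" if hit is None else hit for hit in best_hits.values()]
    unknown = set(labels) - set(CATEGORIES)
    if unknown:
        raise ValueError(f"unexpected categories: {sorted(unknown)}")
    counts = Counter(labels)
    return {c: 100.0 * counts[c] / len(labels) for c in CATEGORIES}


# Hypothetical best-hit assignments for ten genes.
best_hits = {
    "g1": "viruses", "g2": "viruses", "g3": "viruses", "g4": "viruses", "g5": "viruses",
    "g6": "eukaryotes", "g7": "eukaryotes", "g8": "bacteria", "g9": None, "g10": None,
}
print(best_hit_composition(best_hits))
# {'viruses': 50.0, 'eukaryotes': 20.0, 'bacteria': 10.0, 'archaea': 0.0, 'ORFans': 20.0}
```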
FIGURE 10
Rhizomes of methionyl-tRNA synthetase gene fragments illustrating the mosaicism of the genes of representatives of the four TRUCs of microbes: Tupanvirus soda lake (a mimivirus) (A); Encephalitozoon intestinalis (a microbial eukaryote) (B); Methanomassiliicoccus luminyensis (an archaeon) (C); and Rickettsia bellii (a bacterium) (D). Fragments 40 amino acids long of the methionyl-tRNA synthetase-encoding genes of the four microorganisms were each linked to their most similar sequence in the NCBI GenBank protein sequence database according to the BLAST program (https://blast.ncbi.nlm.nih.gov/Blast.cgi), classified as belonging to viruses, eukaryotes, bacteria, or archaea, and integrated into a circular gene data visualization. The figures were generated using the CIRCOS online tool (http://mkweb.bcgsc.ca/tableviewer/visualize/).
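Cutting a protein into 40-amino-acid pieces and writing them as FASTA records is the preparation step for fragment-level BLAST searches like those behind Figure 10. The caption does not say whether fragments overlap or how a shorter final piece is handled, so this sketch assumes consecutive, non-overlapping windows and keeps the last, shorter fragment; the sequence and the `_fragN` naming scheme are hypothetical.

```python
def fragments(seq, size=40):
    """Split a protein sequence into consecutive, non-overlapping fragments."""
    if size <= 0:
        raise ValueError("fragment size must be positive")
    return [seq[i:i + size] for i in range(0, len(seq), size)]


def to_fasta(name, frags):
    """One FASTA record per fragment, numbered from 1 (hypothetical naming scheme)."""
    return "".join(f">{name}_frag{i}\n{f}\n" for i, f in enumerate(frags, start=1))


seq = "M" + "A" * 99  # hypothetical 100-residue sequence
frags = fragments(seq)
print([len(f) for f in frags])  # [40, 40, 20]
print(to_fasta("MetRS", frags).splitlines()[0])  # >MetRS_frag1
```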
FIGURE 11
Representation as a rhizome of the genetic evolution of four current intracellular parasites with comparable genome sizes, one from each of the four TRUCs of microbes: Rickettsia bellii (a bacterium), Methanomassiliicoccus luminyensis (an archaeon), Encephalitozoon intestinalis (a microbial eukaryote), and Tupanvirus soda lake (a mimivirus). Rhizomes are representations of genome evolution and mosaicism that account for the fact that genes and intragenic sequences do not share the same evolutionary history; they have been proposed as a better paradigm of genetic evolution than phylogenetic trees. The genome of each of the four microorganisms represented harbors a mixture of sequences of different origins. Sequences corresponding to current bacteria, archaea, eukaryota, giant viruses, and ORFans are colored green, purple, blue, red, and orange, respectively. The rhizomes of the genomes of Tupanvirus and Methanomassiliicoccus luminyensis were adapted from representations in Abrahao et al. (2018) and Levasseur et al. (2017), respectively, licensed under CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/) and CC-BY-NC (https://creativecommons.org/licenses/by-nc/4.0/), respectively (see legend to Figure 9).

References

    1. Abdelrahman Y., Ouellette S. P., Belland R. J., Cox J. V. (2016). Polarized cell division of Chlamydia trachomatis. PLoS Pathog. 12:e1005822. doi: 10.1371/journal.ppat.1005822
    2. Abergel C., Legendre M., Claverie J. M. (2015). The rapidly expanding universe of giant viruses: mimivirus, Pandoravirus, Pithovirus and Mollivirus. FEMS Microbiol. Rev. 39, 779–796. doi: 10.1093/femsre/fuv037
    3. Abrahao J., Silva L., Santos Silva L., Bou Khalil J. Y., Rodrigues R., Arantes T., et al. (2018). Tupanvirus, a tailed giant virus and distant relative of Mimiviridae, possesses the most complete translational apparatus of the virosphere. Nat. Commun. 9:749. doi: 10.1038/s41467-018-03168-1
    4. Altschul S. F., Gish W., Miller W., Myers E. W., Lipman D. J. (1990). Basic local alignment search tool. J. Mol. Biol. 215, 403–410. doi: 10.1016/S0022-2836(05)80360-2
    5. Alva V., Söding J., Lupas A. N. (2015). A vocabulary of ancient peptides at the origin of folded proteins. eLife 4:e09410. doi: 10.7554/eLife.09410
