PLoS Pathog. 2015 Feb 13;11(2):e1004664.
doi: 10.1371/journal.ppat.1004664. eCollection 2015 Feb.

Evolution of genome size and complexity in the rhabdoviridae

Peter J Walker et al. PLoS Pathog.

Abstract

RNA viruses exhibit substantial structural, ecological and genomic diversity. However, genome size in RNA viruses is likely limited by a high mutation rate, resulting in the evolution of various mechanisms to increase complexity while minimising genome expansion. Here we conduct a large-scale analysis of the genome sequences of 99 animal rhabdoviruses, including 45 genomes which we determined de novo, to identify patterns of genome expansion and the evolution of genome complexity. All but seven of the rhabdoviruses clustered into 17 well-supported monophyletic groups, of which eight corresponded to established genera, seven were assigned as new genera, and two were taxonomically ambiguous. We show that the acquisition and loss of new genes appears to have been a central theme of rhabdovirus evolution, and has been associated with the appearance of alternative, overlapping and consecutive ORFs within the major structural protein genes, and the insertion and loss of additional ORFs in each gene junction in a clade-specific manner. Changes in the lengths of gene junctions accounted for as much as 48.5% of the variation in genome size from the smallest to the largest genome, and the frequency with which new ORFs were observed increased in the 3' to 5' direction along the genome. We also identify several new families of accessory genes encoded in these regions, and show that non-canonical expression strategies involving TURBS-like termination-reinitiation, ribosomal frame-shifts and leaky ribosomal scanning appear to be common. We conclude that rhabdoviruses have an unusual capacity for genomic plasticity that may be linked to their discontinuous transcription strategy from the negative-sense single-stranded RNA genome, and propose a model that accounts for the regular occurrence of genome expansion and contraction throughout the evolution of the Rhabdoviridae.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. Schematic representation of the genomes of rhabdoviruses analysed.
The genomes are shown in (+) sense with arrows indicating the locations of ORFs ≥180 nt. The five common structural protein genes (N, P, M, G and L) are shaded in black. Orthologous genes or genes encoding structurally similar proteins are shaded in the same colour, including viroporin-like proteins which are shaded in yellow. ORFs for which no orthologous or structurally similar proteins could be identified are shaded in light grey. The viruses are grouped according to established genera, proposed new genera or unassigned species (see Fig. 2).
Fig 2. ML phylogenetic tree of 100 rhabdovirus L protein sequences.
Branches are colour-coded according to known vector species, while the principal animal host species (where known) are shown by indicated symbols. Horizontal branch lengths are drawn to a scale of amino acid substitutions/site, and all bootstrap proportion values (BSP) ≥ 85% are shown by the * symbol. Newly proposed genera are indicated by a † symbol. Cytorhabdovirus, novirhabdovirus and nucleorhabdovirus outgroup sequences were excluded from the tree as they were too divergent to establish a reliable rooting. The tree is therefore rooted arbitrarily on one of two basal clades (genera Almendravirus and Bahiavirus) that comprise viruses isolated from mosquitoes.
Fig 3. The relative length of sequences of known or predicted function and unknown function.
Sequences of known or predicted function include ORFs and transcriptional regulatory sequences. Sequences of unknown function include 5’- and 3’-UTRs within transcriptional units and IGRs between transcriptional units. Genomic leader sequences (up to the N gene TI sequence) and trailer sequences (beyond the L gene TTP sequence) were excluded from the analysis as the extreme terminal sequences of some viruses were not determined. The sequence lengths are provided for each virus in the data set but identified only by their genus assignment.
Fig 4. Illustration of the similar structural characteristics of viroporin-like proteins.
The sequences illustrate predicted hydrophobic transmembrane domains (grey shaded) which are usually bounded by anchoring charged residues (black shaded), an N-terminal domain containing several large aromatic residues (F, Y, W), and a C-terminal domain containing a large number of basic residues (R, K, H) (bold and underlined). The proteins are assigned numbers according to our defined annotation rules and are grouped according to existing genera and new genera proposed in this paper.
Fig 5. TURBS-like sequence motifs in the genomes of sripuviruses, curioviruses, hapaviruses and ephemeroviruses.
The motif features the sequence UGGGA (highlighted) flanked by short stretches of anti-complementary sequence (single underlined) upstream of overlapping or adjacent initiation and termination codons (double underlined). Variations in the TURBS sequence (UGAGA) occur in CHOV, SMV and ARUV. The ORF junctions (M-Mx; U1-U1x; G-Gx; α1-α2) are indicated for each virus. No TURBS-like sequence was detected upstream of the adjacent initiation and termination codons at the GLOV G-U3 junction or the KOOLV α1-α2 junction.
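As a rough illustration of the kind of search this caption describes, the sketch below looks for the core UGGGA motif (or the UGAGA variant) in a window upstream of an ORF junction. It is not the authors' method: the function name `find_turbs_like`, the 100-nt default window and the requirement that the caller supplies the junction coordinate are all assumptions, and the flanking complementary stretches are not checked.

```python
import re

# Core motif from the caption: UGGGA, or the UGAGA variant seen in some viruses.
TURBS_CORE = re.compile(r"UG[GA]GA")

def find_turbs_like(seq, junction_pos, window=100):
    """Return 0-based start positions of core motifs upstream of a junction.

    seq          -- (+) sense sequence; DNA input is converted to RNA letters
    junction_pos -- index of the first base of the overlapping/adjacent
                    termination-initiation codons (supplied by the caller)
    window       -- hypothetical number of upstream nucleotides to scan
    """
    rna = seq.upper().replace("T", "U")
    start = max(0, junction_pos - window)
    upstream = rna[start:junction_pos]
    return [start + m.start() for m in TURBS_CORE.finditer(upstream)]

# Toy sequence: a core motif at index 4, and an AUGA junction at index 13
# (AUG start codon overlapping a UGA stop codon).
toy = "AAAA" + "UGGGA" + "CCCC" + "AUGA"
hits = find_turbs_like(toy, junction_pos=13)
```

A hit from this scan is only a candidate; deciding whether it is a functional TURBS-like element would still require inspecting the flanking sequence, as in the figure.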
Fig 6. Number of alternative ORFs of various size ranges (nt) across the genome data set.
ORFs ≥90 nt were identified in each genome and the assembled set was grouped into size ranges (e.g., 90–149, 150–209, 210–269). The total number of observations of ORFs in each size range is shown. All ORFs ≥180 nt (60 aa) are listed in S2 Table.
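The grouping described in this caption can be sketched as a simple binning step. This is a minimal illustration, assuming a fixed 60-nt bin width (inferred from the ranges 90–149, 150–209 and 210–269) and assuming the ORF lengths have already been extracted; the function names and the example lengths are hypothetical, not data from the paper.

```python
from collections import Counter

MIN_ORF_NT = 90    # lower cutoff stated in the caption
BIN_WIDTH_NT = 60  # assumed: inferred from the ranges given in the caption

def size_range(length_nt):
    """Return the (low, high) range containing an ORF length, or None if below the cutoff."""
    if length_nt < MIN_ORF_NT:
        return None
    offset = (length_nt - MIN_ORF_NT) // BIN_WIDTH_NT
    low = MIN_ORF_NT + offset * BIN_WIDTH_NT
    return (low, low + BIN_WIDTH_NT - 1)

def count_by_size_range(orf_lengths):
    """Count ORFs per size range, ignoring ORFs shorter than the cutoff."""
    counts = Counter()
    for n in orf_lengths:
        r = size_range(n)
        if r is not None:
            counts[r] += 1
    return dict(sorted(counts.items()))

# Made-up lengths for illustration only.
example_lengths = [87, 90, 149, 150, 180, 209, 210, 305]
counts = count_by_size_range(example_lengths)
```

With this binning, the ≥180 nt (60 aa) threshold used for S2 Table falls in the middle of the 150–209 range rather than on a bin boundary.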
Fig 7. A model for the evolution of rhabdovirus accessory genes.
The model accounts for accessory genes that emerge initially from small ORFs arising randomly through mutation in alternative reading frames within existing ORFs or in 5’ or 3’ UTRs within transcriptional units.
