Genome Res. 2007 Oct;17(10):1496-504.
doi: 10.1101/gr.6305707. Epub 2007 Sep 4.

The evolution of genome compression and genomic novelty in RNA viruses

Robert Belshaw et al. Genome Res. 2007 Oct.

Abstract

The genomes of RNA viruses are characterized by their extremely small size and extremely high mutation rates (typically 10 kb and 10^-4 per base per replication cycle, respectively), traits that are thought to be causally linked. One aspect of their small size is the genome compression caused by the use of overlapping genes (where some nucleotides code for two genes). Using a comparative analysis of all known RNA viral species, we show that viruses with larger genomes tend to have less gene overlap. We provide a numerical model to show how a high mutation rate could lead to gene overlap, and we discuss the factors that might explain the observed relationship between gene overlap and genome size. We also propose a model for the evolution of gene overlap based on the co-opting of previously unused ORFs, which gives rise to two types of overlap: (1) the creation of novel genes inside older genes, predominantly via +1 frameshifts, and (2) the incremental increase in overlap between originally contiguous genes, with no frameshift preference. Both types of overlap are viewed as the creation of genomic novelty under pressure for genome compression. Simulations based on our model generate the empirical size distributions of overlaps and explain the observed frameshift preferences. We suggest that RNA viruses are a good model system for investigating the general evolutionary relationships among genome attributes such as mutational robustness, mutation rate, and size.
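The abstract does not spell out how the simulated ORF lengths were obtained, so the following is only an illustrative sketch of the general idea of looking for unused ORFs in alternative reading frames; it is not the authors' simulation. It measures stop-free stretches when a sequence is read from a shifted starting position. The function name `stop_free_runs`, the toy sequence, and the mapping of offset 1 to a +1 frameshift and offset 2 to a −1 frameshift are all assumptions made for illustration.

```python
# Illustrative sketch only: names and the offset convention are assumptions,
# not taken from the paper.
STOP_CODONS = {"UAA", "UAG", "UGA"}


def stop_free_runs(seq, offset):
    """Return lengths (in codons) of maximal stop-free runs.

    The sequence is read in triplets starting at `offset` (0, 1, or 2).
    DNA input is accepted and converted to RNA (T -> U).
    """
    rna = seq.upper().replace("T", "U")
    runs = []
    current = 0
    for i in range(offset, len(rna) - 2, 3):
        if rna[i:i + 3] in STOP_CODONS:
            # A stop codon closes the current open stretch, if any.
            if current:
                runs.append(current)
            current = 0
        else:
            current += 1
    if current:
        runs.append(current)
    return runs


toy_gene = "AUGGCUUAAGGCUGA"          # hypothetical sequence
in_frame = stop_free_runs(toy_gene, 0)  # reference frame
shifted = stop_free_runs(toy_gene, 1)   # assumed +1 frameshift
```

Applying such a scan to many coding sequences and collecting the run lengths per offset would give a length distribution for each shifted frame, which is the kind of quantity the abstract compares with observed overlap lengths.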

Figures

Figure 1.
Relationship between gene overlap (as a proportion of information content) and information content, both expressed as natural logarithms. Points are means for the following taxa. 1, Acyrthosiphon pisum virus (n = 1); 2, Arenaviridae (n = 11); 3, Arteriviridae (n = 4); 4, Astroviridae (n = 6); 5, Barnaviridae (n = 1); 6, Beet western yellows ST9 associated virus (n = 1); 7, Benyvirus (n = 2); 8, Birnaviridae (n = 5); 9, Bornaviridae (n = 1); 10, Botrytis virus X (n = 1); 11, Bromoviridae (n = 22); 12, Bunyaviridae (n = 20); 13, Caliciviridae (n = 13); 14, Caulimoviridae (n = 23); 15, Closteroviridae (n = 16); 16, Comoviridae (n = 18); 17, Coronaviridae (n = 12); 18, Cystoviridae (n = 4); 19, Filoviridae (n = 4); 20, Flaviviridae (n = 34); 21, Flexiviridae (n = 52); 22, Fusarium graminearum dsRNA mycovirus 1 (n = 1); 23, Hepadnaviridae (n = 10); 24, Hepeviridae (n = 1); 25, Hordeivirus (n = 1); 26, Hypoviridae (n = 4); 27, Leviviridae (n = 8); 28, Luteoviridae (n = 17); 29, Nodaviridae (n = 8); 30, Ophiovirus (n = 3); 31, Orthomyxoviridae (n = 5); 32, Oyster mushroom spherical virus (n = 1); 33, Paramyxoviridae (n = 28); 34, Pecluvirus (n = 2); 35, Picobirnavirus (n = 1); 36, Pomovirus (n = 4); 37, Reoviridae (n = 21); 38, Retroviridae (n = 40); 39, Rhabdoviridae (n = 17); 40, Sclerophthora macrospora virus A (n = 1); 41, Sobemovirus (n = 9); 42, Tetraviridae (n = 4); 43, Thielaviopsis basicola dsRNA virus 1 (n = 1); 44, Tobamovirus (n = 15); 45, Tobravirus (n = 3); 46, Togaviridae (n = 16); 47, Tombusviridae (n = 35); 48, Totiviridae (n = 20); 49, Tymoviridae (n = 12); 50, Umbravirus (n = 4). 
The following taxa are excluded because of zero overlap: Botrytis virus F (n = 1), Cheravirus (n = 2), Chrysoviridae (n = 1), Diaporthe ambigua RNA virus 1 (n = 1), Dicistroviridae (n = 12), Endornavirus (n = 1), Furovirus (n = 5), Idaeovirus (n = 1), Iflavirus (n = 7), Marnaviridae (n = 1), Narnaviridae (n = 8), Partitiviridae (n = 14), Picornaviridae (n = 31), Potyviridae (n = 54), Sequiviridae (n = 6), and Tenuivirus (n = 2).
Figure 2.
Explanation of terminology used to describe gene overlaps. (Note that some other investigators use a +2 notation to represent our −1 frameshift.)
Figure 3.
Frequency distribution of observed overlap lengths and the lengths of ORFs recovered by our simulation. (A) Internal Overlaps: observed and simulated data. (B) Terminal Overlaps: observed and simulated data. Observed overlap lengths are shown as histograms with each value being the mean of a homologous group. Simulated ORF lengths are shown as lines that connect the midpoints of a hidden histogram: the thick lines show the +1 frameshift, and the thin line shows the −1 frameshift; the effect of including an extension probability of 0.5 in the simulation is also indicated.
Figure 4.
Simulation results showing the effect of increasing gene overlap on the fitness effect of mutations, given different models for the interactions between mutations. Fitness is shown relative to the value for that model of fitness interaction at zero overlap. For epistatic interactions, taken from Elena and Lenski (1997), the additional fitness parameter values of a = 0.01 and b = 0.02 have been chosen for illustrative purposes only.

References

    1. Baril M., Brakier-Gingras L. Translation of the F protein of hepatitis C virus is initiated at a non-AUG codon in a +1 reading frame relative to the polyprotein. Nucleic Acids Res. 2005;33:1474–1486. doi: 10.1093/nar/gki292.
    2. Burch C.L., Turner P.E., Hanley K.A. Patterns of epistasis in RNA viruses: A review of the evidence from vaccine design. J. Evol. Biol. 2003;16:1223–1235.
    3. Chare E.R., Gould E.A., Holmes E.C. Phylogenetic analysis reveals a low rate of homologous recombination in negative-sense RNA viruses. J. Gen. Virol. 2003;84:2691–2703.
    4. Codoñer F.M., Daròs J.A., Solé R.V., Elena S.F. The fittest versus the flattest: Experimental confirmation of the quasispecies effect with subviral pathogens. PLoS Pathog. 2006;2:e136. doi: 10.1371/journal.ppat.0020136.
    5. Cristina J., Lopez F., Moratorio G., Lopez L., Vasquez S., Garcia-Aguirre L., Chunga A. Hepatitis C virus F protein sequence reveals a lack of functional constraints and a variable pattern of amino acid substitution. J. Gen. Virol. 2005;86:115–120.
