Virus Evol. 2020 Feb 13;6(1):veaa009.
doi: 10.1093/ve/veaa009. eCollection 2020 Jan.

Properties and abundance of overlapping genes in viruses

Timothy E Schlub et al. Virus Evol.

Abstract

Overlapping genes are commonplace in viruses and play an important role in their function and evolution. However, aside from studies on specific groups of viruses, relatively little is known about the extent and nature of gene overlap and its determinants in viruses as a whole. Here, we present an extensive characterisation of gene overlap in viruses through an analysis of reference genomes present in the NCBI virus genome database. We find that over half the instances of gene overlap are very small, covering <10 nt, and 84 per cent are <50 nt in length. Despite this, 53 per cent of all viruses still contained a gene overlap of 50 nt or larger. We also investigate several predictors of gene overlap such as genome structure (single- and double-stranded RNA and DNA), virus family, genome length, and genome segmentation. This revealed that gene overlap occurs more frequently in DNA viruses than in RNA viruses, and more frequently in single-stranded viruses than in double-stranded viruses. Genome segmentation is also associated with gene overlap, particularly in single-stranded DNA viruses. Notably, we observed a large range of overlap frequencies across families of all genome types, suggesting that it is a common evolutionary trait that provides flexible genome structures in all virus families.
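The overlap measurements summarised above can be illustrated with a minimal sketch. It assumes each gene is given as a 1-based, inclusive (start, end) coordinate pair on a single genome. The gene names and coordinates are hypothetical, and the sketch ignores strand, reading frame, and circular genomes, so it is not the authors' pipeline.

```python
from itertools import combinations


def overlap_length(a, b):
    """Nucleotides shared by two genes given as 1-based inclusive (start, end)."""
    start = max(a[0], b[0])
    end = min(a[1], b[1])
    return max(0, end - start + 1)


def gene_overlaps(genes):
    """Map each overlapping gene pair to its overlap length in nucleotides."""
    result = {}
    for (name_a, coords_a), (name_b, coords_b) in combinations(sorted(genes.items()), 2):
        n = overlap_length(coords_a, coords_b)
        if n > 0:
            result[(name_a, name_b)] = n
    return result


# Hypothetical annotation for one genome (illustrative values only).
genes = {
    "geneA": (1, 300),
    "geneB": (297, 900),
    "geneC": (850, 1200),
    "geneD": (1300, 1500),
}

overlaps = gene_overlaps(genes)

# Thresholds taken from the abstract: very small overlaps (<10 nt)
# and genomes containing at least one overlap of 50 nt or larger.
n_small = sum(1 for n in overlaps.values() if n < 10)
has_large_overlap = any(n >= 50 for n in overlaps.values())
```

The pairwise comparison is quadratic in the number of genes, which is adequate for small viral genomes. For very large genomes, sorting genes by start position and sweeping once would be a more efficient choice.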

Keywords: meta data; overlapping genes; overprinted genes; reference genomes.

Figures

Figure 1.
Logarithmically scaled histogram of the length of gene overlap. Overall, 54 per cent of gene overlaps are <10 nt in length, and 81 per cent of gene overlaps are <50 nt in length.
Figure 2.
Proportions of genomes with at least one instance of gene overlap (>50 nt) across viral groups. Error bars represent 95 per cent CI for the proportion.
Figure 3.
Proportions of genomes with at least one instance of gene overlap across viral groups, stratified by virus family. Virus families are ordered by their proportion and then the width of the confidence intervals. Error bars represent 95 per cent CIs for the proportion within a family. Vertical lines represent overall means within a viral genome group.
Figure 4.
(A) The cumulative distribution of the total abundance of gene overlap (number of genes with an overlap per genome) over all viruses studied here. (B) Histograms of the total abundance of gene overlap by virus family, truncated at 10. All virus groups had a maximum total abundance less than or equal to 10, except dsDNA viruses, which had a very long tail up to 789, and ssDNA viruses, which had a maximum of 15. (C) Histograms of the total abundance of gene overlap by family as a proportion of the number of genes (relative frequency = number of genes with an overlap / number of genes).
Figure 5.
Proportion of genes with at least one instance of gene overlap stratified by segmented genomes. Error bars represent 95 per cent CIs.
Figure 6.
Number of genes involved in gene overlap by genome size (excluding all genomes with no overlap). Trend lines are Loess curves with span 0.80. Both the x and y axes depict log scales.
Figure 7.
Total number of nucleotides involved in gene overlap by genome size (excluding all genomes with no overlap). Trend lines are Loess curves with span 0.80. Both the x and y axes depict log scales.
Figure 8.
The proportion of genomes containing a gene overlap that has an antisense overlap in each virus group. Error bars represent 95 per cent CIs. Numbers to the right of error bars represent the number of genomes containing an antisense gene overlap (e.g. 490 genomes in Group I (dsDNA) contain an antisense gene overlap).
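
Several of the captions above report 95 per cent CIs for a proportion. The page does not say which interval method was used, so the following is only a sketch under the assumption of a Wilson score interval. The function name and the counts in the example are hypothetical.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95 per cent)."""
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0 <= successes <= n:
        raise ValueError("successes must lie between 0 and n")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half_width, centre + half_width


# Hypothetical counts: 53 of 100 genomes with at least one overlap of 50 nt or larger.
lower, upper = wilson_interval(53, 100)
```

The Wilson interval stays within [0, 1] and behaves reasonably for small families with few genomes, which is why it is used here instead of the simpler normal approximation.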
