Genes (Basel). 2021 May 26;12(6):809.
doi: 10.3390/genes12060809.

Origin, Evolution and Stability of Overlapping Genes in Viruses: A Systematic Review

Angelo Pavesi. Genes (Basel).

Abstract

During their long evolutionary history, viruses have generated many proteins de novo by a mechanism called "overprinting", in which critical nucleotide substitutions in a pre-existing gene induce the expression of a novel protein by translation of an alternative open reading frame (ORF). Overlapping genes represent an intriguing example of adaptive conflict, because they simultaneously encode two proteins whose freedom to change is constrained by each other. However, overlapping genes are also a source of genetic novelty, as the constraints under which alternative ORFs evolve can give rise to proteins with unusual sequence properties and, most importantly, the potential for novel functions. Starting with the discovery of overlapping genes in phages infecting Escherichia coli, this review covers a range of studies dealing with the detection of overlapping genes in small eukaryotic viruses (genomic length below 30 kb) and the recognition of their critical role in the evolution of pathogenicity. The origin of overlapping genes, the factors that favor their birth and retention, and the ways in which they manage their inherent adaptive conflict are extensively reviewed. Special attention is paid to the assembly of overlapping genes into ad hoc databases, suitable for future studies, and to the development of statistical methods for exploring viral genome sequences in search of undiscovered overlaps.

Keywords: asymmetric evolution; codon usage; de novo protein creation; modular evolution; multivariate statistics; negative selection; phylogenetic distribution; positive selection; prediction methods; sequence-composition features; symmetric evolution; virus evolution.

Conflict of interest statement

The author declares no conflict of interest.

Figures

Figure 1
Orientation of same-strand overlapping genes. (A) Overlapping gene with the downstream frame shifted one nucleotide 3′ with respect to the upstream frame. There are 3 types of codon position (cp): cp13 (bold character), in which the first codon position of the upstream frame overlaps the third codon position of the downstream frame; cp21 (underlined character), in which the second codon position of the upstream frame overlaps the first codon position of the downstream frame; cp32 (italic character), in which the third codon position of the upstream frame overlaps the second codon position of the downstream frame. (B) Overlapping gene with the downstream frame shifted two nucleotides 3′ with respect to the upstream frame. There are 3 types of codon position (cp): cp12 (bold character), in which the first codon position of the upstream frame overlaps the second codon position of the downstream frame; cp23 (underlined character), in which the second codon position of the upstream frame overlaps the third codon position of the downstream frame; cp31 (italic character), in which the third codon position of the upstream frame overlaps the first codon position of the downstream frame. According to the genetic code, a nucleotide substitution at first codon position causes an amino acid change in 95.4% of cases, at second position in 100% of cases, and at third position in 28.4% of cases.
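The codon-position labels and the percentages quoted in the Figure 1 caption can be checked with a short sketch. It assumes the standard genetic code (NCBI translation table 1) and assumes that substitutions to or from stop codons are left out of the count; the caption does not state either point, and the function names are illustrative only.

```python
# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def codon_position_pairs(shift):
    """Labels such as 'cp13' for a downstream frame shifted `shift` nt 3'."""
    pairs = []
    for i in range(3):                      # one upstream codon, nt by nt
        upstream = i % 3 + 1                # codon position in upstream frame
        downstream = (i - shift) % 3 + 1    # codon position in downstream frame
        pairs.append(f"cp{upstream}{downstream}")
    return pairs


def nonsynonymous_fraction(position):
    """Fraction of substitutions at a codon position (0, 1 or 2) that change
    the amino acid. Assumption: changes to or from stop codons are skipped."""
    changed = total = 0
    for codon, aa in CODON_TABLE.items():
        if aa == "*":
            continue
        for base in BASES:
            if base == codon[position]:
                continue
            mutant = codon[:position] + base + codon[position + 1:]
            mutant_aa = CODON_TABLE[mutant]
            if mutant_aa == "*":
                continue
            total += 1
            changed += mutant_aa != aa
    return changed / total


print(codon_position_pairs(1))  # ['cp13', 'cp21', 'cp32']
print(codon_position_pairs(2))  # ['cp12', 'cp23', 'cp31']
for pos in range(3):
    # Prints 95.4, 100.0 and 28.4 for positions 1, 2 and 3.
    print(pos + 1, round(100 * nonsynonymous_fraction(pos), 1))
```

Under these assumptions the counts are 166 of 174 substitutions at the first position, all substitutions at the second, and 50 of 176 at the third, which reproduces the figures in the caption.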
Figure 2
Increase in the genome information content during the evolution of microviruses (family Microviridae). The nomenclature of genes, from A to J, is that originally proposed in ΦX174 [11,12,13]. Empty boxes indicate ancestral pre-existing genes, while grey boxes indicate the new genes (or gene regions) that originated by overprinting. Figure reproduced from [18] with the permission of the Microbiology Society.
Figure 3
Dendrogram of the methyltransferase-like domain of replicase from Turnip yellow mosaic virus (TYMV), Kennedya yellow mosaic tymovirus (KYMV), Eggplant mosaic tymovirus (EMV), Ononis yellow mosaic tymovirus (OYMV), Erysimum latent tymovirus (ELV), Potato X potexvirus (PVX), White clover mosaic potexvirus (WClMV), Narcissus mosaic potexvirus (NMV), Apple chlorotic leaf spot closterovirus (ACLSV), Potato M carlavirus (PVM), Alfalfa mosaic alfamovirus (AIMV), Tobacco mosaic tobamovirus (TMV), and Beet necrotic yellow vein furovirus (BNYVV). The overlapping ORF encoding a movement protein (entirely nested within replicase) is a genetic novelty unique to tymoviruses. Figure reproduced from [9] with the permission of the authors.
Figure 4
Predicted disorder content of proteins encoded by overlapping genes. The error bars correspond to a 95% confidence interval. Figure reproduced from [21] with the permission of the American Society for Microbiology.
Figure 5
Difference between the pooled sets of overlapping and non-overlapping genes for the 20 most critical composition features. (A) Nucleotides and dinucleotides. (B) Amino acids and amino acids grouped according to codon degeneracy. (C) Synonymous codons. The figure, made by A. Vianelli, was reproduced from [35].
Figure 6
Principal component analysis (PCA) of a sample set of 319 overlapping genes. The three-dimensional map was obtained using the first (PC1), second (PC2), and third (PC3) principal components. Black circles indicate the 4 homologs of the overlapping gene polymerase/protein X of Hepatitis B virus. They were classified as outliers because of a highly atypical sequence composition. Figure reproduced from [83] with the permission of Elsevier.
Figure 7
Histogram of the distribution of the LDA score in overlapping genes (grey columns) and in non-overlapping genes (black columns). With a discriminant score of −35.31, a high percentage (96.5%) of overlapping genes were correctly classified as overlap (score below −35.31) and a high percentage (97.1%) of non-overlapping genes were correctly classified as non-overlap (score above −35.31). Figure reproduced from [83] with the permission of Elsevier.
Figure 8
Histogram of the distribution of the PLS-DA score in overlapping genes (grey columns) and in non-overlapping genes (black columns). With a discriminant score of 0, a high percentage (94.9%) of overlapping genes were correctly classified as overlap (score below 0) and a high percentage (98.4%) of non-overlapping genes were correctly classified as non-overlap (score above 0). Figure reproduced from [83] with the permission of Elsevier.
Figure 9
(A) Histogram of the distribution of the LDA score in 126 ancestral frames (black columns) and in the respective +1 de novo frames (grey columns). With a discriminant score of 17.20, a high percentage (96.8%) of ancestral frames were correctly classified as ancestral (score above 17.20) and a high percentage (97.6%) of +1 de novo frames were correctly classified as de novo (score below 17.20). (B) Histogram of the distribution of the LDA score in 68 ancestral frames (black columns) and in the respective +2 de novo frames (grey columns). With a discriminant score of −34.98, all ancestral frames and all +2 de novo frames were correctly classified as ancestral and de novo, respectively. Figure reproduced from [83] with the permission of Elsevier.
Figure 10
Map of the genome of HBV with overlapping and non-overlapping coding regions. Pre-S1, Pre-S2, and S are the domains of surface protein. TP, SP, RT, and RNase are the domains of polymerase. TP, terminal protein domain; SP, spacer domain; RT, reverse transcriptase domain; RNase, ribonuclease domain; C, capsid. Figure reproduced from [100] with the permission of the Microbiology Society.
Figure 11
Modular evolution in the genesis of the overlapping gene polymerase/surface protein of hepadnaviruses. (A) Putative primordial genome of HBV. (B) Birth of a novel frame encoding the SP domain of polymerase (shaded box). (C) Birth of a novel frame encoding the C-terminal region of the Pre-S2 domain and the S domain of surface protein (shaded box). Figure reproduced from [100] with the permission of the Microbiology Society.
Figure 12
Map of overlapping genes (grey circles) and non-overlapping genes (black circles), in which the PLS-DA score is plotted against the respective LDA score. Grey circles in part (A) indicate overlaps correctly classified by both methods (94.2% of the total). Black circles in part (C) indicate non-overlaps correctly classified by both methods (97.1% of the total). Grey circles in part (BD) indicate overlaps misclassified by one or both methods (5.8% of the total). Black circles in parts (A) and (D) indicate non-overlaps misclassified by one or both methods (2.9% of the total). Asterisks in part (A) indicate two new potential overlapping genes detected in the genome of SARS-CoV-2 (isolate Wuhan-Hu-1). Figure reproduced from [83] with the permission of Elsevier.
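The two-method classification in Figure 12 rests on the discriminant thresholds quoted in the captions of Figures 7 and 8. A minimal sketch of that decision rule follows. It assumes the LDA and PLS-DA scores have already been computed elsewhere; the function name, the example scores, and the treatment of a score exactly equal to a threshold are illustrative assumptions, not part of the published method.

```python
# Thresholds quoted in the captions of Figure 7 (LDA) and Figure 8 (PLS-DA).
LDA_THRESHOLD = -35.31
PLSDA_THRESHOLD = 0.0


def classify(lda_score, plsda_score):
    """Label a gene with each method and report whether the two calls agree.

    A score below the threshold is read as 'overlap'. A score exactly at the
    threshold is treated as 'non-overlap' here, which is an assumption.
    """
    lda_call = "overlap" if lda_score < LDA_THRESHOLD else "non-overlap"
    plsda_call = "overlap" if plsda_score < PLSDA_THRESHOLD else "non-overlap"
    return lda_call, plsda_call, lda_call == plsda_call


# Hypothetical scores, for illustration only.
print(classify(-50.0, -1.2))  # both methods call 'overlap'
print(classify(-20.0, -0.3))  # the two methods disagree
```

Genes on which the two calls disagree correspond to the cases the caption describes as misclassified by one of the two methods.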

References

    1. Taylor J.S., Raes J. Duplication and divergence: The evolution of new genes and old ideas. Annu. Rev. Genet. 2004;38:615–643. doi: 10.1146/annurev.genet.38.072902.092831.
    2. Long M., Betran E., Thornton K., Wang W. The origin of new genes: Glimpses from the young and old. Nat. Rev. Genet. 2003;4:865–875. doi: 10.1038/nrg1204.
    3. Patthy L. Genome evolution and the evolution of exon-shuffling—A review. Gene. 1999;238:103–114. doi: 10.1016/S0378-1119(99)00228-0.
    4. Treangen T.J., Rocha E.P.C. Horizontal transfer, not duplication, drives the expansion of protein families in prokaryotes. PLoS Genet. 2011;7:e1001284. doi: 10.1371/journal.pgen.1001284.
    5. Li C.Y., Zhang Y., Wang Z., Cao C., Zhang P.W., Lu S.J., Li X.M., Yu Q., Zheng Y., Du Q., et al. A human-specific de novo protein-coding gene associated with human brain functions. PLoS Comput. Biol. 2010;6:e1000734. doi: 10.1371/journal.pcbi.1000734.