Review
. 2022 Mar;23(3):154-168.
doi: 10.1038/s41576-021-00417-w. Epub 2021 Oct 5.

Overlapping genes in natural and engineered genomes

Bradley W Wright et al. Nat Rev Genet. 2022 Mar.

Abstract

Modern genome-scale methods that identify new genes, such as proteogenomics and ribosome profiling, have revealed, to the surprise of many, that overlap in genes, open reading frames and even coding sequences is widespread and functionally integrated into prokaryotic, eukaryotic and viral genomes. In parallel, the constraints that overlapping regions place on genome sequences and their evolution can be harnessed in bioengineering to build more robust synthetic strains and constructs. With a focus on overlapping protein-coding and RNA-coding genes, this Review examines their discovery, topology and biogenesis in the context of their genome biology. We highlight exciting new uses for sequence overlap to control translation, compress synthetic genetic constructs, and protect against mutation.

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1. Overlapping gene definition and topologies.
a | Gene overlap definitions differ between prokaryotes and eukaryotes. (Top) Eukaryote overlaps are most frequently defined as overlaps between the boundaries of the primary transcript, shown here in the shaded region. Often, the overlap is only between the 5′ untranslated region (UTR) or 3′ UTR of both transcripts (5′ UTR overlap shown). (Bottom) In contrast, prokaryote and virus genes are only considered to overlap if their coding sequences overlap. Thin boxes denote 5′ and 3′ UTRs while thick boxes are coding sequences. Arrowheads indicate the extent of the consensus definition of gene boundaries within studies referenced in this review. b | Genes and open reading frames (ORFs) can be overlapped in one of three topologies. Unidirectional (also called tandem) overlaps occur between genes and ORFs on the same strand. Divergent (also called head-to-head) overlaps occur between genes and ORFs on opposite strands that overlap at their 5′-ends. Convergent (also called tail-to-tail) overlaps occur between genes and ORFs on opposite strands that overlap at the 3′-ends. c | Gene and ORF interactions can be either overlapped, where only limited portions of each gene or ORF are overlapping, or nested, where the entire sequence of one partner falls within the boundaries of the other.
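As a rough illustration of panels b and c, the topology and extent of an overlap can be worked out from coordinates and strand alone. The sketch below is a minimal Python example and is not taken from the Review: the `Feature` class, the `classify_overlap` function and the inclusive coordinate convention are assumptions made for illustration, as is the fallback label `opposite-strand` for nested pairs on opposite strands, where the 5′/3′ distinction in the caption does not directly apply.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Feature:
    """A gene or ORF on a genome (hypothetical helper type)."""
    start: int   # leftmost genomic coordinate, inclusive
    end: int     # rightmost genomic coordinate, inclusive
    strand: str  # "+" or "-"


def classify_overlap(a: Feature, b: Feature) -> Optional[Tuple[str, str]]:
    """Return (topology, extent) for two features, or None if they do not overlap."""
    if a.end < b.start or b.end < a.start:
        return None

    # Nested: one feature lies entirely within the boundaries of the other.
    a_in_b = b.start <= a.start and a.end <= b.end
    b_in_a = a.start <= b.start and b.end <= a.end
    extent = "nested" if (a_in_b or b_in_a) else "overlapped"

    if a.strand == b.strand:
        return ("unidirectional", extent)
    if extent == "nested":
        # Assumption: no 5'/3' end is singled out when one partner is fully internal.
        return ("opposite-strand", extent)

    # Partial overlap on opposite strands: the right end of the left feature
    # meets the left end of the right feature. On "+", the right end is the 3' end.
    left = a if a.start < b.start else b
    topology = "convergent" if left.strand == "+" else "divergent"
    return (topology, extent)
```

For example, `classify_overlap(Feature(1, 100, "+"), Feature(90, 200, "-"))` returns `("convergent", "overlapped")`, because the shared region covers the 3′ ends of both features.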
Fig. 2. Mechanisms of gene and ORF overlap creation.
New overlaps can be created through a range of mechanisms and likely require numerous complementary developments to produce the appropriate sequence context for retention of gene or open reading frame (ORF) functionality. a | Mutations removing the start codon of a downstream ORF may result in the next available upstream start codon being utilized, which could be within an upstream ORF. b | Mutational loss of a stop codon may result in the extension of an ORF. Similar to start codon loss, the next available stop codon may be utilized, which could be within a downstream ORF. c | De novo generation of an ORF may begin with the creation of a start codon within an existing coding region through mutation and, in conjunction with a downstream stop codon, produces an overlapping ORF. d | Non-coding intron sequences may acquire a start codon through mutation and, in conjunction with a downstream stop codon, produce a nested ORF. e | Mutations that result in the de novo development of a sequence capable of recruiting transcriptional machinery (such as a promoter or enhancer) may result in a new overlapping gene. f | Genome rearrangements, such as inversions and translocations, may result in distant non-overlapping genes becoming overlapped. This mechanism has been seen within human cancers. g | Mobile genetic elements carrying genes (such as transposons or proviral genes) may localize to within a gene, generating a new gene overlap.
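Panel b can be made concrete with a toy sequence. The following is a minimal sketch in Python, assuming an invented 22-nt sequence and two hypothetical helpers (`orf_span`, `overlap_length`) that are not part of the Review. It shows how a single substitution in a stop codon extends an upstream ORF to the next in-frame stop, which here lies inside a downstream ORF in a different reading frame.

```python
STOPS = {"TAA", "TAG", "TGA"}


def orf_span(seq, start):
    """Half-open (start, end) span from `start` through the first in-frame stop codon.

    Returns None if no in-frame stop codon is found.
    """
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOPS:
            return (start, i + 3)
    return None


def overlap_length(a, b):
    """Number of nucleotides shared by two half-open spans."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


# Toy sequence (invented): upstream ORF at index 0, downstream ORF at index 10.
wild_type = "ATGAAATAAGATGCCTGATTAA"
# Substitution at index 8 turns the upstream stop codon TAA into TAC.
mutant = wild_type[:8] + "C" + wild_type[9:]

upstream_wt = orf_span(wild_type, 0)    # (0, 9)
upstream_mut = orf_span(mutant, 0)      # (0, 18)
downstream = orf_span(wild_type, 10)    # (10, 22), unchanged by the mutation

print(overlap_length(upstream_wt, downstream))   # 0
print(overlap_length(upstream_mut, downstream))  # 8
```

The extended ORF ends at a TGA that happens to sit in frame 0 within the downstream ORF, so the two coding sequences now share 8 nt.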
Fig. 3. Selective pressures involved in retaining gene and ORF overlaps.
a,b | Overlapping start and stop codons cause translation coupling between unidirectional overlapping open reading frames (ORFs) through unwinding of mRNA secondary structure around the ribosome binding site and start codon and by enhancing ribosome re-initiation. c | Overlapping sequence regions cause mutations to affect more than one ORF, increasing fitness cost and preserving overlapped sequences under mutational pressure. d | Encoding more ORFs in the same sequence region allows genetic novelty with reduced genome changes, which is particularly advantageous for viruses that have spatial constraints on genome size. e | Sense–antisense gene and ORF overlap is frequently involved with gene expression regulation, including non-coding RNA and long non-coding RNA. f | Transcriptional tuning of convergent overlapping genes and ORFs can result from RNA polymerase collisions (transcriptional interference).
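The constraint described in panel c can be sketched with two toy ORFs that share sequence in different reading frames. This is an illustrative Python example under stated assumptions: the 22-nt sequence, the ORF names `A` and `B` and the helper `affected_orfs` are invented here, and translation uses the standard genetic code. A substitution that is synonymous in one frame can still change the protein encoded in the other.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(cds):
    """Translate a coding sequence codon by codon ('*' marks a stop)."""
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds) - 2, 3))


def affected_orfs(seq, orfs, pos, new_base):
    """Names of ORFs whose protein changes when seq[pos] is replaced by new_base.

    `orfs` maps a name to a half-open (start, end) span on `seq`.
    """
    mutant = seq[:pos] + new_base + seq[pos + 1:]
    return [
        name
        for name, (start, end) in orfs.items()
        if translate(seq[start:end]) != translate(mutant[start:end])
    ]


# Toy sequence (invented): ORF A in frame 0, ORF B in frame 1, sharing 8 nt.
seq = "ATGAAATACGATGCCTGATTAA"
orfs = {"A": (0, 18), "B": (10, 22)}

print(affected_orfs(seq, orfs, 4, "G"))   # ['A']: outside the overlap
print(affected_orfs(seq, orfs, 14, "A"))  # ['B']: synonymous in A (GCC to GCA), Pro to His in B
print(affected_orfs(seq, orfs, 13, "A"))  # ['A', 'B']: both proteins change
```

The second case is the relevant one: a third-position change that would be silent in a non-overlapping gene alters the partner ORF, so fewer positions in the overlap are free to vary.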
Fig. 4. Gene and ORF overlap across prokaryotes, eukaryotes and their viruses.
a | Escherichia coli menaquinone biosynthesis operon contains three short stop–start coding sequence (CDS) overlaps. b | The large human gene NF1 and internal nested protein-coding ORFs OMG, EVI2B and EVI2A are located within NF1 introns. c | Recently described alt-RPL36 (bottom) overlaps the human ribosomal protein gene RPL36 (ref.) through an out-of-frame GTG start codon within a 5′-extended RPL36 exon present on RPL36 transcript variant 2. The alt-RPL36 CDS generates a longer protein with an entirely different sequence from RPL36 (ref.). d | The virus φX174 contains overlaps in all three reading frames: three short unidirectional stop–start CDS overlaps, two nested CDSs, and one in-frame start generating an N-terminally truncated protein.
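To show what a short stop–start overlap looks like at the sequence level, the sketch below scans a sequence for places where a stop codon and an ATG start codon share nucleotides. It is a minimal Python example, not a method from the Review: the function name and the toy sequence are invented, only ATG starts are considered, and the scan reports motifs only, without checking that the codons actually terminate and begin annotated CDSs.

```python
STOPS = {"TAA", "TAG", "TGA"}


def stop_start_overlaps(seq):
    """Find motifs in which a stop codon and an ATG start codon share nucleotides.

    Returns a list of (motif_start, motif, overlap_nt), where overlap_nt is the
    number of nucleotides the two adjacent CDSs would share.
    """
    hits = []
    for s in range(len(seq) - 2):
        if seq[s:s + 3] != "ATG":
            continue
        # Stop codon ending on the A of ATG (TAATG or TGATG): 1-nt overlap.
        if s >= 2 and seq[s - 2:s + 1] in STOPS:
            hits.append((s - 2, seq[s - 2:s + 3], 1))
        # Stop codon TGA starting on the T of ATG (ATGA): 4-nt overlap.
        if seq[s + 1:s + 4] in STOPS:
            hits.append((s, seq[s:s + 4], 4))
    return hits


# Toy sequence (invented) containing one motif of each kind.
print(stop_start_overlaps("CCTAATGCCATGACC"))
# [(2, 'TAATG', 1), (9, 'ATGA', 4)]
```

In both motifs the downstream start codon sits out of frame relative to the upstream stop codon, so the two CDSs are read in different frames.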
Fig. 5. Disruptions to overlapping genes within a refactored phage genome and a complex biosynthetic gene cluster.
a | Creation of φX174.1f, also known as decompressed φX174, disrupted four unidirectional stop–start coding sequence (CDS) overlaps and two fully nested overlapping CDSs. b | Refactoring the nitrogen fixation cluster from Klebsiella oxytoca disrupted four stop–start CDS overlaps and CDS overlaps varying from 1–14 bp (ref.).
Fig. 6. Exploiting CDS overlap for applications in bioengineering.
a | The Constraining Adaptive Mutations using Engineered Overlapping Sequences (CAMEOS) method searches for available overlaps between an essential gene and a gene of interest to be shielded from mutation while minimizing sequence changes. Asterisks identify amino acid modifications to accommodate coding sequence (CDS) overlap. b | The RiBoSor algorithm searches for places within a gene of interest to silently create a ribosome binding site followed by a start codon in a different reading frame than the existing CDS. This generates a CDS that extends to the 3′ end of the existing CDS. An essential gene is then fused in-frame to the newly created CDS just 3′ of the stop codon of the original CDS. Asterisks identify amino acid modifications to accommodate overlap. c | Reliable and tunable expression of a gene of interest can be facilitated by the bicistronic device. (Left) A single ribosome binding site (RBS) upstream of a variety of different CDSs can result in different interactions between the RBS and the coding sequence, causing variable translation initiation rates that are difficult to predict. (Right) In the bicistronic device, the binding of a ribosome to the upstream ribosome binding site (RBS1) of CDS 1 and its translation towards the gene of interest will disrupt inhibitory sequence structures. The ribosome will recognize the ribosome binding site (RBS2) of the downstream gene of interest and re-initiate translation, providing a platform for reliable expression of the gene of interest.
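One condition stated in panel b, that the new out-of-frame CDS must extend to the 3′ end of the existing CDS, can be checked with a few lines of code. The sketch below is a simplified Python illustration and is not the RiBoSor algorithm: it only looks for ATG codons that already exist out of frame and keeps those whose reading frame contains no stop codon before the end of the CDS. It does not attempt silent creation of a ribosome binding site or start codon. The function name and both toy sequences are assumptions.

```python
STOPS = {"TAA", "TAG", "TGA"}


def open_alt_frame_starts(cds):
    """Out-of-frame ATG positions whose frame stays open to the 3' end of `cds`.

    Returns a list of (position, frame), where frame is position % 3 (1 or 2)
    relative to the existing CDS.
    """
    hits = []
    for pos in range(1, len(cds) - 2):
        if pos % 3 == 0 or cds[pos:pos + 3] != "ATG":
            continue  # skip in-frame positions and non-start codons
        codons = (cds[i:i + 3] for i in range(pos, len(cds) - 2, 3))
        if not any(codon in STOPS for codon in codons):
            hits.append((pos, pos % 3))
    return hits


# Toy CDS (invented): ATG at index 5 opens a stop-free frame to the 3' end.
print(open_alt_frame_starts("ATGGCATGGCTGCTGGCTTAA"))  # [(5, 2)]

# Toy CDS (invented): ATG at index 4 is followed by an in-frame TAA, so it is rejected.
print(open_alt_frame_starts("ATGCATGTAACCTAA"))        # []
```

A candidate that passes this check could, in principle, be fused in-frame to a downstream gene as the caption describes; the remaining design steps are outside the scope of this sketch.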

References

    1. Barrell BG, Air GM, Hutchison CA 3rd. Overlapping genes in bacteriophage phiX174. Nature. 1976;264:34–41.
    2. Sanger F, et al. Nucleotide sequence of bacteriophage φX174 DNA. Nature. 1977;265:687.
    3. Linney E, Hayashi M. Intragenic regulation of the synthesis of ΦX174 gene A proteins. Nature. 1974;249:345.
    4. Roznowski AP, Doore SM, Kemp SZ, Fane BA. Finally, a role befitting Astar: the strongly conserved, unessential microvirus A* proteins ensure the product fidelity of packaging reactions. J. Virol. 2020;94:e01593-19.
    5. Schlub TE, Holmes EC. Properties and abundance of overlapping genes in viruses. Virus Evol. 2020;6:veaa009.
