Review
Cell Genom. 2023 Apr 12;3(4):100296.
doi: 10.1016/j.xgen.2023.100296.

Not all exons are protein coding: Addressing a common misconception

Julie L Aspden et al. Cell Genom.

Abstract

Exons are regions of DNA that are transcribed to RNA and retained after introns are spliced out. However, the term "exon" is often misused as a synonym for "protein coding," including in some literature and textbook definitions. In contrast, only a fraction of exonic sequence is protein coding (<30% in humans). Both exons and introns are also present in untranslated regions (UTRs) and non-coding RNAs. Misuse of the term exon is problematic: for example, "whole-exome sequencing" technology targets <25% of the human exome, primarily regions that are protein coding. Here, we argue for the importance of the original definition of an exon for making functional distinctions in genetics and genomics. Further, we recommend the use of clearer language referring to coding exonic regions and non-coding exonic regions. We propose the term coding exome sequencing, or CES, to more appropriately describe sequencing approaches that primarily target protein-coding regions rather than all transcribed regions.
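The definition above can be made concrete with a small sketch: an exon is whatever survives splicing, whether or not it is translated. The Python below assumes a made-up plus-strand gene with half-open, 0-based coordinates; the helper names `splice` and `coding_fraction` are hypothetical and not from the review. It joins the exons into a mature RNA and then reports what fraction of those exonic bases fall inside the coding sequence (CDS).

```python
from typing import List, Optional, Tuple

# Assumed representation (hypothetical): half-open, 0-based genomic intervals
# [start, end) on the plus strand. Not taken from the review.
Interval = Tuple[int, int]


def splice(genome: str, exons: List[Interval]) -> str:
    """Return the mature RNA: exon sequences joined in order, introns removed."""
    return "".join(genome[start:end] for start, end in sorted(exons))


def coding_fraction(exons: List[Interval], cds: Optional[Interval]) -> float:
    """Fraction of exonic bases that lie inside the CDS span.

    `cds` is None for a non-coding RNA, whose exons contain no coding bases.
    """
    exonic = sum(end - start for start, end in exons)
    if cds is None:
        return 0.0
    cds_start, cds_end = cds
    coding = sum(
        max(0, min(end, cds_end) - max(start, cds_start)) for start, end in exons
    )
    return coding / exonic


# Toy gene with a made-up sequence: exons in uppercase, introns in lowercase.
genome = "ACGTgtagCCGGgtagTTAA"
exons = [(0, 4), (8, 12), (16, 20)]

print(splice(genome, exons))                      # ACGTCCGGTTAA
print(round(coding_fraction(exons, (2, 18)), 3))  # 0.667
print(coding_fraction(exons, None))               # 0.0
```

Every base of the spliced RNA is exonic, yet only two-thirds of it is coding in this toy case, and a transcript with no CDS scores zero. That is the distinction between "exonic" and "coding" that the review asks readers to keep.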

Keywords: UTRs; exome sequencing; exons; introns; non-coding RNA; splicing; untranslated regions.


Conflict of interest statement

The authors declare no competing interests.

Figures

Figure 1
Schematics showing the position of exons and introns with reference to the coding sequence (CDS) and untranslated regions (UTRs). (A) The genomic region of a protein-coding transcript with six exons. Exons 3, 4, and 5 are entirely CDS, exon 1 is entirely 5′ UTR, and exons 2 and 6 contain both CDS and UTR sequence. (B) The mature mRNA (after removal of introns by splicing) of the same protein-coding transcript as in (A). (C) The mature RNA of a long non-coding RNA (lncRNA), also with six exons, all of which are entirely non-coding. Exonic 5′ UTR sequence is indicated in yellow, exonic CDS in green, exonic 3′ UTR in pink, and lncRNA exons in purple. The poly-A signal is in blue. This figure was made in BioRender.
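The distinction drawn in Figure 1A between exons and CDS can be expressed as a simple interval comparison. The sketch below is an illustration under stated assumptions: the coordinates are invented, half-open, and on the plus strand, and `classify_exon` is a hypothetical helper. It labels each exon by the parts of the transcript it contains rather than treating every exon as coding.

```python
from typing import List, Optional, Tuple

# Hypothetical coordinates: half-open, 0-based [start, end) on the plus strand.
# On the minus strand the 5' and 3' labels would be swapped.
Interval = Tuple[int, int]


def classify_exon(exon: Interval, cds: Optional[Interval]) -> str:
    """Label which annotation classes an exon contains, in 5'-to-3' order."""
    if cds is None:
        return "non-coding"  # e.g. an lncRNA exon, as in Figure 1C
    start, end = exon
    cds_start, cds_end = cds
    parts: List[str] = []
    if start < cds_start:  # some bases lie upstream of the CDS start
        parts.append("5' UTR")
    if min(end, cds_end) > max(start, cds_start):  # exon overlaps the CDS
        parts.append("CDS")
    if end > cds_end:  # some bases lie downstream of the CDS end
        parts.append("3' UTR")
    return " + ".join(parts)


# Six made-up exons arranged like Figure 1A.
exons = [(0, 10), (20, 30), (40, 50), (60, 70), (80, 90), (100, 110)]
cds = (25, 105)

for number, exon in enumerate(exons, start=1):
    print(f"exon {number}: {classify_exon(exon, cds)}")
```

With these inputs the loop reports exon 1 as `5' UTR`, exon 2 as `5' UTR + CDS`, exons 3 to 5 as `CDS`, and exon 6 as `CDS + 3' UTR`, matching the layout the caption describes.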
Figure 2
Proportion of exonic sequences and representation in whole-exome sequencing. (A) Comparison of the proportion of exonic bases annotated as protein coding, 5′ UTR, 3′ UTR, non-coding RNA, and other (including transposable element gene or pseudogene exons not annotated as protein coding) across six different organisms. (B) Bar plot of the total number of exonic bases in humans for each annotation, showing the overlap with whole-exome sequencing capture regions. Bases within the capture regions are shown in color; bases outside them are shown in gray. The raw numbers behind this figure are in Table S1.
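Panel B of Figure 2 rests on a simple computation: for each annotation class, count how many exonic bases fall inside the capture regions of a whole-exome kit. The sketch below shows one standard way to do this, a two-pointer sweep over sorted, merged intervals. The coordinates, labels, and helper names (`merge`, `covered_bases`) are hypothetical, and a real analysis would more likely work from BED files with a tool such as bedtools.

```python
from typing import Dict, List, Tuple

# Hypothetical half-open, 0-based intervals [start, end) on a single chromosome.
Interval = Tuple[int, int]


def merge(intervals: List[Interval]) -> List[Interval]:
    """Sort intervals and merge any that overlap or abut."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def covered_bases(regions: List[Interval], capture: List[Interval]) -> int:
    """Count bases of `regions` that also lie inside `capture` (two-pointer sweep)."""
    a, b = merge(regions), merge(capture)
    i = j = total = 0
    while i < len(a) and j < len(b):
        overlap = min(a[i][1], b[j][1]) - max(a[i][0], b[j][0])
        if overlap > 0:
            total += overlap
        # Advance whichever interval ends first; it cannot overlap anything later.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total


# Made-up annotations and capture targets, NOT the data behind Figure 2 or Table S1.
annotations: Dict[str, List[Interval]] = {
    "protein coding": [(100, 200), (300, 400)],
    "3' UTR": [(400, 600)],
    "non-coding RNA": [(1000, 1200)],
}
capture = [(90, 210), (290, 420)]

for label, regions in annotations.items():
    total = sum(end - start for start, end in merge(regions))
    inside = covered_bases(regions, capture)
    print(f"{label}: {inside}/{total} bases inside the capture")
```

With these toy inputs all 200 coding bases are captured, only 20 of 200 3′ UTR bases are, and none of the non-coding RNA is. This mirrors the kind of gap the figure reports, although the numbers themselves carry no biological meaning.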
