Nucleic Acids Res. 2018 Jan 25;46(2):582-592.
doi: 10.1093/nar/gkx1165.

Alternative start and termination sites of transcription drive most transcript isoform differences across human tissues

Alejandro Reyes et al. Nucleic Acids Res.

Abstract

Most human genes generate multiple transcript isoforms. The differential expression of these isoforms can help specify cell types. Diverse transcript isoforms arise from the use of alternative transcription start sites, polyadenylation sites and splice sites; however, the relative contribution of these processes to isoform diversity in normal human physiology is unclear. To address this question, we investigated cell type-dependent differences in exon usage of over 18 000 protein-coding genes in 23 cell types from 798 samples of the Genotype-Tissue Expression Project. We found that about half of the expressed genes displayed tissue-dependent transcript isoforms. Alternative transcription start and termination sites, rather than alternative splicing, accounted for the majority of tissue-dependent exon usage. We confirmed the widespread tissue-dependent use of alternative transcription start sites in a second, independent dataset, Cap Analysis of Gene Expression data from the FANTOM consortium. Moreover, our results indicate that most tissue-dependent splicing involves untranslated exons and therefore may not increase proteome complexity. Thus, alternative transcription start and termination sites are the principal drivers of transcript isoform diversity across tissues, and may underlie the majority of cell type-specific proteomes and functions.

Figures

Figure 1.
Quantification of exon usage. (A) Exemplary gene model in the reference genome (green) and alignments of RNA-seq reads (upper panel). Sequenced fragments whose alignments fall fully into an exonic region are shown by a gray box; alignments that map into two (or more) exonic regions are shown by shorter gray boxes connected by a horizontal line. For a particular exon (highlighted in orange), we consider two strategies to quantify its usage, as illustrated in Panels (B and C) (see ‘Materials and Methods’ section for the formal description). The first strategy is illustrated in Panel B, where sequenced fragments are counted into two groups: those that map fully or partially to the exon (λ) and those that map to the rest of the exons (ϵ). θREUC is defined as the ratio between λ and ϵ, and the relative exon usage coefficient (REUC) for the exon in sample j is computed as the ratio of θREUC in that sample to the mean θREUC across all samples. Panel C illustrates the second strategy, where sequenced fragments are also counted into two groups: those that map fully or partially to the exon (λ) and those that align to exons both downstream and upstream of the exon under consideration (ρ). The latter represent transcripts from which the exon was spliced out. θRSIC is then defined as the ratio between λ and ρ. The relative spliced-in coefficient (RSIC) for the exon in sample j is the ratio of θRSIC in this sample to the mean θRSIC across all samples. Note that while differences in exon usage due to alternative splicing are reflected in both REUCs and RSICs, differences due to alternative transcription start or termination sites are only reflected in REUCs. (D) Heatmap representations of the REUCs for three exonic regions (E004, E005 and E006) of the gene 5-Aminolevulinate Synthase 1, computed using subset A of the GTEx data. The rows of the heatmaps correspond to the eight tissues, and each column corresponds to one individual. The horizontal color patterns of exon E005 indicate elevated inclusion in cerebellum and cerebellar cortex as compared to the rest of the brain cell types. (E) RNA-seq samples from two cell types (cortex and cerebellum) from individual 12ZZX (also indicated by the arrows below each heatmap in Figure 1D) are displayed as sashimi plots. The three exonic regions presented in Panel D are shown. The middle exon, E005, is an untranslated cassette exon (ENSEMBL identifier ENSE00002267562) that is spliced out more frequently in cortex than in cerebellum.
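The two ratios described in Panels B and C can be illustrated with a minimal sketch. It follows only the caption-level definition (per-sample ratio divided by the mean ratio across samples); the formal estimators in the paper's 'Materials and Methods' section may differ. The function name and all counts below are hypothetical, and zero denominators are not handled.

```python
from statistics import mean

def relative_coefficients(numerators, denominators):
    """Divide each sample's ratio (theta) by the mean ratio across samples."""
    thetas = [n / d for n, d in zip(numerators, denominators)]
    mean_theta = mean(thetas)
    return [t / mean_theta for t in thetas]

# Hypothetical fragment counts for one exon in three samples.
lam = [30, 60, 90]     # fragments mapping fully or partially to the exon (lambda)
eps = [300, 300, 300]  # fragments mapping to the rest of the gene's exons (epsilon)
rho = [10, 10, 30]     # fragments joining upstream and downstream exons, skipping this one (rho)

reuc = relative_coefficients(lam, eps)  # theta = lambda / epsilon
rsic = relative_coefficients(lam, rho)  # theta = lambda / rho
print(reuc, rsic)
```

In this toy example the REUCs rise steadily across the three samples while the RSICs do not, which is the kind of disagreement the caption attributes to alternative transcription start or termination rather than splicing.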
Figure 2.
Tissue-dependent exon usage is widespread in the human genome. Panels (A–C) show data from subset A of the GTEx data. The same plots using data from subsets B and C can be found in Supplementary Figures S12 and S14. (A) Similar to a volcano plot, this figure shows statistical significance (P-value on −log10 scale) versus effect size (tissue score) of our tissue-dependence test for each exonic region of the human genome. The solid red lines show the thresholds used in this study to call an exonic region tissue-dependent. The P-value threshold 4.28 · 10⁻² corresponds to an adjusted P-value of 0.1 according to the Benjamini–Hochberg method to control FDR. (B) Histogram of the fraction of exonic regions within each gene that are subject to tissue-dependent usage (TDU) (X-axis). The Y-axis shows the number of genes. (C) Similar to Panel B, but expressed in terms of the fraction of base pairs within a gene affected by TDU. (D) Exemplary data from four out of nine tissues of individual 131XE from subset B. Shown are RNA-seq coverage plots (Y-axis) along genomic coordinates (X-axis) at the locus of the gene EPB41L4B on chromosome 9. The lower panel shows the transcript annotations for this gene. Skin and thyroid express short isoforms, while tibial nerve and skeletal muscle express longer isoforms.
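How a raw P-value threshold can correspond to an adjusted P-value cutoff under the Benjamini–Hochberg method is easiest to see on a small example. The sketch below is a generic textbook implementation, not the authors' code, and the four P-values are made up for illustration.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

pvals = [0.01, 0.04, 0.03, 0.20]  # hypothetical raw P-values
adj = benjamini_hochberg(pvals)

# Tests called at an adjusted cutoff of 0.1, and the largest raw P-value among them.
called = [p for p, a in zip(pvals, adj) if a <= 0.1]
raw_threshold = max(called)
print(adj, raw_threshold)
```

The largest raw P-value among the called tests plays the role of the raw threshold: it depends on the whole set of tests, which is why the caption reports it separately from the adjusted cutoff.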
Figure 3.
Alternative splicing underlies only a minor fraction of exons with TDU, while the rest are consistent with alternative transcription start or stop sites. The three panels show data from subset A of the GTEx data. Analogous plots for subsets B and C are shown in Supplementary Figure S17. (A) The heights of the bars show the number of exonic regions with TDU, grouped according to the number of reads that support their splicing out from transcripts. Most exonic regions with TDU have either no or weak evidence of being spliced out from transcripts (bar colored in salmon pink). The bar colors also serve as color legends for Figure 3B and C. (B) Each point represents one of the 47 659 exonic regions that were detected to be used in a tissue-dependent manner. The X-axis shows the fraction of REUC variance that is attributed to variance between tissues (R²). Analogously, the Y-axis shows the R² statistic for the RSICs. Exonic regions with strong evidence of being spliced out from transcripts (purple points) lie along the diagonal. (C) Cumulative distribution functions of the Pearson correlation coefficients between the REUCs and the RSICs are shown for exonic regions with TDU. The regions are stratified according to the number of sequenced fragments supporting their splicing out from transcripts. The REUCs and RSICs are highly correlated for the minor fraction of exons that have strong evidence of being spliced out from transcripts (purple line).
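The R² statistic in Panel B is described as the fraction of variance attributed to variance between tissues. One common way to compute such a fraction is a one-way decomposition, between-tissue sum of squares divided by total sum of squares; that choice is an assumption here, and the paper's exact procedure may differ. The function name and the coefficient values are hypothetical.

```python
from statistics import mean

def between_group_r2(values_by_tissue):
    """Fraction of total variance explained by tissue means (one-way decomposition)."""
    all_values = [v for vals in values_by_tissue.values() for v in vals]
    grand_mean = mean(all_values)
    ss_total = sum((v - grand_mean) ** 2 for v in all_values)
    if ss_total == 0:
        return 0.0  # no variance at all, so nothing to attribute to tissues
    ss_between = sum(
        len(vals) * (mean(vals) - grand_mean) ** 2
        for vals in values_by_tissue.values()
    )
    return ss_between / ss_total

# Hypothetical coefficients for one exonic region, two individuals per tissue.
coefficients = {"cortex": [1.0, 1.2], "cerebellum": [2.0, 2.2]}
r2 = between_group_r2(coefficients)
print(round(r2, 4))
```

Applying the same function to the REUCs and to the RSICs of a region would give the two coordinates of a point in a plot like Panel B.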
Figure 4.
Integration of RNA-seq and CAGE data. Each panel displays an example of a gene where the usage of alternative transcription start sites explains the patterns of TDU. (A) Coverage tracks (Y-axes) of RNA-seq and CAGE data for cerebral cortex and cerebellum are shown along the genomic coordinates (X-axis) of the locus of gene GAS7, located on chromosome 17. The upper two tracks show RNA-seq data from individual 12ZZX. The lower two tracks show mean CAGE counts (on log2 scale) for each annotated TSS. Cortex uses two transcription start site clusters (see red arrows) that are absent in cerebellum. The differential usage of these two TSSs explains the upstream RNA-seq coverage seen in cortex. (B) Analogous to Figure 4A, showing data of thyroid and subcutaneous adipose tissue along the genomic coordinates of the KRT8 locus on chromosome 12. The RNA-seq data are from individual 11EI6. The internal TSS cluster that is indicated by the red arrow is strongly used in thyroid tissue, resulting in the expression of short transcript isoforms. (C) Same as in Figure 4A, but showing data of heart and pancreas along the genomic coordinates of the NEBL locus on chromosome 10. The RNA-seq data are from individual ZF29. In heart, the usage of an internal TSS (indicated by the red arrow) results in the expression of transcript isoforms that exclude several 5′ exons of the gene.
Figure 5.
Alternative splicing is infrequent among coding exons. (A) The percentage of exonic regions (Y-axis) is shown for three subsets of exons: (i) exonic regions with TDU due to alternative splicing [DEU (AS)], (ii) exonic regions with TDU without evidence of alternative splicing [DEU (NAS)] and (iii) a background set of exons matched for expression and exon width. Each color represents a different category of exons according to transcript biotypes: exons coding for principal transcript isoforms [Coding (PI)], exons coding for non-principal transcript isoforms [Coding (non-PI)], 5′ UTRs, 3′ UTRs and exons from non-coding processed transcripts [Processed transcripts]. (B) Sashimi plot representation of the RNA-seq data from frontal cortex and cerebellum of individual WL46. The lower data track shows the transcript isoforms of the gene PKD1. The transcripts are colored according to their biotype (the color legend is the same as in Figure 5A). The highlighted exon (E051) belongs to a non-coding transcript and is differentially spliced across tissues. (C) Same as in Figure 5B, but showing data from tibial artery and whole blood of individual ZTPG. Transcripts from the gene MAN2B2 on chromosome 4 are shown. The highlighted exon (E018) belongs to a non-coding transcript and is differentially spliced across tissues. (D) Same as in Figure 5B, but showing data from esophagus tissue (muscularis) and heart tissue (left ventricle) of individual 111YS. The lower track shows the transcripts annotated for gene NISCH on chromosome 3. The highlighted exon (E009) belongs to a non-coding transcript and is differentially spliced across tissues.

