Cell Syst. 2019 Oct 23;9(4):393-400.e6.
doi: 10.1016/j.cels.2019.07.011. Epub 2019 Sep 18.

mountainClimber Identifies Alternative Transcription Start and Polyadenylation Sites in RNA-Seq

Ashley A Cass et al. Cell Syst. 2019.

Abstract

Alternative transcription start (ATS) and alternative polyadenylation (APA) create alternative RNA isoforms and modulate many aspects of RNA expression and protein production. However, ATS and APA remain difficult to detect in RNA sequencing (RNA-seq). Here, we developed mountainClimber, a de novo cumulative-sum-based approach to identify ATS and APA as change points. Unlike many existing methods, mountainClimber runs on a single sample and identifies multiple ATS or APA sites anywhere in the transcript. We analyzed 2,342 GTEx samples (36 tissues, 215 individuals) and found that tissue type is the predominant driver of transcript end variations. 75% and 65% of genes exhibited differential APA and ATS across tissues, respectively. In particular, testis displayed longer 5' untranslated regions (UTRs) and shorter 3' UTRs, often in genes related to testis-specific biology. Overall, we report the largest study of transcript ends across human tissues to our knowledge. mountainClimber is available at github.com/gxiaolab/mountainClimber.
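To make the change-point idea concrete, here is a minimal sketch of locating a single elbow in a cumulative read sum. It is not the mountainClimber implementation: the function name `crs_elbow`, the normalization of both axes to the interval (0, 1], and the use of vertical rather than perpendicular distance to the diagonal are all assumptions made for illustration.

```python
import numpy as np

def crs_elbow(coverage):
    """Return the 0-based position where the normalized cumulative read sum
    is farthest from the diagonal y = x (hypothetical helper, not the
    published tool's API)."""
    coverage = np.asarray(coverage, dtype=float)
    total = coverage.sum()
    if coverage.size < 2 or total == 0:
        return None  # nothing to detect in an empty or zero-coverage region

    # Cumulative read sum, scaled so the last position equals 1.
    crs = np.cumsum(coverage) / total
    # Position along the region, scaled the same way. Uniform coverage
    # would make crs equal to x, i.e. fall on the diagonal.
    x = np.arange(1, coverage.size + 1) / coverage.size

    # Vertical distance to y = x. Perpendicular distance differs only by a
    # constant factor, so the position of the maximum is the same.
    distance = np.abs(crs - x)
    return int(np.argmax(distance))

# Toy region: 50 bp at 10 reads/bp followed by 50 bp at 2 reads/bp.
toy_coverage = [10] * 50 + [2] * 50
elbow = crs_elbow(toy_coverage)
```

On this toy input the elbow falls at the last position of the high-coverage segment, which is where a simulated drop in coverage would be expected. A real transcript with several change points would need the search repeated within sub-regions and the candidates filtered afterwards.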

Keywords: GTEx; RNA-seq; alternative polyadenylation; alternative transcription start site; change point; human; tissues.

Conflict of interest statement

Declaration of Interests

Ashley Cass is recently employed at Ambry Genetics Corporation as a Bioinformatics Scientist.

Figures

Figure 1. mountainClimber pipeline schematic and performance evaluation.
(A-E) The mountainClimber approach for identifying change points in each transcription unit (TU) in each sample. Example simulated RNA-seq is shown for CDO1 (NM_033037), for which two change points are simulated. Simulated transcript isoform models are shown in yellow below each figure panel. (A) Identify de novo TUs. Poly(A)-selected RNA-seq is shown with the TU predicted shown in a blue line below. (B) Calculate cumulative read sum (CRS) as a function of position (black). The null distribution (diagonal line) is shown in grey. (C) Identify elbows in the CRS distribution by calculating the distance from the CRS (black) and weighted CRS (wCRS, green) to the diagonal line y = x. Note that wCRS is needed to observe elbows corresponding to both exon-intron junctions for each exon (black vs. green). (D) Filter putative change points by fold change and t-test. Dashed grey lines indicate zooming in to the last exon of the gene. Red lines indicate change points identified after filtering. (E) Calculate relative usage (RU) based on the average reads per bp in each segment at each end such that RUs sum to 1 at each transcript end. (F) The different types of ATS and APA identified by mountainClimber. ATS and APA cases are colored green and blue respectively. (G-H) Performance on simulated RNA-seq and 3’ ends. Fold change was calculated as the average reads/bp of proximal vs. distal segments. (G) Precision stratified by the fold change at predicted change points (CPs) (non-overlapping stratifications). (H) Recall stratified by fold change at true simulated CPs. (I-N) Performance on MAQC RNA-seq. (I) Precision for each window size w, where precision is calculated as the fraction of predicted change points that fell within w bp of any PolyA-seq site. (J) mountainClimber 3’ predictions relative to PolyA-seq sites. Predicted 3’ ends that are within 300bp of any PolyA-seq site (n = 30,134) were stratified by fold change into 5 bins. 
The x-axis indicates the position of the closest PolyA-seq site relative to each predicted poly(A) site, where positive (negative) values indicate the PolyA-seq site is downstream (upstream) of the prediction. The y-axis indicates the number of predictions at the corresponding position (x-axis) that have PolyA-seq support within +/−20bp. (K) Similar to (J), but for IsoSCM (n = 21,400). (L) Similar to (I), but for FANTOM CAT TSS. (M) Similar to (J), for the 5’ end, comparing mountainClimber and FANTOM CAT sites (n = 28,350). (N) Similar to (M), but for IsoSCM (n = 31,961). For more details, see STAR Methods and Figures S1-S4.
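The window-based precision described for panels (I) and (L) can be sketched as follows. This is an illustrative assumption-laden version, not the authors' evaluation code: the function name `precision_within_window` is hypothetical, positions are treated as plain integers on a single chromosome and strand, and "within w bp" is taken to be inclusive.

```python
from bisect import bisect_left

def precision_within_window(predicted, reference, w):
    """Fraction of predicted sites with a reference site at distance <= w.

    Hypothetical helper: ignores chromosome and strand, which a real
    evaluation would have to match first.
    """
    if not predicted:
        return None  # precision is undefined without predictions
    ref = sorted(reference)
    hits = 0
    for p in predicted:
        i = bisect_left(ref, p)
        # Only the neighbours on either side of the insertion point can be
        # the nearest reference site.
        candidates = [ref[j] for j in (i - 1, i) if 0 <= j < len(ref)]
        if any(abs(p - r) <= w for r in candidates):
            hits += 1
    return hits / len(predicted)

# Toy data: three predicted ends and three reference sites.
predicted_sites = [100, 500, 1000]
reference_sites = [110, 700, 1290]
narrow = precision_within_window(predicted_sites, reference_sites, 20)
wide = precision_within_window(predicted_sites, reference_sites, 300)
```

With the toy data, only the first prediction has support at w = 20, while all three are supported at w = 300, which mirrors how precision rises as the window widens.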
Figure 2. Landscape of alternative transcription start and polyadenylation sites in human tissues.
(A) Percentage of genes with APA (blue) or ATS (green) detected in x randomly chosen tissues (x = 1 to 36) across 500 iterations (STAR Methods). Mean and standard deviation are shown. (B) Variations in weighted mean extension length (WMEL) at the 5’ end attributed to individuals or tissues (STAR Methods). Numbers of genes below and above the dashed line y = x are shown. (C) Similar to (B), but for the 3’ end. (D) Number of significantly differential change points identified in each pairwise comparison in the 5’ end (BH-corrected p-value <= 0.05 and absolute RU difference >= 0.05). (E) Similar to (D), but for the 3’ end. (F-I) Examples of alternative 5’ and 3’ ends. The range shown contains the upstream and downstream segment of the differential change point(s) (i.e. the entire 5’ or 3’ end is not necessarily shown). Each line indicates the read counts for one individual at each nucleotide, and change points are indicated by black dashed lines. Genomic position is shown in grey (mb = megabase). Ensembl annotations are shown in yellow. (F) APA in APH1B in testis vs. small intestine (p = 2.86e-251, RU difference = −0.524). (G) ATS in CPNE5 in cortex vs. atrial appendage (p = 3.36e-256, RU difference = 0.60). (H) Intronic APA in ABCF2 in testis vs. uterus (p = 2.47e-54, RU difference = 0.23). (I) Three APA change points in UBE2J1 in breast vs. testis. See also Figures S5-S11.
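The significance filter stated for panels (D) and (E), a BH-corrected p-value <= 0.05 together with an absolute RU difference >= 0.05, can be sketched like this. The function names are hypothetical and the input p-values are toy numbers; the statistical test that produces the p-values is not specified here and is outside the sketch.

```python
import numpy as np

def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    p = np.asarray(pvalues, dtype=float)
    n = p.size
    order = np.argsort(p)
    # Scale each sorted p-value by n / rank.
    scaled = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity: each adjusted value is the minimum of itself
    # and all values at higher ranks.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.clip(scaled, 0.0, 1.0)
    return adjusted

def differential_change_points(pvalues, ru_diffs, alpha=0.05, min_ru_diff=0.05):
    """Boolean mask of change points passing both thresholds from the caption."""
    q = bh_adjust(pvalues)
    ru = np.abs(np.asarray(ru_diffs, dtype=float))
    return (q <= alpha) & (ru >= min_ru_diff)

# Toy values for four change points in one pairwise tissue comparison.
toy_p = [0.001, 0.02, 0.04, 0.5]
toy_ru = [0.3, 0.01, -0.2, 0.6]
mask = differential_change_points(toy_p, toy_ru)
```

In the toy example only the first change point passes: the second has too small an RU difference, and the third and fourth exceed the adjusted p-value threshold.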
