PLoS Comput Biol. 2015 Mar 13;11(3):e1004105.
doi: 10.1371/journal.pcbi.1004105. eCollection 2015 Mar.

Transcriptome sequencing reveals potential mechanism of cryptic 3' splice site selection in SF3B1-mutated cancers

Christopher DeBoever et al. PLoS Comput Biol.

Abstract

Mutations in the splicing factor SF3B1 are found in several cancer types and have been associated with various splicing defects. Using transcriptome sequencing data from chronic lymphocytic leukemia, breast cancer and uveal melanoma tumor samples, we show that hundreds of cryptic 3' splice sites (3'SSs) are used in cancers with SF3B1 mutations. We define the necessary sequence context for the observed cryptic 3'SSs and propose that cryptic 3'SS selection is a result of SF3B1 mutations causing a shift in the sterically protected region downstream of the branch point. While most cryptic 3'SSs are present at low frequency (<10%) relative to nearby canonical 3'SSs, we identified ten genes that preferred out-of-frame cryptic 3'SSs. We show that cancers with mutations in the SF3B1 HEAT 5-9 repeats use cryptic 3'SSs downstream of the branch point and provide both a mechanistic model consistent with published experimental data and affected targets that will guide further research into the oncogenic effects of SF3B1 mutation.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. Proximal cryptic 3’SSs used significantly more often in cancers with SF3B1 hotspot mutations.
log2 distance in base pairs from associated canonical 3’SSs to (A) 1,117 significantly differentially used novel 3’SSs, (B) 16,673 novel 3’SSs with canonical intron motifs (GT/AG) used more highly in the mutants but not significant, and (C) 18,660 novel 3’SSs with canonical intron motifs (GT/AG) used more highly in the wild-types but not significant. Zero represents the position of the canonical 3’SS. Negative and positive distances indicate that the cryptic 3’SS is respectively upstream or downstream from the canonical 3’SS. Inset in (A) shows base-by-base binning from zero to 50 base pairs upstream of canonical 3’SS. Red and blue histograms represent junctions with significantly higher usage in SF3B1 mutants or SF3B1 wild-type samples, respectively. (D) Upper red and blue heatmap shows for each sample the log2 library-normalized count z-score for 619 cryptic 3’SSs used significantly more often in the SF3B1 mutants and located 10–30 bp upstream of canonical 3’SSs (DEXSeq, BH-adjusted p < 0.1). Grey bars at left indicate frequency of SF3B1 mutant allele in RNA-seq data. Colorbars indicate SF3B1 mutation status, cancer type, and whether the SF3B1 mutation is located in the HEAT 5–9 repeats. Black and white colorbar indicates whether novel 3’SSs are out-of-frame (black) relative to canonical 3’SSs. Bottom green heatmap shows relative expression levels for the genes containing each cryptic 3’SS. We calculated the average expression of each gene in each cancer type and normalized by the maximum expression for each gene so that the maximum value in each column is one (see Methods). Cryptic 3’SSs not observed in all cancer types tend to have differing gene expression levels between cancers. (E) Locations and frequency of SF3B1 mutations in HEAT repeats 5–9. Mutations observed more than once in COSMIC (upper axis) cluster in ~10 amino acid hotspots in each HEAT repeat; most frequent mutation in each hotspot is labeled. Bottom axis shows locations and frequency of mutations in our study. BRCA samples with A663V and Y765C mutations do not show evidence for cryptic 3’SS selection.
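The per-gene normalization described for the bottom heatmap in Fig 1D can be illustrated with a minimal Python sketch. The function name, the gene label, and the expression values below are hypothetical and serve only to show the arithmetic; the authors' actual procedure is given in their Methods, which are not reproduced here.

```python
def normalize_by_max(mean_expr):
    """Scale each gene's per-cancer-type mean expression by that gene's maximum.

    mean_expr maps gene -> {cancer_type: mean expression} (hypothetical layout).
    After scaling, the largest value for every gene is 1.0.
    """
    normalized = {}
    for gene, by_cancer in mean_expr.items():
        peak = max(by_cancer.values())
        # Guard against genes with no expression in any cancer type.
        normalized[gene] = {
            cancer: (value / peak if peak > 0 else 0.0)
            for cancer, value in by_cancer.items()
        }
    return normalized


# Made-up mean expression values for one illustrative gene.
example = {"GENE_A": {"CLL": 20.0, "BRCA": 5.0, "UM": 10.0}}
result = normalize_by_max(example)
```

Under this scheme a gene expressed mainly in one cancer type shows a value near one there and values near zero elsewhere, which is consistent with the caption's remark that cryptic 3’SSs not observed in all cancer types tend to sit in genes with differing expression between cancers.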
Fig 2. 3’ intron nucleotide composition for control, associated canonical, and cryptic 3’SSs.
(A) We identified 23,066 control 3’SSs whose junctions had a mean coverage greater than 100 reads over all CLL, BRCA, and UM samples to compare to the cryptic and associated canonical 3’SSs. Nucleotide frequency for the last 50 bp of the intron for (B) 23,066 control 3’SSs; (C) 613 associated canonical 3’SSs; (D) 619 proximal cryptic 3’SSs; and (E) 417 distal cryptic 3’SSs. Bar plots above each nucleotide composition plot are -log10 p-values from Fisher exact tests for enrichment of adenines at each position relative to control 3’SSs. Horizontal line marks significance level of p = 0.05 (-log10 0.05 ≈ 1.3). The p-value bar plots have different scales in (C), (D), and (E); the smallest p-values for each panel are labeled.
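As a rough illustration of the per-position test named in the Fig 2 caption, the sketch below computes a one-sided Fisher exact p-value for adenine enrichment from a 2x2 table and converts it to the -log10 scale used in the bar plots. The function name and the counts are hypothetical, and the choice of a one-sided test is an assumption; the caption does not state which alternative the authors used.

```python
import math


def fisher_enrichment_p(a, b, c, d):
    """One-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    a, b: test-set sites with / without adenine at a given intron position
    c, d: control sites with / without adenine at the same position
    Returns P(X >= a) under the hypergeometric null (enrichment in the test set).
    """
    row1 = a + b          # number of test-set sites
    col1 = a + c          # number of sites with adenine overall
    total = a + b + c + d
    denom = math.comb(total, row1)
    numer = 0
    for k in range(a, min(row1, col1) + 1):
        numer += math.comb(col1, k) * math.comb(total - col1, row1 - k)
    return numer / denom


# Tiny made-up table: all 3 test sites have adenine, none of the 3 controls do.
p_value = fisher_enrichment_p(3, 0, 0, 3)
neg_log10_p = -math.log10(p_value)
```

With these made-up counts the p-value is exactly 1/20 = 0.05, so the bar would sit at the significance line of about 1.3 mentioned in the caption.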
Fig 3. Location of predicted branch point relative to cryptic and canonical 3’SSs and model of cryptic 3’SS selection.
(A) Distance from highest scoring BP predicted for associated canonical 3’SSs to the corresponding proximal cryptic 3’SSs. A negative distance indicates that the cryptic 3’SS is upstream of the BP predicted for the canonical 3’SS. The small spike at 2 bp indicates that in a few cases the adenine in the cryptic 3’SS is predicted to be the BP adenine for the canonical 3’SS. (B) Distance from highest scoring BP predicted for control 3’SSs to downstream intronic AG dinucleotides that are not annotated as 3’SSs. (C) Distance from either highest or second highest scoring BP predicted for canonical 3’SSs to their associated cryptic 3’SSs (see Methods). (D) Model for proximal cryptic 3’SS selection in SF3B1 mutants. yTnAy is the human BP motif. AG dinucleotides located at the edge of the sterically protected region can be used as 3’SSs in SF3B1 mutants (star). AG dinucleotides located in the protected or competitive regions (X’s) are respectively sterically hindered from being selected as 3’SSs or out-competed by the canonical 3’SS. Distance from predicted BP to 3’SS for (E) associated canonical 3’SSs and (F) control 3’SSs (see Methods) is significantly different (p < 10^-23, Mann Whitney U).
Fig 4. Percent spliced in for cryptic 3’ splice sites in CLL analysis.
(A) Heatmap shows the percent spliced in (PSI) values for cryptic 3’SS relative to the canonical 3’SS in CLL SF3B1 mutated or wild-type samples for 325 proximal cryptic 3’SSs used significantly more often in the CLL mutants (DEXSeq, BH-adjusted p < 0.1). SF3B1 mutation presence and the status of prognostic factors IGHV and ZAP70 are shown in left colorbars. Black and white colorbar indicates whether novel 3’SSs are out-of-frame (black) relative to canonical 3’SSs. In-frame and out-of-frame cryptic 3’SSs are used at similar rates relative to their associated canonical 3’SSs. (B) Beeswarm plots indicating the PSI values for the cryptic 3’SS relative to the associated canonical 3’SS in ten genes with high levels of cryptic 3’SS inclusion in CLL SF3B1 mutants (M) compared to wild-type (W) samples. No reads were observed spanning the cryptic YIF1A junction in any wild-type CLL samples. The number in the upper corner of each plot is the distance in base pairs from the highest or second-highest scoring BP predicted for the associated canonical 3’SS to the cryptic 3’SS.
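The percent spliced in (PSI) values in Fig 4 compare usage of a cryptic 3’SS with usage of its associated canonical 3’SS. A minimal Python sketch of that ratio follows, assuming the common definition of cryptic junction reads divided by cryptic plus canonical junction reads. That definition, the function name, and the read counts are assumptions for illustration; the authors' exact calculation is in their Methods, which are not part of this record.

```python
def percent_spliced_in(cryptic_reads, canonical_reads):
    """PSI of a cryptic 3'SS relative to its canonical 3'SS, in percent.

    Assumed definition: 100 * cryptic / (cryptic + canonical) junction reads.
    Returns None when neither junction has any reads, since PSI is undefined.
    """
    total = cryptic_reads + canonical_reads
    if total == 0:
        return None
    return 100.0 * cryptic_reads / total


# Made-up junction read counts for one sample.
low_usage = percent_spliced_in(5, 95)     # cryptic site rarely used
high_usage = percent_spliced_in(60, 40)   # cryptic site preferred
no_coverage = percent_spliced_in(0, 0)    # undefined, no spanning reads
```

Under this definition a cryptic site with a PSI below 10 matches the low-frequency usage described in the abstract, while a PSI above 50 would indicate that the cryptic site is preferred over the canonical one.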

