BMC Bioinformatics. 2010 Feb 25;11:108.
doi: 10.1186/1471-2105-11-108.

SplicerAV: a tool for mining microarray expression data for changes in RNA processing

Timothy J Robinson et al. BMC Bioinformatics. 2010.

Abstract

Background: Over the past two decades, more than fifty thousand unique clinical and biological samples have been assayed using the Affymetrix HG-U133 and HG-U95 GeneChip microarray platforms. This substantial repository has been used extensively to characterize changes in gene expression between biological samples but has not previously been mined en masse for changes in mRNA processing. We explored the possibility of using HG-U133 microarray data to identify changes in alternative mRNA processing in several available archival datasets.

Results: Data from these and other gene expression microarrays can now be mined for changes in transcript isoform abundance using SplicerAV, the program described here. Using in vivo and in vitro breast cancer microarray datasets, SplicerAV performed both gene-specific and isoform-specific expression profiling within the same microarray dataset. Our reanalysis of Affymetrix U133 Plus 2.0 data generated by in vitro over-expression of HRAS, E2F3, beta-catenin (CTNNB1), SRC, and MYC identified several hundred oncogene-induced mRNA isoform changes, one of which revealed a previously unknown mechanism of EGFR family activation. Using clinical data, SplicerAV predicted 241 isoform changes between low- and high-grade breast tumors, with changes enriched among genes coding for guanyl-nucleotide exchange factors, metalloprotease inhibitors, and mRNA processing factors. Isoform changes in 15 genes were associated with aggressive cancer across the three breast cancer datasets.

Conclusions: Using SplicerAV, we identified several hundred previously uncharacterized isoform changes induced by in vitro oncogene over-expression and revealed a previously unknown mechanism of EGFR activation in human mammary epithelial cells. We analyzed Affymetrix GeneChip data from over 400 human breast tumors in three independent studies, making this the largest clinical dataset analyzed for en masse changes in alternative mRNA processing. The capacity to detect RNA isoform changes in archival microarray data using SplicerAV allowed us to carry out the first analysis of isoform-specific mRNA changes directly associated with cancer survival.


Figures

Figure 1
Gaussian mixture model of changes in alternative processing. Absolute expression of a hypothetical gene is reported by four independent probesets targeting different regions of the gene (I, II, III, IV; left panels) under control and treatment conditions (open and closed bars, respectively). The idealized Gaussian mixture models representing changes in probeset behavior are illustrated in the right panels. Panels A, B, and C represent concordant probeset behavior corresponding to no change, an increase, and a decrease, respectively. Panel D represents discordant behavior: two probesets (I, II) report an increase, while the remaining probesets (III, IV) report a decrease in expression between conditions (control and treatment). Probesets may therefore report discrepant changes in gene expression depending on which region of the mRNA transcript they interrogate.
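The concordant versus discordant distinction in Figure 1 can be illustrated with a small Python sketch. It assumes, purely for illustration, that a gene's probeset log2 fold changes are modeled either by a single Gaussian (concordant, panels A to C) or by a two-component Gaussian mixture with a shared standard deviation (discordant, panel D), fitted by expectation-maximization and compared by BIC. The function names (`fit_single`, `fit_two`, `classify`) and the BIC criterion are hypothetical choices, not SplicerAV's documented model.

```python
import math


def log_normal(x, mu, sd):
    """Log density of a normal distribution N(mu, sd^2) evaluated at x."""
    return -0.5 * ((x - mu) / sd) ** 2 - math.log(sd * math.sqrt(2 * math.pi))


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def fit_single(xs, min_sd=1e-3):
    """Maximum-likelihood fit of one Gaussian; returns (mean, sd, log-likelihood)."""
    n = len(xs)
    mu = sum(xs) / n
    sd = max(math.sqrt(sum((x - mu) ** 2 for x in xs) / n), min_sd)
    return mu, sd, sum(log_normal(x, mu, sd) for x in xs)


def fit_two(xs, iters=200, min_sd=1e-3):
    """EM fit of a two-component mixture with a shared sd.

    Returns (means, sd, weights, log-likelihood).
    """
    n = len(xs)
    mus = [min(xs), max(xs)]  # start the two components at the extremes
    sd = fit_single(xs, min_sd)[1]
    w = [0.5, 0.5]
    for _ in range(iters):
        # E-step: responsibilities, computed in log space for stability.
        resp = []
        for x in xs:
            logs = [math.log(w[k]) + log_normal(x, mus[k], sd) for k in range(2)]
            norm = logsumexp(logs)
            resp.append([math.exp(v - norm) for v in logs])
        # M-step: update mixing weights, means, and the shared sd.
        nk = [sum(r[k] for r in resp) for k in range(2)]
        w = [max(nk[k] / n, 1e-6) for k in range(2)]
        mus = [sum(r[k] * x for r, x in zip(resp, xs)) / nk[k] if nk[k] > 0 else mus[k]
               for k in range(2)]
        var = sum(r[k] * (x - mus[k]) ** 2 for r, x in zip(resp, xs) for k in range(2)) / n
        sd = max(math.sqrt(var), min_sd)  # floor keeps the sd from collapsing to zero
    loglik = sum(logsumexp([math.log(w[k]) + log_normal(x, mus[k], sd) for k in range(2)])
                 for x in xs)
    return mus, sd, w, loglik


def classify(xs):
    """Label fold changes 'concordant' or 'discordant' by comparing model BIC."""
    n = len(xs)
    _, _, ll1 = fit_single(xs)
    mus, _, _, ll2 = fit_two(xs)
    bic1 = -2 * ll1 + 2 * math.log(n)  # parameters: mean, sd
    bic2 = -2 * ll2 + 4 * math.log(n)  # parameters: two means, shared sd, one weight
    return ("discordant", sorted(mus)) if bic2 < bic1 else ("concordant", [sum(xs) / n])
```

With only four probesets per gene a mixture fit is poorly constrained, so a real implementation would likely pool replicate-level values or use a different test; the sketch only shows the shape of the one-group versus two-group comparison.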
Figure 2
HRAS over-expression results in substantial relative isoform changes. (A) Example SplicerAV output comparing HRAS to GFP over-expression. Genes are ranked in order of descending Splice Score (top three genes shown), with EGFR receiving the top score under HRAS over-expression. Log2 fold changes in expression and the corresponding p-values from a two-tailed homoskedastic t-test of differential expression are shown for the individual probesets targeting each gene. Probesets are placed into A and B groupings by SplicerAV (see text). The Splice Score, SplicerAV p-value, and two-way ANOVA p-value are shown for each gene. (B) Distribution of the 645 isoform changes (AS Events) predicted by SplicerAV (p < 0.01) upon HRAS over-expression in human primary mammary epithelial cells. For each gene, SplicerAV separates probesets into two groups of similarly behaving probesets based on their fold changes in expression. The average change in expression between the probesets in these two groups (AvgChange; see Equation 8 in Methods) reflects the relative fold change in isoform abundance predicted by SplicerAV. Absolute relative fold changes in isoform abundance are shown on a log2 scale.
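As a rough illustration of the A/B grouping described above, the sketch below splits a gene's probeset log2 fold changes into two groups by choosing the cut point that minimizes the within-group sum of squares, then reports the difference of the group means. Both the splitting rule and the use of a plain difference of means for `AvgChange` are assumptions: Equation 8 in the Methods is not reproduced here, and `split_probesets` is a hypothetical name.

```python
from statistics import mean


def sse(values):
    """Within-group sum of squared deviations from the group mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def split_probesets(fold_changes):
    """Split probesets into group A (higher log2 FC) and group B (lower log2 FC).

    fold_changes maps probeset ID -> log2 fold change (treatment vs control).
    Returns (group_a, group_b, avg_change). Requires at least two probesets.
    """
    names = sorted(fold_changes, key=fold_changes.get)
    best_i, best_cost = None, float("inf")
    # An optimal 1-D two-group split is contiguous in sorted order,
    # so scanning every cut point between neighbours is sufficient.
    for i in range(1, len(names)):
        cost = (sse([fold_changes[p] for p in names[:i]])
                + sse([fold_changes[p] for p in names[i:]]))
        if cost < best_cost:
            best_i, best_cost = i, cost
    group_b, group_a = names[:best_i], names[best_i:]
    # Hypothetical stand-in for AvgChange: difference of the two group means.
    avg_change = mean(fold_changes[p] for p in group_a) - mean(fold_changes[p] for p in group_b)
    return group_a, group_b, avg_change
```

For a gene whose probesets report +1.2, +1.0, -0.8 and -1.0, this places the two positive probesets in group A, the two negative ones in group B, and reports an AvgChange of about 2.0 on the log2 scale.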
Figure 3
HRAS over-expression causes isoform-specific regulation of Epidermal Growth Factor Receptor (EGFR) in human mammary epithelial cells. (A) Probesets on the Affymetrix U133 Plus 2.0 array interrogate EGFR expression at seven different genomic locations. Up and down arrows indicate each probeset's expression change under HRAS over-expression compared with GFP controls. Probeset 5 showed a significant decrease in expression with HRAS over-expression but was not expressed above background. (B) UCSC genomic alignment of probesets and EGFR isoforms. Four previously observed EGFR isoforms (A, B, C, and D) are shown, with exons represented as black boxes and introns as hashed lines. Extracellular, transmembrane, and intracellular domain regions are shown below the alignment. (C-F) Scatter plots of logged expression levels across all 55 samples (GFP, MYC, SRC, CTNNB1, E2F3, and HRAS) for selected pairs of probesets. (C) Probesets 1 and 2 target a transcript region common to all major isoforms and exhibit highly correlated expression (R² = 0.95). (D) Probesets 1 and 3 target the common region and the isoform B-specific region, respectively, and show a weak inverse relationship (R² = 0.36). (E) Probesets 1 and 6 interrogate the common region and the AShort isoform region, showing a high degree of correlation across all samples (R² = 0.87). (F) In contrast, probesets 1 and 7 interrogate the common region and the ALong isoform region and are not correlated (R² = 0.01), owing to the HRAS-induced 3' UTR shortening of EGFR A transcripts.
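The R² values in panels C to F summarize how well one probeset's log expression tracks another's across the same samples. Below is a minimal sketch of that calculation, assuming R² is the square of the Pearson correlation from a simple linear fit (the caption does not state which fit was used). Because squaring discards the sign, calling panel D a weak inverse relationship also requires the sign of r, so the sketch exposes both; the function names are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two probesets' log expression across the same samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def r_squared(x, y):
    """Coefficient of determination of a simple linear fit, equal to r squared."""
    return pearson_r(x, y) ** 2
```

A perfectly inverse pair of probesets gives r = -1 but R² = 1, which is why the sign has to be reported separately when describing direction.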
Figure 4
SplicerAV index of ARHGEF7 is associated with breast cancer survival. Panel A. Schematic representation of ARHGEF7 isoforms A, B, and C, with the regions interrogated by probesets that increase shown as Probesets Up 1 and 2 (red arrows) and the region that decreases denoted as Probeset Down (blue arrow). Panel B. The fraction of patients surviving in each cohort (vertical axis) is shown over time in years (horizontal axis) as a function of individual probeset expression or SplicerAV index. Survival of patients above (red line) and below (blue line) the 50th percentile is plotted by individual probeset expression (Down, Up 1, and Up 2) and by the SplicerAV index within the Miller (left) and Pawitan (right) cohorts. Results of two-tailed log-rank tests of survival are shown, with asterisks indicating significance at the 0.05 (large asterisk) and 0.10 (small asterisk) levels.
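The survival comparisons in Figures 4 and 5 split patients at the median of a score and compare the two groups with a log-rank test. A minimal, self-contained sketch of both steps follows. `median_split` and `logrank` are hypothetical helper names, ties at the median are assigned to the bottom group by assumption, and the p-value uses the chi-square distribution with one degree of freedom. A production analysis would more likely rely on an established survival library.

```python
import math
from statistics import median


def median_split(scores):
    """Return (top, bottom) patient index lists split at the median; ties go to the bottom."""
    cut = median(scores)
    top = [i for i, s in enumerate(scores) if s > cut]
    bottom = [i for i, s in enumerate(scores) if s <= cut]
    return top, bottom


def logrank(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; events are 1 (death) or 0 (censored).

    Returns (chi-square statistic, two-tailed p-value).
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    o_minus_e, var = 0.0, 0.0
    for t in sorted({t for t, e, _ in data if e}):
        n = sum(1 for ti, _, _ in data if ti >= t)              # at risk, both groups
        n_a = sum(1 for ti, _, g in data if ti >= t and g == 0)  # at risk, group A
        d = sum(1 for ti, e, _ in data if ti == t and e)        # deaths at time t
        d_a = sum(1 for ti, e, g in data if ti == t and e and g == 0)
        o_minus_e += d_a - d * n_a / n                           # observed minus expected, group A
        if n > 1:
            # Hypergeometric variance of the group A death count at time t.
            var += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if var == 0:
        return 0.0, 1.0
    chi2 = o_minus_e ** 2 / var
    # For 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))
```

In use, `median_split` would be applied to one probeset's expression or to a SplicerAV index, and the survival times and event flags of the two resulting index lists passed to `logrank`.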
Figure 5
EIF4E2 probesets are associated with breast cancer survival. Panel A. Schematic representation of EIF4E2 isoforms A and B, with the regions interrogated by probesets shown as Up (red arrow) and Down (blue arrow). For panels B, C, and D, the fraction of patients surviving in each cohort (vertical axis) is shown over time in years (horizontal axis) as a function of individual probeset expression or SplicerAV index. Survival of patients above (red line) and below (blue line) the 50th percentile is plotted by individual probeset expression (B, C) and by the SplicerAV index (D) within the Miller (left) and Pawitan (right) cohorts. Results of two-tailed log-rank tests of survival are shown, with asterisks indicating significance at the 0.05 level.
Figure 6
A six-isoform signature provides improved prediction of breast cancer survival compared with individual isoforms. The fraction of patients surviving in each cohort (vertical axis) is shown over time in years (horizontal axis) as a function of individual probeset expression or SplicerAV index. Survival of patients above (red line) and below (blue line) the 50th percentile is plotted by the SplicerAV index for six genes: EIF4E2 (A), ARHGEF7 (B), SLC28A10 (C), PDXK (D), TncRNA (E), and MAPKAP1 (F), for the Miller (left) and Pawitan (right) cohorts. Patient survival stratified by a low (0-1), intermediate (2-4), or high (5-6) number of poor prognostic events is shown in panel G.
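Panel G groups patients by how many of the six isoform indices fall on the poor-prognosis side. A sketch of that tally follows, under assumptions the caption does not spell out: each gene's index is dichotomized at its median (as in panels A to F), the direction that counts as poor is supplied per gene, and ties at the median fall in the bottom half. `poor_event_counts` and `risk_group` are hypothetical names; only the 0-1, 2-4 and 5-6 bins come from the caption.

```python
from statistics import median


def poor_event_counts(indices, poor_if_high):
    """Count poor prognostic events per patient.

    indices: gene -> list of SplicerAV index values, one per patient,
             in the same patient order for every gene.
    poor_if_high: gene -> True if a high index is assumed to mark poor prognosis.
    A patient scores one event per gene whose index falls in the poor-prognosis half.
    """
    n_patients = len(next(iter(indices.values())))
    counts = [0] * n_patients
    for gene, values in indices.items():
        cut = median(values)
        for i, value in enumerate(values):
            in_top_half = value > cut  # ties at the median count as bottom half
            if in_top_half == poor_if_high[gene]:
                counts[i] += 1
    return counts


def risk_group(count):
    """Bin an event count (0-6) into the low / intermediate / high groups of panel G."""
    if count <= 1:
        return "low"
    if count <= 4:
        return "intermediate"
    return "high"
```

The resulting groups could then be compared with a multi-group survival test; the caption does not say which test was used for panel G.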
