Nucleic Acids Res. 2009 Sep;37(16):e107.
doi: 10.1093/nar/gkp508. Epub 2009 Jun 15.

Overestimation of alternative splicing caused by variable probe characteristics in exon arrays

Dimos Gaidatzis et al. Nucleic Acids Res. 2009 Sep.

Abstract

In higher eukaryotes, alternative splicing is a common mechanism for increasing transcriptome diversity. Affymetrix exon arrays were designed as a tool for monitoring the relative expression levels of hundreds of thousands of known and predicted exons, with a view to detecting alternative splicing events. In this article, we have analyzed exon array data from many different human and mouse tissues and have uncovered a systematic relationship between transcript-fold change and alternative splicing as reported by the splicing index. Evidence from dilution experiments and deep sequencing suggests that this effect is of technical rather than biological origin and that it is driven by sequence features of the probes. The effect is substantial and results in a 12-fold overestimation of alternative splicing events in genes that are differentially expressed. By cross-species exon array comparison, we further show that the systematic bias persists even across species boundaries. Failure to account for this effect in data analysis would result in the reproducible false detection of apparently conserved alternative splicing events. Finally, we have developed an R software package called COSIE (Corrected Splicing Indices for Exon arrays) that corrects for the observed bias in any given set of new exon array experiments and improves the detection of alternative splicing (available at www.fmi.ch/groups/gbioinfo).
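The splicing index itself is not defined on this page. A definition commonly used with exon arrays normalizes each probeset's signal to the signal of its gene in the same sample and then compares these ratios between two samples on the log2 scale. The minimal Python sketch below assumes that definition and linear-scale (not log-transformed) inputs; the function name `splicing_index` is hypothetical and is not taken from COSIE.

```python
import math

def splicing_index(probeset_a: float, gene_a: float,
                   probeset_b: float, gene_b: float) -> float:
    """Log2 splicing index of one probeset between samples A and B.

    Assumed definition: inputs are linear-scale signal estimates; each probeset
    signal is divided by its gene-level signal in the same sample, and the two
    ratios are then compared on the log2 scale.
    """
    ratio_a = probeset_a / gene_a  # probeset level relative to its gene, sample A
    ratio_b = probeset_b / gene_b  # the same ratio in sample B
    return math.log2(ratio_a / ratio_b)

print(splicing_index(200.0, 100.0, 800.0, 400.0))  # 0.0: probeset follows its gene
print(splicing_index(50.0, 100.0, 400.0, 400.0))   # -1.0: relative level halves in A
```

A value of 0 means the probeset scales exactly with its gene; the bias described above appears as nonzero indices for probesets that do not.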

Figures

Figure 1.
Relationship between transcript-fold change and the splicing index in pairwise sample comparisons. (A) MA plots for eight genes when comparing mouse embryo with mouse brain RNA. The average log2 expression is shown on the x-axis and the log2 expression change on the y-axis. Every dot represents one probeset. The error bars are computed from four technical replicates. (B) A scatter plot comparing the log2 gene expression change (x-axis) with the standard deviation of the splicing indices of that gene's probesets (y-axis). To reduce array boundary effects, all nonexpressed probesets, as well as all genes with very low or very high expression levels, were removed. The same quantity is shown for mouse (C) and human (D) for all 55 pairwise comparisons between any two of the 11 array samples from the tissue panel data. Each comparison is represented as a smoothed interpolation curve with asterisks at its ends. In all figure parts, only the core annotation sets of the human and mouse exon arrays were used.
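The coordinates plotted in (A) and (B) can be sketched as follows, assuming the inputs are already log2-summarized and filtered as the caption describes. On the log2 scale each probeset's splicing index is a difference of differences, and the y-axis of (B) is the spread of these indices within one gene. Using the sample standard deviation (`ddof=1`) is our choice, and the function names `gene_si_spread` and `ma_values` are hypothetical.

```python
import numpy as np

def gene_si_spread(probeset_log2_a, probeset_log2_b, gene_log2_a, gene_log2_b):
    """Return (gene log2 fold change, SD of the gene's probeset splicing indices).

    probeset_log2_a/b: log2 signals of the gene's probesets in samples A and B.
    gene_log2_a/b: gene-level log2 expression in samples A and B (scalars).
    """
    pa = np.asarray(probeset_log2_a, dtype=float)
    pb = np.asarray(probeset_log2_b, dtype=float)
    # On the log2 scale, the splicing index is a difference of differences.
    si = (pa - gene_log2_a) - (pb - gene_log2_b)
    fold_change = float(gene_log2_a - gene_log2_b)
    # Sample standard deviation (ddof=1) across the gene's probesets.
    return fold_change, float(np.std(si, ddof=1))

def ma_values(probeset_log2_a, probeset_log2_b):
    """MA-plot coordinates per probeset: A = mean log2 signal, M = log2 change."""
    pa = np.asarray(probeset_log2_a, dtype=float)
    pb = np.asarray(probeset_log2_b, dtype=float)
    return (pa + pb) / 2.0, pa - pb

print(gene_si_spread([8.5, 9.0, 6.5], [6.0, 7.0, 5.0], 8.0, 6.0))  # (2.0, 0.5)
```

In this toy gene, two probesets deviate from proportional behavior when the gene rises by 2 log2 units, which alone produces a nonzero splicing-index spread.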
Figure 2.
Probeset response characteristics. (A) The behavior of all probesets from one example gene (matrilin 4) across 11 mouse tissues (Affymetrix tissue panel data) in log–log space. The x-axis represents the expression of the gene, whereas the y-axis represents the splicing index of the probeset (determined by comparing each tissue with the average of all tissues). Individual probesets were sorted by their slope and shifted along the y-axis to avoid overlaps. The black line at the top illustrates the scale of the y-axis and represents four units in splicing index space. (B and C) Condensed representations showing the behavior of many more probesets for mouse (B) and human (C). Depicted is a hierarchical clustering of probesets that have high splicing index variability and reside in genes that change their expression across the 11 tissues (expression_max − expression_min > 3). Each row represents one probeset, the x-axis denotes gene expression rank among the 11 tissues (sorted independently for each exon), and the splicing index is color-coded (red: ≥2.67, blue: ≤−2.67). (D) A more refined picture of individual probeset behavior based on another dataset comprising 57 lymphoblastoid cell line samples in triplicate (14). Eighteen genes were randomly selected from the differentially expressed genes (expression_max − expression_min > 4). For each selected gene, the behavior of the highest-responding (black) and the lowest-responding (red) probeset (based on the slope of a linear regression) is depicted.
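One way to quantify the behavior in (A) and (D) is an ordinary least-squares slope of a probeset's log2 signal against its gene's log2 expression across samples. The caption does not spell out the exact regression, so this parameterization is an assumption. The sketch also computes splicing indices against the mean over all tissues, as in (A), taking that mean on the log2 scale (another assumption). Under these assumptions, the slope of the splicing index against gene expression equals the response slope minus 1. The function names are hypothetical.

```python
import numpy as np

def response_slope(gene_log2, probeset_log2):
    """OLS slope of probeset log2 signal vs. gene log2 expression (one value per sample).

    Slope 1 = tracks the gene proportionally; > 1 over-responds; < 1 under-responds.
    """
    slope, _intercept = np.polyfit(np.asarray(gene_log2, dtype=float),
                                   np.asarray(probeset_log2, dtype=float), 1)
    return float(slope)

def splicing_index_vs_mean(gene_log2, probeset_log2):
    """Splicing index of each sample relative to the mean over all samples (log2 scale)."""
    ratio = np.asarray(probeset_log2, dtype=float) - np.asarray(gene_log2, dtype=float)
    return ratio - ratio.mean()

# Hypothetical under-responding probeset: log2 signal = 3 + 0.5 * log2 gene expression.
gene = np.array([4.0, 5.0, 6.0, 7.0, 8.0])
probeset = 3.0 + 0.5 * gene
b = response_slope(gene, probeset)
si = splicing_index_vs_mean(gene, probeset)
print(round(b, 6), round(response_slope(gene, si), 6))  # 0.5 -0.5
```

The under-responding probeset therefore shows a splicing index that falls steadily as its gene rises, even though no splicing change was simulated.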
Figure 3.
Probeset slope predictor based on a positional dinucleotide model. (A) The distributions of log2 probeset-fold changes for the mouse dilution experiment at three different dilutions. Each dilution condition (50%, 25% and 10%) was compared with the 100% mouse RNA array. To eliminate array boundary effects (low and high expression), we considered only a very narrow band of probe expression (between 6 and 8.5 in the 100% mouse RNA experiment). The triangles represent the expression change expected from the experiment; in log2 space this is −1 for 50%, −2 for 25% and −3.3 for 10% dilution. (B and C) The positional contributions of individual dinucleotides for mouse (B) and human (C) after training a linear model with experimentally measured probe slopes from the dilution experiments. Position 24 represents the end of the probe that is attached to the surface of the microarray. Red denotes a positive contribution to the slope (overresponding), whereas blue denotes a negative contribution (underresponding). The scatter plots in (D and E) compare the probeset response slopes either predicted by the dinucleotide model (x-axis) or determined by linear regression from the Affymetrix tissue panel data (y-axis) for mouse (D) and human (E). Probe response slopes (predicted from the dinucleotide model) were converted to probeset response slopes by averaging the slopes of all the probes that belong to the probeset in question.
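The positional dinucleotide model in (B) and (C) can be sketched as a linear regression on one-hot features. A 25-nt probe (the standard Affymetrix probe length) contains 24 overlapping dinucleotides; each of the 16 dinucleotides at each position gets its own weight, and a probe's predicted slope is an intercept plus the weights of the dinucleotides it contains. Probeset slopes are the mean of the predicted probe slopes, as the caption states. The sketch assumes plain least squares (the original fitting procedure and any regularization are not given here) and treats the last character pair of the sequence string as position 24; both are assumptions, and all names are hypothetical.

```python
import itertools
import numpy as np

BASES = "ACGT"
DINUCS = ["".join(p) for p in itertools.product(BASES, repeat=2)]  # 16 dinucleotides
DINUC_INDEX = {d: i for i, d in enumerate(DINUCS)}
PROBE_LEN = 25            # assumed probe length (standard Affymetrix 25-mers)
N_POS = PROBE_LEN - 1     # 24 overlapping dinucleotide positions

def encode_probe(seq):
    """One-hot encode each of the 24 positional dinucleotides of a 25-nt probe.

    Assumption: the last pair in the string (position 24) is the surface-attached end.
    """
    if len(seq) != PROBE_LEN:
        raise ValueError(f"expected a {PROBE_LEN}-nt probe, got {len(seq)}")
    x = np.zeros(N_POS * len(DINUCS))
    for pos in range(N_POS):
        x[pos * len(DINUCS) + DINUC_INDEX[seq[pos:pos + 2]]] = 1.0
    return x

def fit_dinucleotide_model(seqs, slopes):
    """Least-squares weights (intercept first) mapping encoded probes to measured slopes."""
    X = np.array([encode_probe(s) for s in seqs])
    X = np.hstack([np.ones((X.shape[0], 1)), X])
    # Each position's 16 columns sum to the intercept column, so X is rank-deficient;
    # lstsq returns the minimum-norm solution. With diverse training probes,
    # predictions for valid probes do not depend on which solution is chosen.
    w, *_ = np.linalg.lstsq(X, np.asarray(slopes, dtype=float), rcond=None)
    return w

def predict_slope(w, seq):
    """Predicted response slope of a single probe."""
    return float(w[0] + encode_probe(seq) @ w[1:])

def probeset_slope(w, probe_seqs):
    """Probeset slope as the mean of its probes' predicted slopes."""
    return float(np.mean([predict_slope(w, s) for s in probe_seqs]))

# Synthetic check: slopes generated from known weights are reproduced by the fit.
rng = np.random.default_rng(0)
probes = ["".join(rng.choice(list(BASES), PROBE_LEN)) for _ in range(2000)]
true_w = rng.normal(0.0, 0.01, N_POS * len(DINUCS))
slopes = [1.0 + encode_probe(p) @ true_w for p in probes]
w = fit_dinucleotide_model(probes, slopes)
print(probeset_slope(w, probes[:4]), float(np.mean(slopes[:4])))  # should agree closely
```

The synthetic data are purely illustrative; with real dilution-series slopes the fit would include measurement noise and would not reproduce the targets exactly.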
Figure 4.
Relationship between transcript-fold change and the splicing index in deep sequencing data. (A and B) Scatter plots derived from deep sequencing data (9) comparing the log2 gene expression change (x-axis) with the standard deviation of the splicing indices of the exons of that gene (y-axis).
Figure 5.
Cross-species exon array comparison. (A) A scatter plot comparing mouse to human probeset response slopes (determined from the tissue panel data). Mouse probesets were linked to their human homolog counterparts using whole-genome alignments (see Methods section). (B) A comparison of mouse and human exon response slopes based on the predictions from the positional dinucleotide model.
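The comparison in (A) amounts to joining two tables of response slopes through a list of homologous probeset pairs and summarizing the resulting scatter. The sketch below assumes the homolog links are already available as ID pairs (the whole-genome alignment step is not shown) and uses the Pearson correlation as a summary, which is our choice rather than something the caption specifies. All IDs and the function name are hypothetical.

```python
import numpy as np

def cross_species_slope_correlation(mouse_slopes, human_slopes, homolog_pairs):
    """Pearson correlation between response slopes of linked mouse/human probesets.

    mouse_slopes, human_slopes: dict mapping a probeset ID to its response slope.
    homolog_pairs: iterable of (mouse_id, human_id) links, e.g. derived from
    whole-genome alignments. Pairs with a missing slope on either side are skipped.
    """
    pairs = [(mouse_slopes[m], human_slopes[h])
             for m, h in homolog_pairs
             if m in mouse_slopes and h in human_slopes]
    if len(pairs) < 2:
        raise ValueError("need at least two linked probesets")
    m_vals, h_vals = np.array(pairs, dtype=float).T
    return float(np.corrcoef(m_vals, h_vals)[0, 1])

# Hypothetical IDs and slopes; the ("m4", "h4") link has no slopes and is ignored.
mouse = {"m1": 0.8, "m2": 1.0, "m3": 1.2}
human = {"h1": 0.9, "h2": 1.1, "h3": 1.3}
links = [("m1", "h1"), ("m2", "h2"), ("m3", "h3"), ("m4", "h4")]
print(cross_species_slope_correlation(mouse, human, links))  # close to 1.0
```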
Figure 6.
Correcting probeset response behavior. (A) A scatter plot comparing probeset response slopes estimated by linear regression from a training set of 250 arrays (covering a large variety of tissues) with those from an independent set of 171 lymphoblastoid cell line samples (14). (B) The number of significant alternative probesets detected in the uncorrected data compared with the corrected data using the linear and the nonlinear model (t-test for every probeset based on 10 replicates, P-value cutoff < 10⁻⁵). The bar groups represent equally sized populations of genes binned according to absolute gene-fold change. (C and D) The dependency between gene-fold change and splicing index variation (human universal RNA versus human brain RNA), before (C) and after (D) applying a correction based on a linear model using probeset slopes derived from the training set (250 arrays).
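A linear correction of this kind can be motivated as follows, under our own assumption about the model form. If a probeset's log2 signal follows p ≈ a + b·g, where g is the log2 gene expression and b the probeset's response slope, then its splicing index between samples A and B is (p_A − g_A) − (p_B − g_B) = (b − 1)(g_A − g_B), even when no splicing change occurs. Subtracting (b − 1) times the gene log2 fold change, with b estimated from a training set as in (A), removes that artefact. This is a sketch of one plausible reading of "a correction based on a linear model"; COSIE's actual implementation may differ, and the function name is hypothetical.

```python
import numpy as np

def corrected_splicing_index(si, slope, gene_log2_a, gene_log2_b):
    """Subtract the splicing-index component predicted by a probeset's response slope.

    Assumes log2 probeset signal ~= intercept + slope * log2 gene expression, so a
    probeset with slope != 1 picks up an artefactual splicing index of
    (slope - 1) * (gene log2 fold change) even without any splicing change.
    Works elementwise on scalars or NumPy arrays.
    """
    fold_change = np.asarray(gene_log2_a, dtype=float) - np.asarray(gene_log2_b, dtype=float)
    return np.asarray(si, dtype=float) - (np.asarray(slope, dtype=float) - 1.0) * fold_change

# Over-responding probeset (slope 1.5) in a gene that rises from log2 6 to 10:
print(corrected_splicing_index(2.0, 1.5, 10.0, 6.0))  # 0.0 (pure artefact)
print(corrected_splicing_index(1.0, 1.5, 10.0, 6.0))  # -1.0 (residual splicing signal)
```

Because the artefact scales with the gene fold change, the correction matters most for strongly differentially expressed genes, which is where the uncorrected splicing index is most inflated.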

References

1. Clark TA, Sugnet CW, Ares M. Genomewide analysis of mRNA processing in yeast using splicing-specific microarrays. Science. 2002;296:907–910.
2. Cline MS, Blume J, Cawley S, Clark TA, Hu JS, Lu G, Salomonis N, Wang H, Williams A. ANOSVA: a statistical method for detecting splice variation from expression data. Bioinformatics. 2005;21(Suppl. 1):i107–i115.
3. Cheung HC, Baggerly KA, Tsavachidis S, Bachinski LL, Neubauer VL, Nixon TJ, Aldape KD, Cote GJ, Krahe R. Global analysis of aberrant pre-mRNA splicing in glioblastoma using exon expression arrays. BMC Genomics. 2008;9:216.
4. Xing Y, Stoilov P, Kapur K, Han A, Jiang H, Shen S, Black DL, Wong WH. MADS: a new and improved method for analysis of differential alternative splicing by exon-tiling microarrays. RNA. 2008;14:1470–1479.
5. Li C, Wong WH. Model-based analysis of oligonucleotide arrays: expression index computation and outlier detection. Proc. Natl Acad. Sci. USA. 2001;98:31–36.