Nucleic Acids Res. 2009 Sep;37(16):e107.

doi: 10.1093/nar/gkp508. Epub 2009 Jun 15.

Overestimation of alternative splicing caused by variable probe characteristics in exon arrays

Dimos Gaidatzis¹, Kirsten Jacobeit, Edward J Oakeley, Michael B Stadler

PMID: 19528075
PMCID: PMC2760813
DOI: 10.1093/nar/gkp508

Dimos Gaidatzis et al. Nucleic Acids Res. 2009 Sep.

Affiliation

¹ Friedrich Miescher Institute for Biomedical Research, Novartis Research Foundation, Maulbeerestrasse 66, CH-4058 Basel, Switzerland.

Abstract

In higher eukaryotes, alternative splicing is a common mechanism for increasing transcriptome diversity. Affymetrix exon arrays were designed to monitor the relative expression levels of hundreds of thousands of known and predicted exons with a view to detecting alternative splicing events. In this article, we analyzed exon array data from many different human and mouse tissues and uncovered a systematic relationship between transcript-fold change and alternative splicing as reported by the splicing index. Evidence from dilution experiments and deep sequencing suggests that this effect is of technical rather than biological origin and that it is driven by sequence features of the probes. The effect is substantial and results in a 12-fold overestimation of alternative splicing events in genes that are differentially expressed. By cross-species exon array comparison, we further show that the systematic bias persists even across species boundaries. Failure to consider this effect in data analysis would result in the reproducible false detection of apparently conserved alternative splicing events. Finally, we developed software in R called COSIE (Corrected Splicing Indices for Exon arrays) that, for any given set of new exon array experiments, corrects for the observed bias and improves the detection of alternative splicing (available at www.fmi.ch/groups/gbioinfo).
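To make the reported bias concrete, the following is a minimal sketch of how a splicing index is commonly computed in log2 space, together with a toy simulation in which probesets respond to the same gene-level change with different slopes. The toy signal model (`slope * level`), the slope values, and the function names are illustrative assumptions, not the authors' method or the COSIE implementation.

```python
import statistics


def splicing_index(probeset_a, gene_a, probeset_b, gene_b):
    """Splicing index between samples A and B; all inputs are log2 values.

    It is the gene-normalized probeset level in A minus the same quantity in B.
    """
    return (probeset_a - gene_a) - (probeset_b - gene_b)


def simulate_si_sd(gene_log2_change, slopes, base_level=8.0):
    """SD of splicing indices for one gene with NO real splicing change.

    Toy assumption: each probeset reports slope * true log2 level, and the
    gene-level estimate is the mean over its probesets.
    """
    level_a = base_level
    level_b = base_level + gene_log2_change
    probes_a = [s * level_a for s in slopes]
    probes_b = [s * level_b for s in slopes]
    gene_a = statistics.mean(probes_a)
    gene_b = statistics.mean(probes_b)
    indices = [
        splicing_index(pa, gene_a, pb, gene_b)
        for pa, pb in zip(probes_a, probes_b)
    ]
    return statistics.pstdev(indices)


# Made-up response slopes for five probesets of one gene.
slopes = [0.6, 0.8, 1.0, 1.2, 1.4]
sd_unchanged = simulate_si_sd(0.0, slopes)  # gene not differentially expressed
sd_changed = simulate_si_sd(3.0, slopes)    # gene changes by 3 log2 units
```

In this toy model the spread of splicing indices is zero when the gene does not change and grows in proportion to the gene-level change, even though no exon is alternatively spliced.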

Figures

**Figure 1.**
Relationship between transcript-fold change and the splicing index in pairwise sample comparisons. (A) MA plots for eight genes when comparing mouse embryo with mouse brain RNA. The average log2 expression is depicted on the x-axis and the log2 expression change on the y-axis. Every dot represents one probeset. The error bars are computed from four technical replicates. (B) A scatter plot comparing the log2 gene expression change (x-axis) with the standard deviation of splicing indices from probesets of that gene (y-axis). To reduce array boundary effects, all nonexpressed probesets, as well as all genes with very low or very high expression levels, were removed. The same quantity is shown for mouse (C) and human (D), for all 55 pairwise comparisons between any two of the 11 array samples from the tissue panel data. Each comparison is represented as a smoothed interpolation curve with asterisks at its ends. In all figure parts, only the core annotation sets of the human and mouse exon arrays were used.

**Figure 2.**
Probeset response characteristics. (A) The behavior of all probesets from one example gene (matrilin 4) across 11 mouse tissues (Affymetrix tissue panel data) in log–log space. The x-axis represents the expression of the gene, whereas the y-axis represents the splicing index of the probeset (determined by comparing each tissue to the average of all tissues). Individual probesets were sorted by their slope and shifted along the y-axis to avoid overlaps. The black line at the top illustrates the scale of the y-axis and represents four units in splicing index space. (B and C) Condensed representations showing the behavior of many more probesets for mouse and human, respectively. Depicted is a hierarchical clustering of probesets that have high splicing index variability and reside in genes that change their expression in the 11 tissues (expression_max − expression_min > 3). Each row represents one probeset, the x-axis denotes gene expression rank among the 11 tissues (sorted independently for each exon) and the splicing index is color coded (red: ≥2.67, blue: ≤−2.67). (D) A more refined picture of individual probeset behavior based on another dataset comprising 57 lymphoblastoid cell line samples in triplicate (14). Eighteen genes were randomly selected from differentially expressed genes (expression_max − expression_min > 4). For each selected gene, the behavior of the highest (black) and the lowest (red) responding probeset (based on the slope of a linear regression) is depicted.
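The probeset response slope referred to in this caption can be illustrated with an ordinary least-squares fit of splicing index against log2 gene expression across tissues. The sketch below uses made-up values for five tissues; the function name and the data are hypothetical and only show the kind of regression the caption describes.

```python
def response_slope(gene_expression, splicing_indices):
    """Least-squares slope of splicing index against log2 gene expression."""
    n = len(gene_expression)
    mean_x = sum(gene_expression) / n
    mean_y = sum(splicing_indices) / n
    sxx = sum((x - mean_x) ** 2 for x in gene_expression)
    sxy = sum(
        (x - mean_x) * (y - mean_y)
        for x, y in zip(gene_expression, splicing_indices)
    )
    return sxy / sxx


# Hypothetical log2 gene expression in five tissues.
gene_expression = [5.0, 6.0, 7.0, 8.0, 9.0]

# Hypothetical splicing indices for two probesets of the same gene.
probesets = {
    "over_responding": [0.3 * (x - 7.0) for x in gene_expression],
    "under_responding": [-0.2 * (x - 7.0) for x in gene_expression],
}

slopes = {
    name: response_slope(gene_expression, values)
    for name, values in probesets.items()
}

# Sort probesets by slope, as done for display in panel (A).
ranked = sorted(slopes, key=slopes.get)
```

A slope near zero means the splicing index of a probeset is independent of gene expression; a clearly positive or negative slope marks a probeset that over- or under-responds as the gene level changes.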

**Figure 3.**
Probeset slope predictor based on a positional dinucleotide model. (A) The distributions of log2 probeset-fold changes for the mouse dilution experiment at three different dilutions. Each dilution condition (50%, 25% and 10%) was compared with the 100% mouse RNA array. To eliminate array boundary effects (low and high expression), we only considered a very narrow band of probe expression (between 6 and 8.5 in the 100% mouse RNA experiment). The triangles represent the expression change expected from the experiment. In log2 space this would be −1 for 50%, −2 for 25% and −3.3 for 10% dilution. (B and C) The positional contributions of individual dinucleotides after training a linear model with experimentally measured probe slopes from the dilution experiments, for mouse and human, respectively. Position 24 represents the end of the probe that is attached to the surface of the microarray. Red denotes a positive contribution to the slope (overresponding), whereas blue denotes a negative contribution (underresponding). (D and E) Scatter plots comparing the probeset response slopes predicted by the dinucleotide model (x-axis) with those determined by linear regression from the Affymetrix tissue panel data (y-axis) for mouse and human, respectively. Probe response slopes (predicted from the dinucleotide model) were converted to probeset response slopes by averaging the slopes of all the probes that belong to the probeset in question.
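The expected dilution values and the input to a positional dinucleotide model can be sketched as follows. The sketch assumes 25-nucleotide probes (which gives 24 dinucleotide positions, consistent with position 24 being the last one), a one-hot encoding, and a plain linear predictor; the example probe sequence, the weights, and all function names are hypothetical and are not taken from the paper.

```python
import math
from itertools import product

# The 16 possible dinucleotides in a fixed order.
DINUCLEOTIDES = ["".join(pair) for pair in product("ACGT", repeat=2)]


def expected_log2_change(fraction):
    """Expected log2 fold change when RNA is diluted to the given fraction."""
    return math.log2(fraction)


def positional_dinucleotide_features(probe):
    """One-hot encode the dinucleotide found at each position of a probe.

    A probe of length L yields (L - 1) * 16 binary features.
    """
    features = []
    for i in range(len(probe) - 1):
        dinucleotide = probe[i:i + 2]
        features.extend(1 if dinucleotide == d else 0 for d in DINUCLEOTIDES)
    return features


def predict_probe_slope(probe, weights, intercept=0.0):
    """Linear model: intercept plus the weights of the dinucleotides present."""
    features = positional_dinucleotide_features(probe)
    return intercept + sum(w * f for w, f in zip(weights, features))


def probeset_slope(probes, weights, intercept=0.0):
    """Probeset slope as the mean of the predicted slopes of its probes."""
    predicted = [predict_probe_slope(p, weights, intercept) for p in probes]
    return sum(predicted) / len(predicted)


expected = {f: expected_log2_change(f) for f in (0.50, 0.25, 0.10)}

example_probe = "ACGTACGTACGTACGTACGTACGTA"  # made-up 25-mer
features = positional_dinucleotide_features(example_probe)
```

In a real analysis the weights would be fitted to measured probe slopes, for example by least squares; here they are left as an input so that no fitted values are implied.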

**Figure 4.**
Relationship between transcript-fold change and the splicing index in deep sequencing data. (A and B) Scatter plots derived from deep sequencing data (9) comparing the log2 gene expression change (x-axis) with the standard deviation of the splicing indices belonging to exons of that particular gene (y-axis).

**Figure 5.**
Cross-species exon array comparison. (A) A scatter plot comparing mouse to human probeset response slopes (determined from the tissue panel data). Mouse probesets were linked to their human homolog counterparts using whole-genome alignments (see Methods section). (B) A comparison of mouse and human exon response slopes based on the predictions from the positional dinucleotide model.

**Figure 6.**
Correcting probeset response behavior. (A) A scatter plot comparing probeset response slopes estimated by linear regression from a training set of 250 arrays (covering a large variety of tissues) with those from an independent set of 171 lymphoblastoid cell line samples (14). (B) The number of significant alternative probesets detected in the uncorrected data compared with the corrected data using the linear and the nonlinear model (t-test for every probeset based on 10 replicates, P_cutoff < 10⁻⁵). The bar groups represent equally sized populations of genes binned according to absolute gene-fold change. (C and D) The dependency between gene-fold change and splicing index variation (human universal RNA versus human brain RNA), before (C) and after (D) applying a correction based on a linear model using probeset slopes derived from the training set (250 arrays).
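One simple way to picture a linear correction of this kind is to subtract, from each observed splicing index, the part predicted by the probeset slope and the gene-level change. The sketch below is an assumption about the general form of such a correction, with made-up slopes and values; it is not the COSIE code and does not cover the nonlinear model mentioned in the caption.

```python
def correct_splicing_indices(observed, slopes, gene_log2_change):
    """Remove the slope-driven component from each observed splicing index."""
    return [si - s * gene_log2_change for si, s in zip(observed, slopes)]


# Hypothetical per-probeset slopes, as if learned from a training set.
slopes = [-0.4, -0.2, 0.0, 0.2, 0.4]
gene_log2_change = 3.0

# Toy ground truth: only the third probeset is truly alternatively spliced.
true_si = [0.0, 0.0, 1.5, 0.0, 0.0]

# Observed values are the truth plus the technical, slope-driven component.
observed = [t + s * gene_log2_change for t, s in zip(true_si, slopes)]

corrected = correct_splicing_indices(observed, slopes, gene_log2_change)
```

Before correction, four of the five probesets in this toy gene show nonzero splicing indices; after correction only the truly spliced probeset remains, which is the qualitative behavior panels (C) and (D) describe.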

Cited by

Transcriptional activity regulates alternative cleavage and polyadenylation.
Ji Z, Luo W, Li W, Hoque M, Pan Z, Zhao Y, Tian B. Mol Syst Biol. 2011 Sep 27;7:534. doi: 10.1038/msb.2011.69. PMID: 21952137.
Cytokines interleukin-1beta and tumor necrosis factor-alpha regulate different transcriptional and alternative splicing networks in primary beta-cells.
Ortis F, Naamane N, Flamez D, Ladrière L, Moore F, Cunha DA, Colli ML, Thykjaer T, Thorsen K, Orntoft TF, Eizirik DL. Diabetes. 2010 Feb;59(2):358-74. doi: 10.2337/db09-1159. Epub 2009 Nov 23. PMID: 19934004.
Global regulation of alternative splicing by adenosine deaminase acting on RNA (ADAR).
Solomon O, Oren S, Safran M, Deshet-Unger N, Akiva P, Jacob-Hirsch J, Cesarkas K, Kabesa R, Amariglio N, Unger R, Rechavi G, Eyal E. RNA. 2013 May;19(5):591-604. doi: 10.1261/rna.038042.112. Epub 2013 Mar 8. PMID: 23474544.
Different effects of the probe summarization algorithms PLIER and RMA on high-level analysis of Affymetrix exon arrays.
Qu Y, He F, Chen Y. BMC Bioinformatics. 2010 Apr 28;11:211. doi: 10.1186/1471-2105-11-211. PMID: 20426803.
R and Bioconductor solutions for alternative splicing detection.
Phang T. Hum Genomics. 2009 Dec;4(2):131-5. doi: 10.1186/1479-7364-4-2-131. PMID: 20038500.

References

1. Clark TA, Sugnet CW, Ares M. Genomewide analysis of mRNA processing in yeast using splicing-specific microarrays. Science. 2002;296:907–910.
2. Cline MS, Blume J, Cawley S, Clark TA, Hu JS, Lu G, Salomonis N, Wang H, Williams A. ANOSVA: a statistical method for detecting splice variation from expression data. Bioinformatics. 2005;21(Suppl. 1):i107–i115.
3. Cheung HC, Baggerly KA, Tsavachidis S, Bachinski LL, Neubauer VL, Nixon TJ, Aldape KD, Cote GJ, Krahe R. Global analysis of aberrant pre-mRNA splicing in glioblastoma using exon expression arrays. BMC Genomics. 2008;9:216.
4. Xing Y, Stoilov P, Kapur K, Han A, Jiang H, Shen S, Black DL, Wong WH. MADS: a new and improved method for analysis of differential alternative splicing by exon-tiling microarrays. RNA. 2008;14:1470–1479.
5. Li C, Wong WH. Model-based analysis of oligonucleotide arrays: expression index computation and outlier detection. Proc. Natl Acad. Sci. USA. 2001;98:31–36.
