Comparative Study
BMC Genomics. 2008 Nov 7;9:529.
doi: 10.1186/1471-2164-9-529.

Gene expression and isoform variation analysis using Affymetrix Exon Arrays

Amandine Bemmo et al. BMC Genomics.

Erratum in

Abstract

Background: Alternative splicing and isoform-level expression profiling is an emerging field of interest within genomics. Splicing-sensitive microarrays, with probes targeted to individual exons or exon junctions, are becoming increasingly popular as a tool capable of both expression profiling and finer-scale isoform detection. Despite their intuitive appeal, relatively little is known about the performance of such tools, particularly in comparison with more traditional 3'-targeted microarrays. Here, we use the well-studied Microarray Quality Control (MAQC) dataset to benchmark the Affymetrix Exon Array and compare it with two other popular platforms: Illumina and Affymetrix U133.

Results: We show that at the gene expression level, the Exon Array performs comparably to the two 3'-targeted platforms. However, its correlation with each of them is slightly lower than the correlation between the two 3' arrays. We show that some of the discrepancies stem from the RNA amplification protocols; for example, the Exon Array is able to detect expression of non-polyadenylated transcripts. More importantly, we show that many other differences result from the ability of the Exon Array to monitor more detailed isoform-level changes: several examples illustrate that changes detected by the 3' platforms are actually isoform variations, and that the nature of these variations can be resolved using Exon Array data. Finally, we show how the Exon Array can be used to detect alternative isoform differences, such as alternative splicing, transcript termination, and alternative promoter usage. We discuss the possible pitfalls and false positives resulting from isoform-level analysis.

Conclusion: The Exon Array is a valuable tool that can be used to profile gene expression while providing important additional information about which gene isoforms are expressed and which vary between samples. However, analysis of alternative splicing requires much more hands-on effort and visualization of results to interpret the data correctly, and generally produces considerably higher false positive rates than expression analysis. One of the main sources of error in the MAQC dataset is variation in amplification efficiency across transcripts, most likely caused by the joint effects of elevated GC content at the 5' ends of genes and a reduced likelihood of random-primed first-strand synthesis at the 3' ends of genes. These effects are currently not adequately corrected by existing statistical methods. We outline approaches to reduce such errors by filtering out potentially problematic data.
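The abstract does not spell out the filtering criteria. As an illustration only, the sketch below combines the failure modes named in the figure legends: probesets not detected above background (DABG), probesets close to either transcript end, and genes whose overall expression changes very strongly. The `Probeset` record, the 300 nt edge margin, and the log2 fold change cap of 3 are hypothetical choices, not the authors' parameters.

```python
from dataclasses import dataclass


@dataclass
class Probeset:
    # Hypothetical record; field names are illustrative, not an Affymetrix format.
    probeset_id: str
    dabg_p: float       # detection-above-background p-value, averaged over samples
    dist_5prime: int    # distance (nt) from the mRNA 5' end
    dist_3prime: int    # distance (nt) from the mRNA 3' end
    gene_log2fc: float  # gene-level log2 fold change between the two sample groups


def filter_probesets(probesets, dabg_cutoff=0.05, edge_margin=300,
                     max_abs_gene_log2fc=3.0):
    """Drop probesets prone to false positives in isoform-level analysis.

    All three thresholds are assumptions chosen for illustration.
    """
    kept = []
    for ps in probesets:
        if ps.dabg_p >= dabg_cutoff:
            continue  # not reliably detected above background
        if ps.dist_5prime < edge_margin or ps.dist_3prime < edge_margin:
            continue  # inside an edge-biased region of the transcript
        if abs(ps.gene_log2fc) > max_abs_gene_log2fc:
            continue  # large whole-gene changes distort probe response
        kept.append(ps)
    return kept


data = [
    Probeset("ps1", 0.01, 1200, 800, 0.5),
    Probeset("ps2", 0.20, 1200, 800, 0.5),  # not detected
    Probeset("ps3", 0.01, 50, 1900, 0.5),   # too close to the 5' end
    Probeset("ps4", 0.01, 1200, 800, 4.3),  # gene changes roughly 20-fold
]
print([ps.probeset_id for ps in filter_probesets(data)])  # ['ps1']
```

Such a filter trades sensitivity for specificity: genuine alternative first or last exons sit exactly in the excluded edge regions, so flagging rather than discarding them may be preferable in practice.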


Figures

Figure 1
PCA plots at the probeset level show two main sources of variation among the 20 samples. The first principal component explains 65% of the variance and corresponds, as expected, to the biological source of the sample: brain (B) vs. reference (R). The second principal component explains 20% of the variance and corresponds to the "lab effect" between VT (blue) and McGill (red); that is, it reflects technical variability across labs.
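The "variance explained" figures in this legend come from a principal component analysis. A minimal sketch of that computation, assuming a samples-by-probesets matrix of log2 intensities and using the SVD of the column-centered matrix; the function name and the simulated two-tissue, two-lab layout are hypothetical and are not MAQC data.

```python
import numpy as np


def pca_variance_explained(expr):
    """Fraction of total variance captured by each principal component.

    expr: array of shape (n_samples, n_probesets) holding log2 intensities.
    """
    centered = expr - expr.mean(axis=0)  # center every probeset across samples
    # Squared singular values of the centered matrix are proportional to PC variances.
    s = np.linalg.svd(centered, compute_uv=False)
    var = s ** 2
    return var / var.sum()


# Simulated data, not MAQC measurements: 2 tissues x 2 labs x 5 replicates.
rng = np.random.default_rng(0)
n_probesets = 1000
tissue = np.repeat([0, 1], 10)          # brain (0) vs. reference (1)
lab = np.tile(np.repeat([0, 1], 5), 2)  # lab A (0) vs. lab B (1) within each tissue
expr = (
    np.outer(tissue, rng.normal(0, 2.0, n_probesets))  # strong biological effect
    + np.outer(lab, rng.normal(0, 1.0, n_probesets))   # weaker lab effect
    + rng.normal(0, 0.3, (20, n_probesets))            # measurement noise
)
print(pca_variance_explained(expr)[:2])
```

With a strong tissue effect and a weaker lab effect, the first two components separate the same two factors the legend describes.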
Figure 2
Comparison of the log2(FC) between the biological samples as measured at each of the two labs. Despite significant variation in expression measures across test sites, the fold change estimates are highly correlated.
Figure 3
Correlation of fold changes between Affymetrix U133, Illumina, and the Affymetrix Exon Array. Fold changes (log2-transformed) between brain and reference expression levels are shown for 8391 genes common to all three platforms: A) Illumina vs. U133; B) Exon Array vs. U133; C) Exon Array vs. Illumina. The arrow marks four histone genes with highly discordant measurements across platforms: HIST1H3B, HIST1H1B, HIST1H3C, HIST1H3I.
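A minimal sketch of the comparison summarized here: compute a log2 fold change per gene on each platform, restrict to the genes present on both, and correlate. It assumes linear-scale, positive intensities and a plain Pearson correlation; the helper names, gene IDs, and values are placeholders, not results from the paper.

```python
import math


def log2_fold_change(brain, reference):
    """Difference of mean log2 intensities; inputs are positive linear-scale values."""
    mean_b = sum(math.log2(x) for x in brain) / len(brain)
    mean_r = sum(math.log2(x) for x in reference) / len(reference)
    return mean_b - mean_r


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def cross_platform_r(fc_a, fc_b):
    """Correlate log2 fold changes over the genes present on both platforms."""
    common = sorted(fc_a.keys() & fc_b.keys())
    return pearson_r([fc_a[g] for g in common], [fc_b[g] for g in common])


# Placeholder gene IDs and values, not measurements from the paper.
exon_fc = {"GENE_A": 1.5, "GENE_B": -2.0, "GENE_C": 0.1, "GENE_D": 3.0}
u133_fc = {"GENE_A": 1.2, "GENE_B": -1.8, "GENE_C": 0.3, "GENE_E": 0.7}
print(round(cross_platform_r(exon_fc, u133_fc), 3))
```

A handful of strongly discordant genes, such as the histone genes marked in the figure, can pull a Pearson coefficient down noticeably, so a rank-based correlation is a reasonable cross-check.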
Figure 4
Exon Array analysis of ELAVL1 gene expression differences between brain and reference tissues. The horizontal axis lists the probesets within the gene, ordered from the 3' to the 5' end. The height of the blue bars indicates the log2(fold change) in expression between the samples. The red line indicates statistical significance, -log10(p-value).
Figure 5
Visualization of the expression patterns of the ELAVL1 gene. The top two custom tracks display the Exon Array information from Figure 4: statistical significance and fold change. Note that the two probeset "blocks" correspond to the two isoforms of the gene. The long 3'UTR isoform is predominantly expressed in the brain, whereas the short isoform is more abundant in the reference tissues.
Figure 6
Examples of candidates from Splicing Index analysis. Top panels show the p-values (dotted line) and fold changes (blue bars) for the expression of individual probesets. The centre panels show the values normalized for the overall difference in gene expression (SI). Bottom panels show the raw hybridization levels of each probeset. A) MADD: successful use of the splicing index. In this example, in the presence of an overall 3-fold gene expression difference between the samples, the SI factors out the expression difference and indicates three alternatively spliced probesets (3329761, 3329771, and 33291783), all of which have strong supporting RefSeq annotation evidence for alternative splicing. B) TYMS: a typical false positive, where differences in probe response levels close to the edges of the transcript suggest alternative isoform usage. Such results are often erroneous, arising from the non-uniform response of individual probesets to large (in this case ~20-fold) changes in gene expression. Note the elevated signal intensity (bottom panel) at the 5' end of the gene, suggesting saturation, and the reduced intensity at the 3' terminus, possibly due to reduced amplification efficiency.
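The splicing index normalizes each probeset to its gene before comparing groups, which is how the overall 3-fold MADD difference is factored out. A minimal sketch using the commonly cited definition, SI = log2(NI_brain / NI_reference) with NI = probeset signal / gene signal, averaged in log space per group. The function name is hypothetical, and the paper's summarization and statistics (for example the p-values or Splicing ANOVA) are not reproduced.

```python
import math
from statistics import mean


def splicing_index(ps_brain, gene_brain, ps_ref, gene_ref):
    """Splicing index of one probeset between two groups.

    Each argument is a list of per-sample linear intensities: the probeset
    signal and the matching gene-level signal. The probeset is normalized to
    its gene (NI = probeset / gene), then SI = log2(NI_brain / NI_reference),
    using the mean log2 NI of each group.
    """
    ni_b = mean(math.log2(p) - math.log2(g) for p, g in zip(ps_brain, gene_brain))
    ni_r = mean(math.log2(p) - math.log2(g) for p, g in zip(ps_ref, gene_ref))
    return ni_b - ni_r


# Probeset tracks a 3-fold gene change: SI is 0 (no isoform signal).
print(splicing_index([300, 330], [300, 330], [100, 110], [100, 110]))
# Probeset stays flat while the gene rises 3-fold: SI = -log2(3), about -1.58.
print(splicing_index([100], [300], [100], [100]))
```

The ratio assumes every probeset responds linearly to transcript abundance. A saturated 5' probeset or an under-amplified 3' probeset breaks that assumption, so a large gene-level change alone can produce a non-zero SI, as in the TYMS example.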
Figure 7
Edge bias. This figure illustrates variation of hybridization intensity along transcripts. For each probeset expressed above background levels, we determined the average hybridization intensity as a function of distance from the 5' and 3' ends of the mRNA molecule. Top panels show the average signal intensity as a function of probeset distance from the 5' and 3' ends of transcripts. A significant decrease in signal strength is seen at the 3' end, while a slight increase occurs at the 5' end. Bottom panels illustrate the ability of the array to detect the hybridization signal above background levels. Mean DABG values decrease at both the 5' and 3' extremities of genes. The 3' effect results directly from the reduction in hybridization intensity. The 5' effect is most likely the result of the increased GC content of 5' probes located close to unmethylated gene promoters and CpG islands. Both effects cause false positive results in Splicing Index and Splicing ANOVA analyses when expression of the whole transcript changes. Only genes with detectable expression (average DABG p-value < 0.05) and a total mRNA length greater than 1000 nucleotides were included in this analysis. The values were calculated as log-averages of core probeset intensity across all samples. Each point on the plot corresponds to all probesets ending within a 10 bp bin at the indicated distance from the mRNA terminus.
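The binning described in this legend can be sketched as follows, assuming each probeset has already been reduced to a tuple of (distance of its end from one mRNA terminus, intensity averaged across samples, its gene's average DABG p-value, mRNA length). The tuple layout and function name are hypothetical; the 10 bp bins, DABG cutoff, length cutoff, and log-averaging follow the legend. Call it once with 5'-end distances and once with 3'-end distances.

```python
import math
from collections import defaultdict


def edge_profile(probesets, bin_size=10, dabg_cutoff=0.05, min_length=1000):
    """Mean log2 intensity of probesets binned by distance from one mRNA end.

    probesets: iterable of (distance_nt, intensity, gene_dabg_p, mrna_length).
    Returns {bin_start_nt: mean log2 intensity}, ordered by bin.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for dist, intensity, dabg_p, length in probesets:
        # Keep detected genes (DABG p < cutoff) with mRNA longer than min_length nt.
        if dabg_p >= dabg_cutoff or length <= min_length:
            continue
        bin_start = (dist // bin_size) * bin_size
        sums[bin_start] += math.log2(intensity)  # log-average within each bin
        counts[bin_start] += 1
    return {b: sums[b] / counts[b] for b in sorted(sums)}


toy = [
    (5, 4.0, 0.01, 2000),
    (7, 16.0, 0.01, 2000),
    (15, 8.0, 0.01, 2000),
    (3, 1024.0, 0.50, 2000),  # excluded: gene not detected
    (4, 1024.0, 0.01, 500),   # excluded: mRNA too short
]
print(edge_profile(toy))  # {0: 3.0, 10: 3.0}
```

Plotting the returned values against bin position gives the kind of intensity-versus-distance curve shown in the top panels.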
Figure 8
Exon Array average gene expression index as a function of transcript (mRNA) length. There is a highly significant positive correlation between expression and length (R = 0.18, p < 10⁻²⁰). This effect is most likely an artefact of the edge bias illustrated in Figure 7: short transcripts have a lower overall efficiency of first-strand synthesis and appear to be expressed at lower levels. The effect is not observed in the 3'-amplified U133 (R = 0.05) and Illumina (R = -0.03) results.

