BMC Bioinformatics. 2006 Aug 30;7:395.
doi: 10.1186/1471-2105-7-395.

Probe set algorithms: is there a rational best bet?

Jinwook Seo et al. BMC Bioinformatics. 2006.

Abstract

Affymetrix microarrays have become a standard experimental platform for studies of mRNA expression profiling. Their success is due, in part, to the multiple oligonucleotide features (probes) against each transcript (probe set). This multiple testing allows for more robust background assessments and gene expression measures, and has permitted the development of many computational methods to translate image data into a single normalized "signal" for mRNA transcript abundance. Many probe set algorithms have now been developed, with a gradual movement away from chip-by-chip methods (MAS5) toward project-based model-fitting methods (dCHIP, RMA, others). Data interpretation is often profoundly changed by the choice of algorithm, leaving disoriented biologists to question what the "accurate" interpretation of their experiment is. Here, we summarize the debate concerning probe set algorithms. We provide examples of how changes in mismatch weight, normalizations, and construction of expression ratios each dramatically change data interpretation. All interpretations can be considered computationally appropriate, but with varying biological credibility. We also illustrate the performance of two new hybrid algorithms (PLIER, GC-RMA) relative to more traditional algorithms (dCHIP, MAS5, Probe Profiler PCA, RMA) using an interactive power analysis tool. PLIER appears superior to other algorithms in avoiding false positives with poorly performing probe sets. Based on our interpretation of the literature, and the examples presented here, we suggest that the variability in performance of probe set algorithms depends more on assumptions regarding "background" than on calculations of "signal". We argue that "background" is an enormously complex variable that can only be vaguely quantified, and thus the "best" probe set algorithm will vary from project to project.

Figures

Figure 1
Effects of probe set algorithms on absolute expression values and variance. Shown is the same 54-microarray data set from a 27-time-point muscle regeneration temporal series [25], analyzed by five different probe set algorithms. While all probe set algorithms show the same transcriptional induction of this transcript at the day 3.0 time point (expression pattern), the absolute expression levels computed by the probe set algorithms vary significantly, both at baseline (time 0) and at peak expression (day 3.0). For example, at baseline, MAS5, the dCHIP difference model, and PCA all show expression near background levels (0), while the dCHIP perfect match model and RMA show baseline values of 5,000–10,000 units. All graphs are output of the PEPR public access tool [26].
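The gap between near-zero and 5,000–10,000 unit baselines can be illustrated with a minimal sketch of how a mismatch weight affects a probe set summary. This is a simplified average of perfect match (PM) minus weighted mismatch (MM) intensities, assumed here for illustration only; it is not the model fitting used by dCHIP, RMA, or MAS5. The function name `probe_set_signal` and all intensities are hypothetical.

```python
def probe_set_signal(pm, mm, mismatch_weight):
    """Average of PM - w * MM over the probe pairs of one probe set.

    Simplified illustration only; real probe set algorithms use robust
    estimators or fitted models rather than a plain mean.
    """
    if not pm or len(pm) != len(mm):
        raise ValueError("pm and mm must be non-empty and of equal length")
    if not 0.0 <= mismatch_weight <= 1.0:
        raise ValueError("mismatch_weight must lie between 0 and 1")
    diffs = [p - mismatch_weight * m for p, m in zip(pm, mm)]
    return sum(diffs) / len(diffs)


# Hypothetical intensities for a transcript that is not expressed:
# PM and MM probes report similar, mostly non-specific hybridization.
pm = [7000.0, 8000.0, 9000.0]
mm = [6900.0, 8100.0, 8900.0]

pm_only = probe_set_signal(pm, mm, mismatch_weight=0.0)     # MM ignored
difference = probe_set_signal(pm, mm, mismatch_weight=1.0)  # full PM - MM

print(f"PM-only summary:    {pm_only:.1f}")
print(f"Difference summary: {difference:.1f}")
```

With these made-up values, ignoring the mismatch probes leaves the baseline in the thousands, while full subtraction brings it close to zero, which mirrors the contrast described in the caption.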
Figure 2
Effects of probe set algorithms on fold-change calculations. Shown is the same data set as in Figure 1, now normalized to time 0, with the Y axis showing fold-change. The calculated fold-change from day 0 (baseline) to peak transcript induction at the 3.0 day time point varies considerably from algorithm to algorithm. For example, the PCA algorithm shows a 90-fold induction compared to baseline, while RMA and dCHIP show only a 2.5-fold induction.
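How strongly fold-change depends on the baseline value can be shown with a small sketch. The intensities below are hypothetical, chosen only to mimic the pattern in the caption (a baseline near background versus a baseline of several thousand units); they are not taken from the data set, and `fold_change` is an illustrative helper, not part of PEPR.

```python
def fold_change(series, baseline_index=0):
    """Express every value in a time series relative to its baseline value."""
    baseline = series[baseline_index]
    if baseline <= 0:
        raise ValueError("baseline must be positive to form a ratio")
    return [value / baseline for value in series]


# Hypothetical signal for one transcript at baseline and at peak induction.
near_zero_baseline = [100.0, 9000.0]

# The same induction with a constant 6,000 unit offset at both time points,
# standing in for an algorithm that retains more non-specific signal.
high_baseline = [value + 6000.0 for value in near_zero_baseline]

print(f"Near-zero baseline: {fold_change(near_zero_baseline)[1]:.1f}-fold")
print(f"High baseline:      {fold_change(high_baseline)[1]:.1f}-fold")
```

The absolute increase is identical in both series; only the denominator differs, which is enough to move the ratio from about 90-fold to about 2.5-fold.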
Figure 3
Power calculations of absent call-only probe sets show variable rates of false positives. Shown is the output of HCE-power [20] for five probe set algorithms, including two newer "hybrid" algorithms (PLIER, GC-RMA), expressed as a % of probe sets fulfilling specific criteria (β = 0.2, α = 0.05, effect size = 1.5-fold change). Muscle biopsies from 16 normal controls and 10 Duchenne muscular dystrophy patients were used on U133A microarrays [27], and the 8,200 probe sets that showed an "absent call" by the MAS5.0 algorithm on all 26 arrays were then loaded into HCE-power. These "absent calls" reflect poorly performing probe sets, where there is low confidence that signal specific to the transcript is detected above background levels. By this analysis, both RMA and GC-RMA show significant powering of 70–80% of these "absent call" probe sets with only 2 microarrays per group. This can be interpreted as a high proportion of false positive results expected from this project using RMA or GC-RMA. On the other hand, PLIER shows insufficient powering for 98% of the 8,200 probe sets, even at group sizes of 10 arrays/group. This suggests that PLIER will show no false positives.
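The criteria in the caption (β = 0.2, α = 0.05, 1.5-fold effect size) can be turned into a rough arrays-per-group estimate. The sketch below uses the standard normal-approximation formula for a two-sided, two-group comparison of means; this is an assumption made for illustration and is not claimed to be the calculation HCE-power performs. The function name `arrays_per_group` and the example means and standard deviations are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def arrays_per_group(mean, sd, fold=1.5, alpha=0.05, beta=0.2):
    """Approximate arrays per group needed to detect a fold-change.

    Normal approximation: n = 2 * ((z_{1-alpha/2} + z_{1-beta}) * sd / delta)^2,
    where delta is the absolute change implied by `fold` on the linear scale.
    """
    if mean <= 0 or sd <= 0 or fold <= 1:
        raise ValueError("mean and sd must be positive and fold must exceed 1")
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)
    z_beta = standard_normal.inv_cdf(1 - beta)
    delta = mean * (fold - 1)
    n = 2 * ((z_alpha + z_beta) * sd / delta) ** 2
    # At least two arrays per group are needed to estimate any variance.
    return max(2, ceil(n))


# Hypothetical probe sets with the same mean signal but different variability.
print(arrays_per_group(mean=100.0, sd=50.0))  # noisy probe set
print(arrays_per_group(mean=100.0, sd=5.0))   # low-variance probe set
```

Under this approximation, a probe set counts as sufficiently powered at 2 arrays per group whenever its variance is small relative to its mean, regardless of whether the signal is specific to the transcript. That is how low-variance background signal can appear well powered.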
Figure 4
Log transformation of absent call signals shows a strong reduction in variance, leading to a greater proportion of sufficiently powered probe sets through greater precision, but less accuracy (higher expected false positives). Log transformation of data is a commonly used method to reduce variance, and thus increase precision. Taking the same data set shown in Figure 3 and log transforming the data effectively increases the proportion of probe sets that are sufficiently powered at low numbers of arrays per group.

References

    1. Affymetrix http://www.affymetrix.com/
    1. Irizarry RA, Wu Z, Jaffee HA. Comparison of Affymetrix GeneChip expression measures. Bioinformatics. 2006 Apr 1;22:789–94. doi: 10.1093/bioinformatics/btk046.
    1. Seo J, Gordish-Dressman H, Hoffman EP. An interactive power analysis tool for microarray hypothesis testing and generation. Bioinformatics. 2006 Apr 1;22:808–14. doi: 10.1093/bioinformatics/btk052.
    1. Li C, Wong WH. Model-based analysis of oligonucleotide arrays: expression index computation and outlier detection. Proc Natl Acad Sci USA. 2001 Jan 2;98:31–6. doi: 10.1073/pnas.011404098.
    1. Probe Profiler Software http://www.corimbia.com/Pages/ProbeProfiler.htm
