Probe set algorithms: is there a rational best bet?

Jinwook Seo¹, Eric P Hoffman

PMID: 16942624
PMCID: PMC1569879
DOI: 10.1186/1471-2105-7-395

BMC Bioinformatics. 2006 Aug 30;7:395.

Affiliation

¹ Research Center for Genetic Medicine, Children's National Medical Center, 111 Michigan Ave NW, Washington DC 20010, USA. jseo@cnmcresearch.org

Abstract

Affymetrix microarrays have become a standard experimental platform for mRNA expression profiling. Their success is due, in part, to the use of multiple oligonucleotide features (probes) against each transcript (probe set). This multiplicity of probes allows for more robust background assessments and gene expression measures, and has permitted the development of many computational methods to translate image data into a single normalized "signal" for mRNA transcript abundance. Many probe set algorithms have now been developed, with a gradual movement away from chip-by-chip methods (MAS5) toward project-based model-fitting methods (dCHIP, RMA, others). Data interpretation is often profoundly changed by the choice of algorithm, leaving biologists unsure which interpretation of their experiment is "accurate". Here, we summarize the debate concerning probe set algorithms. We provide examples of how changes in mismatch weight, normalizations, and construction of expression ratios each dramatically change data interpretation. All interpretations can be considered computationally appropriate, but with varying biological credibility. We also illustrate the performance of two new hybrid algorithms (PLIER, GC-RMA) relative to more traditional algorithms (dCHIP, MAS5, Probe Profiler PCA, RMA) using an interactive power analysis tool. PLIER appears superior to other algorithms in avoiding false positives with poorly performing probe sets. Based on our interpretation of the literature, and the examples presented here, we suggest that the variability in performance of probe set algorithms depends more on assumptions regarding "background" than on calculations of "signal". We argue that "background" is an enormously complex variable that can only be vaguely quantified, and thus the "best" probe set algorithm will vary from project to project.

Figures

**Figure 1**
**Effects of probe set algorithms on absolute expression values and variance**. Shown is the same 54-microarray data set from a 27-time-point muscle regeneration temporal series [25], analyzed by five different probe set algorithms. While all probe set algorithms show the same induction of this transcript at the day 3.0 time point (expression pattern), the absolute expression levels computed at baseline (time 0) and at peak expression (day 3.0) vary significantly between algorithms. For example, at baseline, MAS5, the dCHIP difference model, and PCA all show expression near background levels (0), while the dCHIP perfect-match model and RMA show baseline values of 5,000–10,000 units. All graphs are output of the PEPR public access tool [26].
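
The gap between "difference" and "perfect match" baselines can be illustrated with a deliberately simplified sketch. It assumes a probe set signal is just the average of PM minus a weighted MM across probe pairs, which is not how MAS5, dCHIP, or RMA actually compute their values. The function name `probe_set_signal`, the `mm_weight` parameter, and all intensities are hypothetical.

```python
from statistics import mean


def probe_set_signal(pm, mm, mm_weight):
    """Average of PM - mm_weight * MM over the probe pairs of one probe set.

    A toy summary only; real algorithms use robust or model-based fits.
    """
    if len(pm) != len(mm):
        raise ValueError("pm and mm must contain the same number of probes")
    return mean(p - mm_weight * m for p, m in zip(pm, mm))


# Hypothetical intensities for one probe set on one array (not from the paper).
pm = [5200.0, 6100.0, 4800.0, 7000.0]
mm = [5000.0, 6000.0, 4900.0, 6800.0]

pm_only = probe_set_signal(pm, mm, mm_weight=0.0)     # mismatch ignored
difference = probe_set_signal(pm, mm, mm_weight=1.0)  # full mismatch subtraction

print(pm_only, difference)
```

With these made-up numbers the same probe set reads as several thousand units when the mismatch is ignored and as close to zero when it is fully subtracted, which is the kind of baseline shift described in the caption.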

**Figure 2**
**Effects of probe set algorithms on fold-change calculations**. Shown is the same data set as in Figure 1, now normalized to time 0, with the Y axis showing fold-change. The calculated fold-change from day 0 (baseline) to peak transcript induction at day 3.0 varies considerably from algorithm to algorithm. For example, the PCA algorithm shows a 90-fold induction compared to baseline, while RMA and dCHIP show only a 2.5-fold induction.
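
Because fold-change is a ratio, it is dominated by the baseline estimate. The sketch below is only an illustration: the `fold_change` helper, its `floor` parameter, and the expression values are assumptions, chosen so that the two cases reproduce the magnitudes quoted in the caption.

```python
def fold_change(value, baseline, floor=1.0):
    """Ratio of value to baseline, clamping both to a positive floor.

    The floor avoids division by zero or negative ratios when an
    algorithm reports a baseline at or below background.
    """
    if floor <= 0:
        raise ValueError("floor must be positive")
    return max(value, floor) / max(baseline, floor)


# Hypothetical values: an algorithm that places baseline near background...
low_baseline = fold_change(9000.0, 100.0)
# ...and one that places the same baseline several thousand units higher.
high_baseline = fold_change(15000.0, 6000.0)

print(low_baseline, high_baseline)
```

Both cases describe a clear induction, yet the ratio differs by more than an order of magnitude purely because of where the baseline sits.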

**Figure 3**
**Power calculations of absent call-only probe sets show variable rates of false positives**. Shown is the output of HCE-power [20] for five probe set algorithms, including two newer "hybrid" algorithms (PLIER, GC-RMA), expressed as the percentage of probe sets fulfilling specific criteria (β = 0.2, α = 0.05, effect size = 1.5-fold change). Muscle biopsies from 16 normal controls and 10 Duchenne muscular dystrophy patients were profiled on U133A microarrays [27], and the 8,200 probe sets that showed an "absent call" by the MAS5.0 algorithm on all 26 arrays were then loaded into HCE-power. These "absent calls" reflect poorly performing probe sets, where there is low confidence that signal specific to the transcript is detected above background levels. By this analysis, both RMA and GC-RMA show sufficient powering of 70–80% of these "absent call" probe sets with only 2 microarrays per group. This can be interpreted as a high proportion of false positive results expected from this project using RMA or GC-RMA. On the other hand, PLIER shows insufficient powering for 98% of the 8,200 probe sets, even at group sizes of 10 arrays per group. This suggests that PLIER will yield few, if any, false positives from these probe sets.
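
To make the powering criteria concrete, here is a minimal sketch of a per-probe-set sample size estimate. It assumes a two-group comparison on log2-scale signal and uses the textbook normal approximation, which is not necessarily the calculation HCE-power performs. The function name `arrays_per_group` and the standard deviations passed to it are hypothetical.

```python
from math import ceil, log2
from statistics import NormalDist


def arrays_per_group(fold_change, sd_log2, alpha=0.05, beta=0.2):
    """Approximate arrays per group needed to detect a given fold-change.

    Uses n = 2 * ((z_{1-alpha/2} + z_{1-beta}) * sd / delta)^2, where delta
    is the log2 fold-change and sd is the per-group standard deviation of
    the log2 signal. The normal approximation is optimistic for small n.
    """
    if fold_change <= 0 or fold_change == 1:
        raise ValueError("fold_change must be positive and different from 1")
    if sd_log2 <= 0:
        raise ValueError("sd_log2 must be positive")
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(1 - beta)
    delta = abs(log2(fold_change))
    return ceil(2 * ((z_alpha + z_beta) * sd_log2 / delta) ** 2)


# Hypothetical probe sets: one with tight replicates, one with noisy replicates.
tight = arrays_per_group(1.5, sd_log2=0.2)
noisy = arrays_per_group(1.5, sd_log2=1.0)
print(tight, noisy)
```

The point of the sketch is that the required group size is driven by the variance an algorithm assigns to a probe set: an algorithm that reports very low variance for an "absent call" probe set will appear sufficiently powered with very few arrays, whether or not the signal is real.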

**Figure 4**
**Log transformation of absent call signals shows a strong reduction in variance, leading to a greater proportion of sufficiently powered probe sets through greater precision, but less accuracy (higher expected false positives)**. Log transformation of data is a commonly used method to reduce variance, and thus increase precision. Taking the same data set shown in Figure 3 and log transforming the data effectively increases the proportion of probe sets that are sufficiently powered at low numbers of arrays per group.
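
One reason log transformation reduces variance is that multiplicative noise, whose spread grows with intensity on the raw scale, becomes additive noise with a constant spread on the log scale. The sketch below illustrates only that property, using made-up intensities. It does not reproduce the figure or any power calculation, and the helper name `spread` is hypothetical.

```python
from math import log2
from statistics import stdev


def spread(values):
    """Return (standard deviation of raw values, standard deviation of log2 values)."""
    return stdev(values), stdev([log2(v) for v in values])


# Hypothetical replicate intensities at a low level, and the same
# replicate pattern scaled up ten-fold.
low = [20.0, 80.0, 35.0, 150.0, 60.0]
high = [10.0 * v for v in low]

raw_sd_low, log_sd_low = spread(low)
raw_sd_high, log_sd_high = spread(high)

print(raw_sd_low, raw_sd_high)  # raw spread scales with intensity
print(log_sd_low, log_sd_high)  # log2 spread is unchanged by the scaling
```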

References

1. Affymetrix. http://www.affymetrix.com/
1. Irizarry RA, Wu Z, Jaffee HA. Comparison of Affymetrix GeneChip expression measures. Bioinformatics. 2006 Apr 1;22:789–94. doi: 10.1093/bioinformatics/btk046.
1. Seo J, Gordish-Dressman H, Hoffman EP. An interactive power analysis tool for microarray hypothesis testing and generation. Bioinformatics. 2006 Apr 1;22:808–14. doi: 10.1093/bioinformatics/btk052.
1. Li C, Wong WH. Model-based analysis of oligonucleotide arrays: expression index computation and outlier detection. Proc Natl Acad Sci USA. 2001 Jan 2;98:31–6. doi: 10.1073/pnas.011404098.
1. Probe Profiler Software. http://www.corimbia.com/Pages/ProbeProfiler.htm
