BMC Bioinformatics. 2006 Jan 31;7:49.

doi: 10.1186/1471-2105-7-49.

Effects of filtering by Present call on analysis of microarray experiments

Jeanette N McClintick¹, Howard J Edenberg

PMID: 16448562
PMCID: PMC1409797
DOI: 10.1186/1471-2105-7-49

Jeanette N McClintick et al. BMC Bioinformatics. 2006.

Affiliation

¹ Department of Medical and Molecular Genetics, Indiana University, Indianapolis, Indiana, USA. jnmcclin@iupui.edu

Abstract

Background: Affymetrix GeneChips are widely used for expression profiling of tens of thousands of genes. The large number of comparisons can lead to false positives. Various methods have been used to reduce false positives, but they have rarely been compared or quantitatively evaluated. Here we describe and evaluate a simple method that uses the detection (Present/Absent) call generated by the Affymetrix Microarray Suite version 5 software (MAS5) to remove data that are not reliably detected before further analysis, and compare this with filtering by expression level. We explore the effects of various thresholds for removing data in experiments of different sizes (from 3 to 10 arrays per treatment), as well as their relative power to detect significant differences in expression.

Results: Our approach sets a threshold for the fraction of arrays called Present in at least one treatment group. This method removes a large percentage of probe sets called Absent before carrying out the comparisons, while retaining most of the probe sets called Present. It preferentially retains the more significant probe sets (p ≤ 0.001) and those probe sets that are turned on or off, and improves the false discovery rate. Permutations to estimate false positives indicate that probe sets removed by the filter contribute a disproportionate number of false positives. Filtering by fraction Present is effective when applied to data generated either by the MAS5 algorithm or by other probe-level algorithms, for example RMA (robust multichip average). Experiment size greatly affects the ability to reproducibly detect significant differences, and also changes the effect of filtering; smaller experiments (3-5 samples per treatment group) benefit from more restrictive filtering (≥ 50% Present).
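The filtering rule described above can be illustrated with a minimal sketch. This is not the authors' code: the function names, the dictionary layout, and the example calls are hypothetical, and counting Marginal ("M") calls as not Present is an assumption made only for this illustration.

```python
from typing import Dict, List, Sequence


def fraction_present(calls: Sequence[str]) -> float:
    """Fraction of arrays in one treatment group with a Present ('P') call.

    Assumption: Marginal ('M') and Absent ('A') calls both count as not Present.
    """
    if not calls:
        return 0.0
    return sum(1 for call in calls if call == "P") / len(calls)


def passes_filter(calls_by_group: Dict[str, Sequence[str]],
                  threshold: float = 0.5) -> bool:
    """Keep a probe set if at least one treatment group reaches the threshold."""
    return any(fraction_present(calls) >= threshold
               for calls in calls_by_group.values())


def filter_probe_sets(detection_calls: Dict[str, Dict[str, Sequence[str]]],
                      threshold: float = 0.5) -> List[str]:
    """Return the IDs of probe sets that survive the fraction-Present filter."""
    return [probe_id for probe_id, groups in detection_calls.items()
            if passes_filter(groups, threshold)]


# Hypothetical detection calls for three probe sets, four arrays per group.
example_calls = {
    "probe_turned_on": {"control": ["A", "A", "A", "A"],
                        "treated": ["P", "P", "P", "A"]},
    "probe_absent": {"control": ["A", "A", "A", "M"],
                     "treated": ["A", "P", "A", "A"]},
    "probe_present": {"control": ["P", "P", "P", "P"],
                      "treated": ["P", "P", "M", "P"]},
}

kept = filter_probe_sets(example_calls, threshold=0.5)
print(kept)
```

Because the threshold only has to be met in one group, a probe set that is Absent in controls but Present in treated samples ("turned on") is retained, which matches the behaviour the abstract describes.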

Conclusion: Use of a threshold fraction of Present detection calls (derived by MAS5) provided a simple method that effectively eliminated from analysis probe sets that are unlikely to be reliable while preserving the most significant probe sets and those turned on or off; it thereby increased the ratio of true positives to false positives.

Figures

**Figure 1**
**Distribution of MAS5 log₂(signals) before and after filtering**. A) No filter. B) Filtering with threshold of ≥ 50% Present in at least one treatment group. C) Filtering by average signal with threshold at ≥ 475 in at least one treatment group. The number of probe sets at each value of log₂(signal) is plotted. Black = Present, gray = Marginal, white = Absent.

**Figure 2**
**Distribution of RMA values before and after filtering**. A) No filter. B) Filtering with threshold of ≥ 50% Present in at least one treatment group. C) Filtering by average RMA value with threshold at ≥5.03 in at least one treatment group. Symbols as in Fig. 1.

**Figure 3**
**Percent of probe sets remaining after filtering**. Percent of probe sets remaining after filtering using selected thresholds for A) Fraction Present. B) MAS5 Signal. C) RMA value.

**Figure 4**
**Number of significant probe sets after filtering**. A) Filtering by fraction Present *vs.* by average MAS5 signal. The number of probe sets called significantly different (at the p-values shown) between the interferon-treated and untreated samples in the 10-sample experiment is plotted against the threshold of Fraction Present (FP) or average signal (S), followed by threshold value. The horizontal line at 1230 indicates the number of probe sets at p ≤ 0.001 in the unfiltered data. Paired thresholds remove comparable numbers of probe sets, e.g. FP>0 and S254. B) Filtering by fraction Present *vs.* by average RMA value. (FP) Fraction Present, (R) average RMA value, followed by threshold value. The line at 1641 indicates the number of probe sets at p ≤ 0.001 in the unfiltered data.

**Figure 5**
**Effect of filtering on false discovery rate (FDR)**. Filter method and values (x-axis): Fraction Present (FP), signal (S) or RMA value (R) followed by threshold value; separate lines are shown for each. Closed circles represent values from fraction Present filtering, open diamonds from average signal or average RMA. P-values: 0.05 (blue), 0.01 (pink), and 0.001 (green). A) IFN data, MAS5, B) IFN data, RMA, C) Smoking data, MAS5. Note that the smoking data were scaled to 100 instead of the 1000 used for the other data sets.

**Figure 6**
**Effect of filtering on average number of significant probe sets in smaller experiments**. Smaller virtual experiments (4, 6 and 8 samples per treatment group) were created by random selection of arrays within each of the two treatment groups (based on 1000 permutations). The probe sets called significantly different (at the p-values shown) are shown for different values of fraction Present (x-axis). Note differences in scale for y-axes of the 3 graphs. P-values: ≤ 0.05 diamond, ≤0.01 square, ≤0.001 triangle.

**Figure 7**
**Effects of filtering on FDR in smaller experiments**. FDR for the smaller virtual experiments shown in Fig. 6. Note differences in scale for y-axes of the 3 graphs. P-values: ≤ 0.05 diamond, ≤0.01 square, ≤0.001 triangle.

**Figure 8**
**Effect of experiment size on true positives, false positives and consistent positives**. TP: true positive, p ≤ 0.05 in smaller simulated experiment and p ≤ 0.05 in full 10-sample analysis. FP: false positive, p ≤ 0.05 in smaller simulated experiment but p > 0.05 in full 10-sample analysis. 500/1000: consistent positives, found significant at p ≤ 0.05 in at least 50% of the 1000 permutations. Data are shown both unfiltered and after filtering by 50% Present.
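The three categories defined in the Figure 8 caption can be expressed as a short sketch. The function names and the p-values below are hypothetical and serve only to illustrate the definitions; the sketch assumes p-values have already been computed for each sub-sampled experiment and for the full analysis.

```python
from typing import Dict, List, Tuple


def classify_positives(sub_p: Dict[str, float],
                       full_p: Dict[str, float],
                       alpha: float = 0.05) -> Tuple[int, int]:
    """Count true and false positives in one smaller experiment.

    TP: significant in the smaller experiment and in the full analysis.
    FP: significant in the smaller experiment but not in the full analysis.
    """
    true_pos = sum(1 for pid, p in sub_p.items()
                   if p <= alpha and full_p[pid] <= alpha)
    false_pos = sum(1 for pid, p in sub_p.items()
                    if p <= alpha and full_p[pid] > alpha)
    return true_pos, false_pos


def consistent_positives(runs: List[Dict[str, float]],
                         alpha: float = 0.05,
                         min_fraction: float = 0.5) -> List[str]:
    """Probe sets significant in at least `min_fraction` of the sub-sampled runs."""
    counts: Dict[str, int] = {}
    for run in runs:
        for pid, p in run.items():
            if p <= alpha:
                counts[pid] = counts.get(pid, 0) + 1
    return sorted(pid for pid, n in counts.items()
                  if n / len(runs) >= min_fraction)


# Hypothetical p-values: one full analysis and four sub-sampled runs.
full_analysis = {"a": 0.01, "b": 0.20, "c": 0.03}
sub_runs = [
    {"a": 0.04, "b": 0.02, "c": 0.30},
    {"a": 0.01, "b": 0.50, "c": 0.20},
    {"a": 0.20, "b": 0.60, "c": 0.04},
    {"a": 0.03, "b": 0.01, "c": 0.50},
]

print(classify_positives(sub_runs[0], full_analysis))
print(consistent_positives(sub_runs))
```

Note that "true" and "false" here are relative to the full 10-sample analysis, as in the caption, not to an independent ground truth.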

**Figure 9**
**Effect of experimental size on number of probe sets meeting a fixed value of FDR before and after filtering**. The number of probe sets meeting various Benjamini and Hochberg FDR thresholds, 0.2 (blue), 0.1 (red), and 0.05 (green) before (open symbols) and after filtering (filled symbols) by 50% Present. Number selected is average over 1000 permutations.
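For readers unfamiliar with the Benjamini and Hochberg procedure named in the Figure 9 caption, the following is a minimal sketch of the standard step-up rule for counting how many tests meet a given FDR threshold. The function name and the p-values are hypothetical, and this is not taken from the authors' analysis pipeline.

```python
from typing import Sequence


def benjamini_hochberg_count(p_values: Sequence[float], fdr: float) -> int:
    """Number of hypotheses rejected by the Benjamini-Hochberg step-up rule.

    Sort the m p-values and find the largest rank k such that
    p_(k) <= k * fdr / m; the k smallest p-values are then rejected.
    """
    m = len(p_values)
    rejected = 0
    for rank, p in enumerate(sorted(p_values), start=1):
        if p <= rank * fdr / m:
            rejected = rank  # keep the largest rank that satisfies the bound
    return rejected


# Hypothetical p-values for ten probe sets.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042,
            0.055, 0.074, 0.205, 0.212, 0.216]

for threshold in (0.2, 0.1, 0.05):
    print(threshold, benjamini_hochberg_count(p_values, threshold))
```

The bound depends on m, the number of tests, so removing probe sets before testing changes the cutoff applied to each of the remaining p-values.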

Cited by

Adaptive filtering of microarray gene expression data based on Gaussian mixture decomposition.
Marczyk M, Jaksik R, Polanski A, Polanska J. BMC Bioinformatics. 2013 Mar 20;14:101. doi: 10.1186/1471-2105-14-101. PMID: 23510016.
Filtering, FDR and power.
van Iterson M, Boer JM, Menezes RX. BMC Bioinformatics. 2010 Sep 7;11:450. doi: 10.1186/1471-2105-11-450. PMID: 20822518.
A new method for class prediction based on signed-rank algorithms applied to Affymetrix microarray experiments.
Rème T, Hose D, De Vos J, Vassal A, Poulain PO, Pantesco V, Goldschmidt H, Klein B. BMC Bioinformatics. 2008 Jan 11;9:16. doi: 10.1186/1471-2105-9-16. PMID: 18190711.
Transcriptome sequencing of the Microarray Quality Control (MAQC) RNA reference samples using next generation sequencing.
Mane SP, Evans C, Cooper KL, Crasta OR, Folkerts O, Hutchison SK, Harkins TT, Thierry-Mieg D, Thierry-Mieg J, Jensen RV. BMC Genomics. 2009 Jun 12;10:264. doi: 10.1186/1471-2164-10-264. PMID: 19523228.
Stress-response pathways are altered in the hippocampus of chronic alcoholics.
McClintick JN, Xuei X, Tischfield JA, Goate A, Foroud T, Wetherill L, Ehringer MA, Edenberg HJ. Alcohol. 2013 Nov;47(7):505-15. doi: 10.1016/j.alcohol.2013.07.002. Epub 2013 Aug 24. PMID: 23981442.

References

1. Benjamini Y, Hochberg Y. Controlling the false discovery rate: A practical and powerful approach to multiple testing. J R Stat Soc B. 1995;57:289–300.
2. Storey JD, Tibshirani R. Statistical significance for genomewide studies. Proc Natl Acad Sci U S A. 2003;100:9440–9445. doi: 10.1073/pnas.1530509100.
3. Jongeneel CV, Iseli C, Stevenson BJ, Riggins GJ, Lal A, Mackay A, Harris RA, O'Hare MJ, Neville AM, Simpson AJ, Strausberg RL. Comprehensive sampling of gene expression in human cell lines with massively parallel signature sequencing. Proc Natl Acad Sci U S A. 2003;100:4702–4705. doi: 10.1073/pnas.0831040100.
4. Lockhart DJ, Dong H, Byrne MC, Follettie MT, Gallo MV, Chee MS, Mittmann M, Wang C, Kobayashi M, Horton H, Brown EL. Expression monitoring by hybridization to high-density oligonucleotide arrays. Nat Biotechnol. 1996;14:1675–1680. doi: 10.1038/nbt1296-1675.
5. Liu WM, Mei R, Di X, Ryder TB, Hubbell E, Dee S, Webster TA, Harrington CA, Ho MH, Baid J, Smeekens SP. Analysis of high density expression microarrays with signed-rank call algorithms. Bioinformatics. 2002;18:1593–1599. doi: 10.1093/bioinformatics/18.12.1593.
