Interactively optimizing signal-to-noise ratios in expression profiling: project-specific algorithm selection and detection p-value weighting in Affymetrix microarrays
- PMID: 15117752
- DOI: 10.1093/bioinformatics/bth280
Abstract
Motivation: The most commonly used microarrays for mRNA profiling (Affymetrix) include 'probe sets' consisting of a series of perfect match and mismatch probes (typically 22 oligonucleotides per probe set). An increasing number of 'probe set algorithms' have been reported that differ in how they interpret a probe set to derive a single normalized 'signal' representing the expression of each mRNA. These algorithms are known to differ in accuracy and sensitivity, and their optimization has been done using a small set of standardized control microarray data. We hypothesized that different mRNA profiling projects have varying sources and degrees of confounding noise, and that these should alter the choice of a specific probe set algorithm. We also hypothesized that using the Microarray Suite (MAS) 5.0 probe set detection p-value as a weighting function would improve the performance of all probe set algorithms.
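The abstract does not spell out how the detection p-value enters the analysis as a weighting function. The following is a minimal sketch that rests on three assumptions:

- Each probe set in each sample gets a continuous weight w = 1 - p, where p is its MAS 5.0 detection p-value.
- Two samples are compared with a weighted Pearson correlation, using the product of the two samples' weights for each probe set.
- 1 - r is used as the distance for agglomerative clustering.

The weight function, the product rule and all function names are illustrative assumptions, not the HCE2W implementation.

```python
import numpy as np

# ASSUMPTION: continuous weight = 1 - MAS 5.0 detection p-value. This is an
# illustrative choice, not necessarily the weighting used in HCE2W.
def detection_weights(pvals):
    """Map detection p-values to weights in [0, 1]; small p (present) -> ~1."""
    return 1.0 - np.asarray(pvals, dtype=float)


def weighted_pearson(x, y, w):
    """Weighted Pearson correlation of two signal vectors with weights w."""
    x, y, w = (np.asarray(a, dtype=float) for a in (x, y, w))
    if w.sum() <= 0:
        raise ValueError("at least one probe set needs a positive weight")
    w = w / w.sum()                        # normalize weights to sum to 1
    mx, my = np.sum(w * x), np.sum(w * y)  # weighted means
    cov = np.sum(w * (x - mx) * (y - my))
    vx = np.sum(w * (x - mx) ** 2)
    vy = np.sum(w * (y - my) ** 2)
    return cov / np.sqrt(vx * vy)


def weighted_distance_matrix(signals, pvals):
    """Sample-by-sample distance 1 - r_w for use in agglomerative clustering.

    signals, pvals: arrays of shape (n_samples, n_probe_sets).
    ASSUMPTION: the weight of a probe set when comparing samples a and b is
    the product of its weights in the two samples.
    """
    signals = np.asarray(signals, dtype=float)
    w = detection_weights(pvals)
    n = signals.shape[0]
    d = np.zeros((n, n))
    for a in range(n):
        for b in range(a + 1, n):
            r = weighted_pearson(signals[a], signals[b], w[a] * w[b])
            d[a, b] = d[b, a] = 1.0 - r
    return d
```

Under this scheme, probe sets called absent (p near 1) in either sample contribute almost nothing to the distance. A noisy, undetected transcript therefore cannot dominate the clustering. With all p-values equal to 0, the distance reduces to the ordinary 1 - Pearson r.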
Results: We built an interactive visual analysis software tool (HCE2W) to test and define parameters in Affymetrix analyses that optimize the ratio of signal (the desired biological variable) to noise (confounding, uncontrolled variables). Five probe set algorithms were studied with and without statistical weighting of probe sets by the MAS 5.0 probe set detection p-values. The signal-to-noise ratio optimization method was tested on two large, novel microarray datasets with different levels of confounding noise: a 105-sample U133A human muscle biopsy dataset (11 groups: mutation-defined, extensive noise) and a 40-sample U74A inbred mouse lung dataset (8 groups: little noise). Performance was measured as the ability of each probe set algorithm, with and without detection p-value weighting, to cluster samples into the appropriate biological groups (unsupervised agglomerative clustering scored by F-measure). Of all random sampling analyses, 50% showed a highly statistically significant difference between probe set algorithms by ANOVA [F(4,10) > 14, p < 0.0001]. Weighting by MAS 5.0 detection p-value was significant in the mouse data by ANOVA [F(1,10) > 9, p < 0.013] and by paired t-test [t(9) = -3.675, p = 0.005]. Probe set detection p-value weighting had the greatest positive effect on the performance of the dChip difference model, ProbeProfiler and RMA algorithms. Importantly, probe set algorithms did indeed perform differently depending on the specific project, most probably because of the degree of confounding noise. Our data indicate that significantly improved analysis of mRNA profiling projects can be achieved by matching the choice of probe set algorithm to the noise levels intrinsic to a project. The dChip difference model with continuous MAS 5.0 detection p-value weighting showed the best overall performance in both projects. Furthermore, both existing and newly developed probe set algorithms should incorporate detection p-value weighting to improve performance.
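Performance in the study is scored by how well an unsupervised agglomerative clustering recovers the known biological groups, summarized as an F-measure. The abstract does not give the exact formula used in HCE2W. The sketch below uses the class-weighted F-measure commonly applied to clusterings:

- Precision is the number of shared samples divided by the cluster size.
- Recall is the number of shared samples divided by the group size.
- Each true group is matched to its best-scoring cluster.
- The per-group maxima are averaged, weighted by group size.

The function name and input format are hypothetical.

```python
from collections import Counter


def f_measure(true_labels, clusters):
    """Class-weighted F-measure of a clustering against known groups.

    true_labels: one group label per sample (index = sample number).
    clusters: iterable of collections of sample indices, e.g. the flat
    clusters from a dendrogram cut, or every node of the dendrogram.
    Returns a value in [0, 1]; 1.0 means every group is exactly recovered.
    """
    n = len(true_labels)
    class_sizes = Counter(true_labels)
    clusters = [set(c) for c in clusters]
    total = 0.0
    for cls, n_i in class_sizes.items():
        best = 0.0
        for members in clusters:
            n_ij = sum(1 for idx in members if true_labels[idx] == cls)
            if n_ij == 0:
                continue
            precision = n_ij / len(members)  # purity of the cluster
            recall = n_ij / n_i              # coverage of the group
            best = max(best, 2 * precision * recall / (precision + recall))
        total += (n_i / n) * best            # weight by group size
    return total
```

For example, `f_measure(['a', 'a', 'b', 'b'], [{0, 1}, {2, 3}])` returns 1.0. Placing all four samples in a single cluster scores 2/3, because each group is fully recalled but only half-pure. Passing every dendrogram node, rather than a single cut, rewards a tree that contains each group as some subtree. This reading matches scoring a hierarchical clustering without fixing the number of clusters.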
Availability: The Hierarchical Clustering Explorer 2.0 is available at http://www.cs.umd.edu/hcil/hce/. Murine arrays (40 samples) are publicly available at the PEPR resource (http://microarray.cnmcresearch.org/pgadatatable.asp; http://pepr.cnmcresearch.org; Chen et al., 2004).
Similar articles
- An interactive power analysis tool for microarray hypothesis testing and generation. Bioinformatics. 2006 Apr 1;22(7):808-14. doi: 10.1093/bioinformatics/btk052. Epub 2006 Jan 17. PMID: 16418236
- Statistical analysis of high-density oligonucleotide arrays: a multiplicative noise model. Bioinformatics. 2002 Dec;18(12):1633-40. doi: 10.1093/bioinformatics/18.12.1633. PMID: 12490448
- Probe set algorithms: is there a rational best bet? BMC Bioinformatics. 2006 Aug 30;7:395. doi: 10.1186/1471-2105-7-395. PMID: 16942624
- Algorithms for high-density oligonucleotide array. Curr Opin Drug Discov Devel. 2003 May;6(3):339-45. PMID: 12833666. Review.
- Detection call algorithms for high-throughput gene expression microarray data. Brief Bioinform. 2010 Mar;11(2):244-52. doi: 10.1093/bib/bbp055. Epub 2009 Nov 25. PMID: 19939941. Review.
Cited by
- An improved distance measure between the expression profiles linking co-expression and co-regulation in mouse. BMC Bioinformatics. 2006 Jan 26;7:44. doi: 10.1186/1471-2105-7-44. PMID: 16438730
- Investigation of gene expression in C2C12 myotubes following simvastatin application and mechanical strain. J Atheroscler Thromb. 2009 Mar;16(1):21-9. doi: 10.5551/jat.e551. Epub 2009 Mar 5. PMID: 19262002
- Delineation of a gene network underlying the pulmonary response to oxidative stress in asthma. J Investig Med. 2009 Oct;57(7):756-64. doi: 10.2310/JIM.0b013e3181b91a83. PMID: 19730131
- Endothelial cell activation and neovascularization are prominent in dermatomyositis. J Autoimmune Dis. 2006 Feb 20;3:2. doi: 10.1186/1740-2557-3-2. PMID: 16504012
- Sexual dimorphism in immune response genes as a function of puberty. BMC Immunol. 2006 Feb 22;7:2. doi: 10.1186/1471-2172-7-2. PMID: 16504066