BMC Bioinformatics. 2006 Aug 25;7:391.

doi: 10.1186/1471-2105-7-391.

Probe-level linear model fitting and mixture modeling results in high accuracy detection of differential gene expression

Sébastien Lemieux¹

PMID: 16934150
PMCID: PMC1579233
DOI: 10.1186/1471-2105-7-391

Sébastien Lemieux. BMC Bioinformatics. 2006.

Affiliation

¹ Institute for Research in Immunology and Cancer, Université de Montréal, Montréal, Canada. s.lemieux@umontreal.ca

Abstract

Background: Differentially expressed genes (DEGs) are currently identified from Affymetrix GeneChip arrays in two steps: expression levels are first computed from the low-level probe intensities, then significance is derived by comparing these expression levels between conditions. The proposed PL-LM (Probe-Level Linear Model) method instead fits a linear model to the probe-level data to estimate the treatment effect directly. A finite mixture of Gaussian components is then used to identify DEGs from the coefficients estimated by the linear model. This approach can readily be applied to experimental designs with or without replication.

Results: On a wholly defined dataset, the PL-LM method was able to identify 75% of the differentially expressed genes within 10% of false positives. This accuracy was achieved both when using the three replicates per condition available in the dataset and when using only one replicate per condition.

Conclusion: The method achieves, on this dataset, a higher accuracy than the best set of tools identified by the authors of the dataset, and does so using only one replicate per condition.
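The abstract does not give the exact form of the linear model, so the following is only a minimal sketch of the general idea, under the assumption that each log-scale probe intensity is modeled as a per-probe effect plus a treatment effect T shared by all probes of a probe-set. The function name `fit_treatment_effect` and the input layout are hypothetical, and the least-squares estimator shown here is a generic one, not necessarily the one used by PL-LM.

```python
def fit_treatment_effect(log_intensities, is_treated):
    """Least-squares estimate of T in the ASSUMED model
        y[p][a] = probe_effect[p] + T * is_treated[a] + noise.

    log_intensities: one row per probe, one column per array (log scale).
    is_treated: 0/1 flag per array (0 = control, 1 = treated).
    """
    n_arrays = len(is_treated)
    x_mean = sum(is_treated) / n_arrays
    x_centered = [x - x_mean for x in is_treated]
    sxx = sum(xc * xc for xc in x_centered)
    if sxx == 0:
        raise ValueError("need at least one control and one treated array")

    sxy = 0.0
    for row in log_intensities:
        if len(row) != n_arrays:
            raise ValueError("each probe needs one value per array")
        # Subtracting the row mean absorbs the per-probe effect.
        row_mean = sum(row) / n_arrays
        sxy += sum(xc * (y - row_mean) for xc, y in zip(x_centered, row))

    # Every probe shares the same design, so the pooled slope is
    # the summed cross-product over the summed squared design.
    return sxy / (len(log_intensities) * sxx)


# Three probes with different affinities, a 4-fold (log2 = 2) spike.
replicated = [[a, a, a, a + 2, a + 2, a + 2] for a in (8.0, 10.0, 12.0)]
single = [[a, a + 2] for a in (8.0, 10.0, 12.0)]
t_replicated = fit_treatment_effect(replicated, [0, 0, 0, 1, 1, 1])
t_single = fit_treatment_effect(single, [0, 1])
```

Because the probe effect is estimated separately for each probe, the same estimator applies with three arrays per condition or with a single array per condition, which mirrors the claim that the approach works with or without replication.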

Figures

**Figure 1**
Examples of probe-level data. a) For a differentially expressed gene (cRNA spiked at 4-fold from the Choe *et al*. dataset, probe-set: 147419_at), b) a non-differentially expressed gene (cRNA at equal concentrations, probe-set: 149358_at), and c) a non-expressed gene (not in the pool of amplified cRNAs, probe-set: 142266_at). Quantile-normalized data from the control (circle) and spiked (+) samples are shown, including replicate data. Probes are ordered by the average intensity on the control replicates. For each probe-set, the value of T obtained from the linear model is shown.

**Figure 2**
Mixture modeling of the PL-LM method on the Choe *et al*. dataset. The three mixture components are represented as ellipses indicating their centers and variances. Data points with a conditional probability above 0.5 of belonging to the component modeling DEGs are shown as larger gray dots. The mixing proportions of the three components are 17%, 74% and 9% for the non-spiked probe-sets, not amplified cRNAs and DEGs, respectively.
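The conditional probability used in this caption follows from Bayes' rule applied to the fitted mixture. The sketch below illustrates that calculation in one dimension only, whereas the ellipses in the figure indicate a two-dimensional model. The mixing proportions are those quoted in the caption; the means and standard deviations are made-up placeholders, not values from the paper.

```python
import math


def gaussian_pdf(x, mean, sd):
    """Density of a univariate normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def posterior_probabilities(x, components):
    """Conditional probability of each component given x (Bayes' rule).

    components: list of (mixing_proportion, mean, sd) tuples.
    """
    weighted = [w * gaussian_pdf(x, m, s) for w, m, s in components]
    total = sum(weighted)
    return [v / total for v in weighted]


# Mixing proportions from the caption; means and sds are HYPOTHETICAL.
components = [
    (0.17, 0.0, 0.3),  # non-spiked probe-sets
    (0.74, 0.0, 1.0),  # not amplified cRNAs
    (0.09, 2.0, 0.8),  # DEGs
]
DEG = 2  # index of the component modeling DEGs

post_high = posterior_probabilities(3.0, components)
post_zero = posterior_probabilities(0.0, components)
called_deg_high = post_high[DEG] > 0.5
called_deg_zero = post_zero[DEG] > 0.5
```

With these placeholder parameters, a coefficient of 3.0 is assigned to the DEG component while a coefficient of 0.0 is not; the actual decision boundary depends entirely on the fitted parameters.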

**Figure 3**
ROC curves comparing methods on the Choe *et al*. dataset. Probe-sets with a spiked-to-control concentration ratio of 1.2 or higher were considered DEGs when computing these curves, resulting in 1,326 spiked probe-sets to identify among a total of 14,010. The fold-change computed after RMA summaries and quantile normalization is shown as a baseline for comparison, since it is a simple and frequently used method. Nine shaded curves, representing the results of applying the PL-LM method to all combinations of two arrays (one control vs. one spiked), fall directly under the PL-LM (on all replicates) curve (black line). For all methods, the area under the curve (AUC) is reported as a quantitative measure of both sensitivity and specificity.
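For readers unfamiliar with the AUC, the sketch below shows one standard way to compute it: as the fraction of (DEG, non-DEG) pairs in which the DEG receives the higher score, with ties counted as one half. This is a generic illustration, not the code used to produce the figure; the function name `roc_auc` and the toy scores are hypothetical.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via the pairwise (Mann-Whitney) identity.

    scores: one number per probe-set, higher meaning "more likely a DEG".
    labels: truthy for true DEGs, falsy otherwise.
    """
    positives = [s for s, is_deg in zip(scores, labels) if is_deg]
    negatives = [s for s, is_deg in zip(scores, labels) if not is_deg]
    if not positives or not negatives:
        raise ValueError("need at least one DEG and one non-DEG")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Toy example: one of the four (DEG, non-DEG) pairs is ranked wrongly.
auc_toy = roc_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
auc_perfect = roc_auc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0])
```

An AUC of 1.0 means every DEG outranks every non-DEG, and 0.5 corresponds to random ranking. The pairwise loop is quadratic but remains practical at the scale quoted in the caption (1,326 DEGs among 14,010 probe-sets).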

**Figure 4**
Distribution of the Cyber-T statistic as a function of average intensity I. Spiked probe-sets are shown as large, dark gray dots; not amplified cRNAs as large, light gray dots; and probe-sets that were not spiked as small black dots.

Cited by

hSETD1A regulates Wnt target genes and controls tumor growth of colorectal cancer cells.
Salz T, Li G, Kaye F, Zhou L, Qiu Y, Huang S. Cancer Res. 2014 Feb 1;74(3):775-86. doi: 10.1158/0008-5472.CAN-13-1400. Epub 2013 Nov 18. PMID: 24247718.
A comprehensive re-analysis of the Golden Spike data: towards a benchmark for differential expression methods.
Pearson RD. BMC Bioinformatics. 2008 Mar 26;9:164. doi: 10.1186/1471-2105-9-164. PMID: 18366762.
A comparison of probe-level and probeset models for small-sample gene expression data.
Stevens JR, Bell JL, Aston KI, White KL. BMC Bioinformatics. 2010 May 26;11:281. doi: 10.1186/1471-2105-11-281. PMID: 20504334.
t-Test at the Probe Level: An Alternative Method to Identify Statistically Significant Genes for Microarray Data.
Boareto M, Caticha N. Microarrays (Basel). 2014 Dec 16;3(4):340-51. doi: 10.3390/microarrays3040340. PMID: 27600352.
Transcriptional Perturbations in Graft Rejection.
Vitalone MJ, Sigdel TK, Salomonis N, Sarwal RD, Hsieh SC, Sarwal MM. Transplantation. 2015 Sep;99(9):1882-93. doi: 10.1097/TP.0000000000000809. PMID: 26154388.
