BMC Bioinformatics. 2006 Aug 25:7:391.
doi: 10.1186/1471-2105-7-391.

Probe-level linear model fitting and mixture modeling results in high accuracy detection of differential gene expression

Sébastien Lemieux. BMC Bioinformatics. 2006.

Abstract

Background: The identification of differentially expressed genes (DEGs) from Affymetrix GeneChip arrays is currently done in two steps: expression levels are first computed from the low-level probe intensities, then significance is derived by comparing these expression levels between conditions. The proposed PL-LM (Probe-Level Linear Model) method instead fits a linear model to the probe-level data to estimate the treatment effect directly. A finite mixture of Gaussian components is then used to identify DEGs from the coefficients estimated by the linear model. This approach can readily be applied to experimental designs with or without replication.

Results: On a wholly defined dataset, the PL-LM method identified 75% of the differentially expressed genes with 10% false positives. This accuracy was achieved both with the three replicates per condition available in the dataset and with only one replicate per condition.

Conclusion: The method achieves, on this dataset, a higher accuracy than the best set of tools identified by the authors of the dataset, and does so using only one replicate per condition.
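The linear-model step described in the Background can be illustrated with a minimal sketch. The abstract does not give the exact model, so the additive form below (log2 intensity = probe effect + T × treatment indicator + noise) is an assumption, as are the function name `treatment_effect` and the example intensities. For a balanced probe-set, where every probe is measured on every array, the least-squares estimate of T under this assumed model reduces to a difference of mean log2 intensities, which is what the code computes.

```python
from math import log2
from statistics import mean


def treatment_effect(control_arrays, treated_arrays):
    """Estimate T in the assumed model
        log2(intensity[probe][array]) = probe_effect[probe] + T * treated[array] + noise.

    Each argument is a list of arrays; each array is a list of probe
    intensities for one probe-set, with the same probes on every array.
    In this balanced layout the probe effects cancel, and the least-squares
    T is the mean log2 intensity of the treated arrays minus that of the
    control arrays.
    """
    def array_mean(array):
        return mean(log2(value) for value in array)

    control = mean(array_mean(a) for a in control_arrays)
    treated = mean(array_mean(a) for a in treated_arrays)
    return treated - control


# Hypothetical probe-set: three probes, one replicate per condition.
control = [[100.0, 200.0, 400.0]]
treated = [[400.0, 800.0, 1600.0]]  # every probe is 4-fold brighter
t_hat = treatment_effect(control, treated)  # about 2.0 on the log2 scale
```

Because the estimate pools all probes of a probe-set, it is defined even with a single array per condition, which is consistent with the abstract's claim that the approach applies with or without replication.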


Figures

Figure 1
Examples of probe-level data. a) For a differentially expressed gene (cRNA spiked at 4-fold from the Choe et al. dataset, probe-set: 147419_at), b) a non-differentially expressed gene (cRNA at equal concentrations, probe-set: 149358_at), and c) a non-expressed gene (not in the pool of amplified cRNAs, probe-set: 142266_at). Quantile-normalized data from the control (circle) and spiked (+) samples are shown, including replicate data. Probes are ordered by the average intensity on the control replicates. For each probe-set, the value of T obtained from the linear model is shown.
Figure 2
Mixture modeling of the PL-LM method on the Choe et al. dataset. Each of the three mixture components is drawn as an ellipse indicating its center and variances. Data points whose conditional probability of belonging to the component modeling DEGs exceeds 0.5 are shown as larger gray dots. The mixing proportions of the three components are 17%, 74% and 9% for the non-spiked probe-sets, not amplified cRNAs and DEGs, respectively.
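The 0.5 threshold on the conditional probability can be made concrete with a small sketch. It is a one-dimensional simplification (the figure shows two-dimensional components), and only the mixing proportions come from the caption; the component means and standard deviations, and the function names, are hypothetical values chosen for illustration.

```python
from math import exp, pi, sqrt


def gaussian_pdf(x, mean, sd):
    """Density of a univariate Gaussian at x."""
    return exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * sqrt(2.0 * pi))


def posteriors(x, weights, means, sds):
    """Conditional probability of each mixture component given x (Bayes' rule)."""
    joint = [w * gaussian_pdf(x, m, s) for w, m, s in zip(weights, means, sds)]
    total = sum(joint)
    return [j / total for j in joint]


# Mixing proportions from the caption: non-spiked, not amplified, DEGs.
weights = [0.17, 0.74, 0.09]
# Hypothetical component parameters, NOT taken from the paper.
means = [0.0, 0.0, 1.5]
sds = [0.2, 0.5, 0.6]

DEG = 2  # index of the component modeling DEGs


def is_called_deg(x):
    """Flag a probe-set when its DEG posterior exceeds 0.5."""
    return posteriors(x, weights, means, sds)[DEG] > 0.5
```

With these illustrative parameters, a coefficient near 0 is dominated by the first two components and is not flagged, while a coefficient of 3.0 falls almost entirely under the DEG component and is flagged.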
Figure 3
ROC curves comparing methods on the Choe et al. dataset. Probe-sets with a spiked to control concentration ratio of at least 1.2 were considered DEGs when computing these curves, giving 1,326 spiked probe-sets to identify among a total of 14,010. The fold-change computed after RMA summaries and quantile normalization is shown as a baseline for comparison, since it is a simple and frequently used method. Nine shaded curves, representing the results of applying the PL-LM method to all combinations of two arrays (one control vs. one spike), fall directly under the PL-LM (on all replicates) curve (black line). For all methods, the area under the curve (AUC) is reported as a quantitative measure of both sensitivity and specificity.
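As a rough illustration of how an AUC like the one reported here can be computed, the sketch below labels probe-sets as DEGs when their concentration ratio is at least 1.2 (as in the caption) and evaluates the AUC through its rank interpretation: the probability that a randomly chosen DEG scores higher than a randomly chosen non-DEG, with ties counted as one half. The function name `roc_auc` and the toy ratios and scores are hypothetical; this is not the paper's evaluation code.

```python
def roc_auc(scores, is_deg):
    """Area under the ROC curve via pairwise comparison of scores.

    Equivalent to integrating the ROC curve; quadratic in the number of
    probe-sets, which is acceptable for a small illustration.
    """
    positives = [s for s, d in zip(scores, is_deg) if d]
    negatives = [s for s, d in zip(scores, is_deg) if not d]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example: spiked-to-control ratios and a method's scores per probe-set.
ratios = [4.0, 1.0, 1.2, 1.0]
scores = [0.9, 0.8, 0.3, 0.1]
is_deg = [ratio >= 1.2 for ratio in ratios]  # threshold from the caption
auc = roc_auc(scores, is_deg)
```

In this toy example three of the four DEG versus non-DEG pairs are ranked correctly, so the AUC is 0.75; a perfect ranking gives 1.0 and a random one about 0.5.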
Figure 4
Distribution of the Cyber-T statistic as a function of average intensity I. Spiked probe-sets are shown as large dark gray dots, not amplified cRNAs as large light gray dots, and probe-sets that were not spiked as small black dots.
