Comparative Study
Genome Biol. 2007;8(4):R54.
doi: 10.1186/gb-2007-8-4-r54.

Statistical tools for synthesizing lists of differentially expressed features in related experiments

Marta Blangiardo et al. Genome Biol. 2007.

Abstract

We propose a novel approach for finding a list of features that are commonly perturbed in two or more experiments, quantifying the evidence of dependence between the experiments by a ratio. We present a Bayesian analysis of this ratio, which leads us to suggest two rules for choosing a cut-off on the ranked list of p values. We evaluate and compare the performance of these statistical tools in a simulation study, and show their usefulness on two real datasets.

Figures

Figure 1
Typical plots of T(q) and R(q) for associated experiments (case A1). The two associated experiments were simulated under scenario I, structure A, with true differences drawn from a Ga(2.5, 0.4) and experiment-specific noise of 0.5 and 0.8, respectively (signal-to-noise ratio = 9.6). The left plot shows the distribution of T(q) and the right one shows the distribution of R(q) with 95% Bayesian credibility intervals. T(q) deviates from 1 for p values between 0.01 and 0.5. T(qmax) is 2.6 and corresponds to a threshold q = 0.01. R(q) shows the same trend, but the estimates are slightly smaller because the model takes into account the variability of the margins of the 2 × 2 table. The threshold associated with R(q) = 2 is 0.08. The number of genes in common for each ratio is reported on the right axis of each plot.
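As a rough illustration of the kind of ratio plotted on the left of Figure 1, the sketch below computes an observed-to-expected ratio for the number of features called in both experiments at a cut-off q. It assumes that T(q) is the observed number of features in common divided by the number expected under independence, computed from the margins of the 2 × 2 table. That reading is consistent with the reference value of 1 mentioned in the caption, but the exact definition is not given here. The function name, the `<=` convention for calling a feature, and the toy p values are all hypothetical. The Bayesian ratio R(q) and its credibility intervals are not attempted.

```python
import math


def observed_to_expected_ratio(p_exp1, p_exp2, q):
    """Ratio of observed to expected features called in both experiments.

    Assumption (not stated in the captions): a feature is "called" in an
    experiment when its p value is <= q, and the expected count under
    independence is (row margin * column margin) / n.
    """
    if len(p_exp1) != len(p_exp2):
        raise ValueError("both experiments must cover the same features")
    n = len(p_exp1)
    called_1 = sum(p <= q for p in p_exp1)            # margin for experiment 1
    called_2 = sum(p <= q for p in p_exp2)            # margin for experiment 2
    in_common = sum(a <= q and b <= q for a, b in zip(p_exp1, p_exp2))
    if called_1 == 0 or called_2 == 0:
        return math.nan                               # ratio undefined for empty lists
    expected = called_1 * called_2 / n
    return in_common / expected


# Toy p values for four features (made up for illustration only).
associated = observed_to_expected_ratio(
    [0.001, 0.002, 0.5, 0.9], [0.003, 0.004, 0.6, 0.7], q=0.05
)
unrelated = observed_to_expected_ratio(
    [0.001, 0.02, 0.5, 0.9], [0.002, 0.6, 0.03, 0.8], q=0.05
)
print(associated, unrelated)
```

Scanning q over a grid and plotting the ratio against q would give a curve of the general shape described in the caption; with very small q the margins are tiny, which is one plausible source of the instability in the left tail.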
Figure 2
Misclassification error, false discovery and false non-discovery rates for case A2 (results are averaged over 50 replicates). The upper plot shows the false discovery rate (FDR) and the false non-discovery rate (FNR) for case A2. The FDR is calculated as the ratio of the false positives to the number of genes called in common, while the FNR is calculated as the ratio of the false negatives to the number of genes not called in common. The true differences dg are drawn from a Ga(2, 0.5) and the experiment-specific noise component is 2 for the first experiment and 3 for the second. R(qmax) shows the minimum FDR. On the other hand, R(qmin) has a very large FDR and the improvement in the FNR is slight. As a compromise, the threshold q2 is close to qmax, so it guarantees a low FDR but returns a larger list. It corresponds approximately to the intersection point of the FDR and FNR curves. The lower plot shows the global error as the sum of false positives (FP) and false negatives (FN). The threshold associated with R(q2) is very close to the minimum of the curve, that is, to the smallest global misclassification error.
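The error measures in this caption can be sketched as follows, assuming a simulation in which the truly common features are known. The function name and the toy labels are hypothetical, and returning 0.0 when a denominator is empty is a convention chosen for this sketch rather than something stated in the caption.

```python
def error_rates(called_in_common, truly_in_common):
    """Return (FDR, FNR, global error) for one cut-off.

    FDR = false positives / number of features called in common
    FNR = false negatives / number of features not called in common
    global error = false positives + false negatives
    """
    if len(called_in_common) != len(truly_in_common):
        raise ValueError("label lists must have the same length")
    pairs = list(zip(called_in_common, truly_in_common))
    false_pos = sum(called and not truth for called, truth in pairs)
    false_neg = sum(truth and not called for called, truth in pairs)
    n_called = sum(called_in_common)
    n_not_called = len(called_in_common) - n_called
    # Assumed convention: a rate with an empty denominator is reported as 0.0.
    fdr = false_pos / n_called if n_called else 0.0
    fnr = false_neg / n_not_called if n_not_called else 0.0
    return fdr, fnr, false_pos + false_neg


# Five toy features: two are called in common, two are truly in common.
fdr, fnr, global_error = error_rates(
    [True, True, False, False, False],
    [True, False, True, False, False],
)
print(fdr, fnr, global_error)
```

Evaluating these three quantities over a grid of cut-offs and averaging across replicates would give curves of the kind the caption describes.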
Figure 3
Typical plots of T(q) and R(q) in the case of independent experiments. The two independent experiments are simulated under scenario I, structure A, with true differences drawn from a Ga(1, 1) and experiment-specific noise of 2 and 2.5, respectively (signal-to-noise ratio = 0.4). The left plot shows the distribution of T(q) and the right one shows the distribution of R(q) with 95% Bayesian credibility intervals. T(q) follows a horizontal line of height 1 (independence between the lists) and is unstable for small p values (left tail). The Bayesian model shows no threshold at which R(q) deviates significantly from 1, and the 95% credibility interval always includes 1.
Figure 4
Log fold change (natural log) for the VILI experiment (left) and high-fat diet experiment (right). The left plot shows the log fold changes for mice versus rats, averaged over the two replicates for each species. The right plot shows the log fold changes for fat versus muscle, averaged over the three and four replicates for each species. The circles correspond to the genes highlighted both by our analysis and by the method of Hwang et al.; they are characterized by a large log fold change for both species. The correlation of the two fold changes for this group is 0.4 (VILI experiment) and 0.8 (high-fat diet experiment). The crosses correspond to the genes highlighted only by Hwang et al.'s analysis; they are characterized by a large log fold change for one species and a small fold change for the other. The correlation of the two fold changes for this group is 0.06 (VILI experiment) and 0.36 (high-fat diet experiment).
Figure 5
Results from the high-fat diet experiment. The left plot shows the distribution of T(q) and the center one shows the distribution of R(q) with 95% Bayesian credibility intervals. qmax for the conditional model is 0.01 and returns 20 genes in the common list, whilst for the joint model it is 0.02 and returns 49 common genes. On the other hand, q2 = 0.07 and the number of genes in common is 226. The right plot is a blow-up of the Bayesian model results, to better visualize the trend for p values between 0 and 0.2. The number of genes in common for each ratio is reported on the right axis of each plot.

References

    1. Rhodes DR, Barrette TR, Rubin MA, Ghosh D, Chinnaiyan AM. Meta-analysis of microarrays: interstudy validation of gene expression profiles reveals pathway dysregulation in prostate cancer. Cancer Res. 2002;62:4427–4433.
    2. Hwang D, Rust AG, Ramsey S, Smith JJ, Leslie DM, Weston AD, deAtauri P, Aitchison JD, Hood L, Siegel AF, Bolouri H. A data integration methodology for systems biology. Proc Natl Acad Sci USA. 2005;102:17296–17301. doi: 10.1073/pnas.0508647102.
    3. Allison DB, Cui X, Page GP, Sabripour M. Microarray data analysis: from disarray to consolidation and consensus. Nat Rev Genet. 2006;7:55–65. doi: 10.1038/nrg1749.
    4. Stone RA. Investigations of excess environmental risks around putative sources: statistical problems and a proposed test. Stat Med. 1988;7:649–660. doi: 10.1002/sim.4780070604.
    5. Kulldorff M, Feuer EJ, Miller BA, Freedman LS. Breast cancer in northeastern United States: a geographical analysis. Am J Epidemiol. 1997;146:161–170.
