Comparative Study

Genome Biol. 2007;8(4):R54.

doi: 10.1186/gb-2007-8-4-r54.

Statistical tools for synthesizing lists of differentially expressed features in related experiments

Marta Blangiardo¹, Sylvia Richardson

PMID: 17428330
PMCID: PMC1896017
DOI: 10.1186/gb-2007-8-4-r54

Marta Blangiardo et al. Genome Biol. 2007.

Affiliation

¹ Centre for Biostatistics, Imperial College, St Mary's Campus, Norfolk Place, London, UK. m.blangiardo@imperial.ac.uk

Abstract

We propose a novel approach for finding a list of features that are commonly perturbed in two or more experiments, quantifying the evidence of dependence between the experiments by a ratio. We present a Bayesian analysis of this ratio, which leads us to suggest two rules for choosing a cut-off on the ranked list of p values. We evaluate and compare the performance of these statistical tools in a simulation study, and show their usefulness on two real datasets.
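The abstract does not spell out the ratio, so the following is only a minimal sketch under a stated assumption: at a p value cut-off q, the ratio is taken to be the number of features called in both experiments divided by the number expected if the two lists were independent (the product of the two list sizes over the total number of features). This matches the figure captions' reading that a value of 1 indicates independence. The function names `ratio_t`, `ratio_curve` and `q_at_max`, and the toy p values, are hypothetical.

```python
def ratio_t(p1, p2, q):
    """Observed-to-expected ratio of features called in both lists at cut-off q.

    Assumption (not stated in the abstract): expected count under
    independence is (size of list 1) * (size of list 2) / n.
    Returns None when either list is empty, since the ratio is undefined.
    """
    n = len(p1)
    called1 = [p <= q for p in p1]
    called2 = [p <= q for p in p2]
    o1 = sum(called1)                 # features called in experiment 1
    o2 = sum(called2)                 # features called in experiment 2
    o11 = sum(a and b for a, b in zip(called1, called2))  # called in both
    if o1 == 0 or o2 == 0:
        return None
    expected = o1 * o2 / n
    return o11 / expected


def ratio_curve(p1, p2, cutoffs):
    """Evaluate the ratio on a grid of cut-offs."""
    return [(q, ratio_t(p1, p2, q)) for q in cutoffs]


def q_at_max(p1, p2, cutoffs):
    """Cut-off (and ratio) at which the ratio is largest on the grid."""
    defined = [(q, t) for q, t in ratio_curve(p1, p2, cutoffs) if t is not None]
    return max(defined, key=lambda qt: qt[1]) if defined else None


# Toy p values for eight features measured in two experiments (made up).
p_exp1 = [0.001, 0.004, 0.02, 0.30, 0.50, 0.70, 0.80, 0.90]
p_exp2 = [0.002, 0.008, 0.40, 0.03, 0.60, 0.65, 0.85, 0.95]

if __name__ == "__main__":
    for q, t in ratio_curve(p_exp1, p_exp2, [0.0005, 0.01, 0.05, 1.0]):
        print(q, t)
```

With a cut-off of 1 every feature is called in both lists, so the ratio is exactly 1; very small cut-offs leave the lists empty or nearly so, which is one way to see why the ratio can be unstable in the left tail.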

Figures

**Figure 1**
Typical plots of T(q) and R(q) for associated experiments (case A1). The two associated experiments were simulated under scenario I, structure A, with true differences drawn from a Ga(2.5, 0.4) and experiment-specific noise of 0.5 and 0.8, respectively (signal-to-noise ratio = 9.6). The left plot shows the distribution of T(q) and the right one shows the distribution of R(q) with 95% Bayesian credibility intervals. T(q) deviates from 1 for p values between 0.01 and 0.5. T(q_max) is 2.6 and corresponds to a threshold q = 0.01. R(q) shows the same trend, but the estimates are slightly smaller because the model takes into account the variability of the margins of the 2 × 2 table. The threshold associated with R(q) = 2 is 0.08. The number of genes in common for each ratio is reported on the right axis of each plot.

**Figure 2**
Misclassification error, false discovery and false non-discovery rates for case A2 (results are averaged over 50 replicates). The upper plot shows the false discovery rate (FDR) and the false non-discovery rate (FNR) for case A2. The FDR is calculated as the ratio of the false positives to the number of genes called in common, while the FNR is calculated as the ratio of the false negatives to the number of genes not called in common. The true differences d_g are drawn from a Ga(2, 0.5) and the experiment-specific noise component is 2 for the first experiment and 3 for the second. R(q_max) shows the minimum FDR. On the other hand, R(q_min) has a very large FDR and the improvement in the FNR is slight. As a compromise, the threshold q₂ is close to q_max, so it guarantees a low FDR but returns a larger list. It corresponds approximately to the intersection point of the FDR and FNR curves. The lower plot shows the global error as the sum of false positives (FP) and false negatives (FN). The threshold associated with R(q₂) is very close to the minimum of the curve, that is, to the smallest global misclassification error.
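The error measures defined in the Figure 2 caption can be sketched as follows. This is an illustrative implementation of those definitions only (FDR as false positives over genes called in common, FNR as false negatives over genes not called in common, global error as FP + FN); the function name `error_rates`, the zero-denominator convention and the toy gene identifiers are assumptions made for the example.

```python
def error_rates(all_genes, called, truly_common):
    """Return (FDR, FNR, global error) for one list of genes called in common.

    all_genes:     set of every gene measured
    called:        set of genes called in common at some threshold
    truly_common:  set of genes truly perturbed in both experiments
    """
    not_called = all_genes - called
    fp = len(called - truly_common)        # called but not truly common
    fn = len(truly_common - called)        # truly common but not called
    # Convention assumed here: a rate with an empty denominator is set to 0.
    fdr = fp / len(called) if called else 0.0
    fnr = fn / len(not_called) if not_called else 0.0
    return fdr, fnr, fp + fn


# Toy example with ten genes labelled 0..9 (made-up data).
genes = set(range(10))
truth = {0, 1, 2, 3}
called_in_common = {0, 1, 4}

if __name__ == "__main__":
    print(error_rates(genes, called_in_common, truth))
```

In a simulation, evaluating this function along a grid of thresholds gives the FDR, FNR and global-error curves that the caption describes.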

**Figure 3**
Typical plots of T(q) and R(q) in the case of independent experiments. The two independent experiments are simulated under scenario I, structure A, with true differences drawn from a Ga(1, 1) and experiment-specific noise of 2 and 2.5, respectively (signal-to-noise ratio = 0.4). The left plot shows the distribution of T(q) and the right one shows the distribution of R(q) with 95% Bayesian credibility intervals. T(q) follows a horizontal line of height 1 (independence between the lists) and is unstable for small p values (left tail). The Bayesian model does not yield any threshold at which R(q) deviates significantly from 1, and the CI₉₅ always includes 1.
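The captions do not give the Bayesian model behind R(q), so the sketch below uses one simple stand-in that is assumed here rather than taken from the text: the four cells of the 2 × 2 table at a fixed cut-off are treated as multinomial counts with a symmetric Dirichlet prior, and the ratio of the joint probability to the product of the marginal probabilities is summarised by its posterior median and a 95% interval. The function name `posterior_ratio`, the prior weight of 0.05 and the example counts are all hypothetical.

```python
import random


def posterior_ratio(o11, o10, o01, o00, draws=2000, prior=0.05, seed=0):
    """Posterior median and 95% interval of theta11 / (theta1+ * theta+1).

    Assumed model (not from the source text): multinomial cell counts with a
    symmetric Dirichlet(prior) prior, sampled via normalised gamma variates.
    o11: called in both lists; o10, o01: called in one list only;
    o00: called in neither.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(draws):
        # A Dirichlet draw is a vector of gamma draws divided by their sum.
        g = [rng.gammavariate(c + prior, 1.0) for c in (o11, o10, o01, o00)]
        total = sum(g)
        t11, t10, t01, _t00 = (x / total for x in g)
        samples.append(t11 / ((t11 + t10) * (t11 + t01)))
    samples.sort()
    lower = samples[int(0.025 * draws)]
    median = samples[draws // 2]
    upper = samples[int(0.975 * draws) - 1]
    return median, lower, upper


if __name__ == "__main__":
    # Table consistent with independence: 100 = 1000 * 1000 / 10000.
    print(posterior_ratio(100, 900, 900, 8100))
    # Table with three times as many common genes as expected.
    print(posterior_ratio(300, 700, 700, 8300))
```

Under this stand-in model, an interval that contains 1 is read as no evidence of association at that cut-off, which is the situation the Figure 3 caption describes.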

**Figure 4**
Log fold change (natural log) for the VILI experiment (left) and high-fat diet experiment (right). The left plot shows the log fold changes for mouse versus rat, averaged over the two replicates for each species. The right plot shows the log fold changes for fat versus muscle, averaged over the three and four replicates for each species. The circles correspond to the genes highlighted both by our analysis and by the method of Hwang *et al*.; they are characterized by a large log fold change for both species. The correlation of the two fold changes for this group is 0.4 (VILI experiment) and 0.8 (high-fat diet experiment). The crosses correspond to the genes highlighted only by Hwang *et al*.'s analysis; they are characterized by a large log fold change for one species and a small fold change for the other. The correlation of the two fold changes for this group is 0.06 (VILI experiment) and 0.36 (high-fat diet experiment).

**Figure 5**
Results from the high-fat diet experiment. The left plot shows the distribution of T(q) and the center one shows the distribution of R(q) with 95% Bayesian credibility intervals. q_max for the conditional model is 0.01 and returns 20 genes in the common list, whilst for the joint model it is 0.02 and returns 49 common genes. On the other hand, q₂ = 0.07 and the number of genes in common is 226. The right plot is a blow-up of the Bayesian model results, to better visualize the trend for p values between 0 and 0.2. The number of genes in common for each ratio is reported on the right axis of each plot.

References

1. Rhodes DR, Barrette TR, Rubin MA, Ghosh D, Chinnaiyan AM. Meta-analysis of microarrays: interstudy validation of gene expression profiles reveals pathway dysregulation in prostate cancer. Cancer Res. 2002;62:4427–4433.
2. Hwang D, Rust AG, Ramsey S, Smith JJ, Leslie DM, Weston AD, deAtauri P, Aitchison JD, Hood L, Siegel AF, Bolouri H. A data integration methodology for systems biology. Proc Natl Acad Sci USA. 2005;102:17296–17301. doi: 10.1073/pnas.0508647102.
3. Allison DB, Cui X, Page GP, Sabripour M. Microarray data analysis: from disarray to consolidation and consensus. Nat Rev Genet. 2006;7:55–65. doi: 10.1038/nrg1749.
4. Stone RA. Investigations of excess environmental risks around putative sources: statistical problems and a proposed test. Stat Med. 1988;7:649–660. doi: 10.1002/sim.4780070604.
5. Kulldorff M, Feuer EJ, Miller BA, Freedman LS. Breast cancer in northeastern United States: a geographical analysis. Am J Epidemiol. 1997;146:161–170.

Grants and funding

Wellcome Trust/United Kingdom
