BMC Bioinformatics. 2011 Mar 24;12:84.
doi: 10.1186/1471-2105-12-84.

Meta-analysis of gene expression microarrays with missing replicates

Fan Shi et al. BMC Bioinformatics.

Abstract

Background: Many different microarray experiments are publicly available today. It is natural to ask whether different experiments for the same phenotypic conditions can be combined using meta-analysis in order to increase the overall sample size. However, some genes are not measured in all experiments, so in traditional meta-analysis they either cannot be included or their statistical significance cannot be appropriately estimated. Nonetheless, these genes, which we refer to as incomplete genes, may also be informative and useful.

Results: We propose a meta-analysis framework, called "Incomplete Gene Meta-analysis", which can include incomplete genes by imputing the significance of missing replicates and computing a meta-score for every gene across all datasets. We demonstrate, in two groups of experiments, that the incomplete genes are worth including and that our method is able to appropriately estimate their significance. We first apply the Incomplete Gene Meta-analysis and several comparable methods to five breast cancer datasets with an identical set of probes. We simulate incomplete genes by randomly removing a subset of probes from each dataset and demonstrate that our method consistently outperforms two other methods in terms of false discovery rate. We also apply the methods to three gastric cancer datasets in order to discriminate between diffuse and intestinal subtypes.
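
The simulation step described above (randomly removing a subset of probes from each dataset to create incomplete genes) can be sketched as follows. This is a minimal illustration rather than the authors' code: the function names `remove_random_probes` and `incomplete_genes`, the dictionary layout (probe ID mapped to a per-dataset value), the removal fraction, and the seed are all assumptions made for the example.

```python
import random


def remove_random_probes(datasets, fraction, seed=0):
    """Drop a random fraction of probes from each dataset (hypothetical helper).

    Each dataset is assumed to be a dict mapping probe ID -> per-dataset value
    (for example a p-value); this layout is an assumption for the sketch.
    """
    rng = random.Random(seed)
    reduced = []
    for data in datasets:
        probes = sorted(data)                      # fixed order for reproducibility
        n_drop = int(len(probes) * fraction)       # number of probes to remove
        dropped = set(rng.sample(probes, n_drop))  # sample without replacement
        reduced.append({p: v for p, v in data.items() if p not in dropped})
    return reduced


def incomplete_genes(datasets):
    """Return probes measured in at least one dataset but not in all of them."""
    probe_sets = [set(d) for d in datasets]
    return set().union(*probe_sets) - set.intersection(*probe_sets)


# Toy input: three datasets sharing an identical set of ten probes.
probes = [f"probe_{i}" for i in range(10)]
original = [{p: 0.05 for p in probes} for _ in range(3)]
simulated = remove_random_probes(original, fraction=0.2, seed=1)
print(sorted(incomplete_genes(simulated)))
```

Before removal every probe is present in every dataset, so `incomplete_genes(original)` is empty. After removal, any probe dropped from some datasets but not all of them counts as an incomplete gene.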

Conclusions: Meta-analysis is an effective approach that identifies more robust sets of differentially expressed genes from multiple studies. The incomplete genes, which mainly arise from the use of different platforms, may also have statistical and biological importance but are ignored or not appropriately handled in previous studies. Our Incomplete Gene Meta-analysis is able to incorporate the incomplete genes by estimating their significance. The results on both breast and gastric cancer datasets suggest that the highly ranked genes and associated GO terms produced by our method are more significant and biologically meaningful according to the previous literature.
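
To make the idea of a per-gene meta-score with imputed significance concrete, here is a hedged sketch. The abstract does not give the paper's imputation rule or meta-score formula, so this example swaps in Stouffer's z-score method as a stand-in and uses a placeholder imputation (the mean of the observed z-scores). The function name `meta_score`, the `impute` parameter, and the use of `None` for a missing replicate are all assumptions for illustration.

```python
from math import sqrt
from statistics import NormalDist, mean

_STD_NORMAL = NormalDist()


def meta_score(p_values, impute=mean):
    """Combine one gene's per-dataset p-values with Stouffer's z-score method.

    p_values: one entry per dataset; None marks a dataset that did not
    measure the gene. Observed p-values must lie strictly between 0 and 1.
    impute: hypothetical placeholder rule mapping the observed z-scores to
    the z-score used for each missing dataset (assumption: their mean).
    """
    observed = [_STD_NORMAL.inv_cdf(1.0 - p) for p in p_values if p is not None]
    if not observed:
        raise ValueError("gene is missing from every dataset")
    fill = impute(observed)
    z_scores = [
        fill if p is None else _STD_NORMAL.inv_cdf(1.0 - p) for p in p_values
    ]
    combined_z = sum(z_scores) / sqrt(len(z_scores))  # Stouffer's statistic
    return 1.0 - _STD_NORMAL.cdf(combined_z)          # combined one-sided p-value


complete = meta_score([0.01, 0.01, 0.01])
incomplete = meta_score([0.01, 0.01, None])
print(complete, incomplete)
```

With mean imputation, the incomplete gene receives the same combined score as a gene observed at that significance level in all three datasets. A more conservative placeholder, such as `impute=lambda zs: 0.0`, treats each missing dataset as uninformative and gives a larger combined p-value. Which rule is appropriate is exactly the question the paper's method addresses, and this sketch does not answer it.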

Figures

Figure 1
Overlap between gene sets from different platforms. The overlap between the gene sets from different microarray platforms. Left: Three platforms used in [14]. Right: Three gastric cancer datasets used in our experiments.
Figure 2
Incomplete Gene Meta-analysis. The process of Incomplete Gene Meta-analysis.
Figure 3
FDR evaluation on breast cancer datasets. The average FDR of different meta-analysis methods in the breast cancer datasets. Except for the FDR computed on the original datasets, which is used as the gold standard (labeled "Gold"), the FDR of each method was averaged across 100 groups of datasets with simulated missing replicates. The error bars at 200, 400, 600 and 800 features give the 95% quantiles of the FDR across the 100 simulations.
Figure 4
Precision-recall of GO terms. Precision-recall curves of GO terms in the breast cancer datasets. Left: the true significant terms are annotated from the gold standard under the threshold 0.001. Right: the true significant terms are annotated from the gold standard under the threshold 0.01.
Figure 5
ROC of GO terms. ROC curves of GO terms in the breast cancer datasets. Left: the true significant terms are annotated from the gold standard under the threshold 0.01. Right: the true significant terms are annotated from the gold standard under the threshold 0.1.
Figure 6
Agreement of GO terms. Scatter plots of all GO terms between different methods in the breast cancer datasets. Left: the agreement between each of the IGM, INTERSECTION and IGNORE methods and the gold standard. Right: the agreement among the IGM, INTERSECTION and IGNORE methods.
Figure 7
FDR evaluation on 11 cancer datasets. The average FDR of different meta-analysis methods in the 11 cancer datasets. The experimental settings were the same as those used for the five breast cancer datasets.

References

    1. Warnat P, Eils R, Brors B. Cross-platform analysis of cancer microarray data improves gene expression based classification of phenotypes. BMC Bioinformatics. 2005;6:265. doi: 10.1186/1471-2105-6-265.
    2. Xu L, Geman D, Winslow R. Large-scale integration of cancer microarray data identifies a robust common cancer signature. BMC Bioinformatics. 2007;8:275. doi: 10.1186/1471-2105-8-275.
    3. Xu L, Tan AC, Winslow RL, Geman D. Merging microarray data from separate breast cancer studies provides a robust prognostic test. BMC Bioinformatics. 2008;9:125. doi: 10.1186/1471-2105-9-125.
    4. Hedges LV, Olkin I. Statistical Methods for Meta-Analysis. San Diego, CA, USA: Academic Press; 1985.
    5. Rhodes DR, Barrette TR, Rubin MA, Ghosh D, Chinnaiyan AM. Meta-Analysis of Microarrays: Interstudy Validation of Gene Expression Profiles Reveals Pathway Dysregulation in Prostate Cancer. Cancer Research. 2002;62(15):4427–4433.
