Annu Int Conf IEEE Eng Med Biol Soc. 2008:2008:5660-3.
doi: 10.1109/IEMBS.2008.4650498.

Combining multiple microarray studies using bootstrap meta-analysis

Andrea B Barrett et al. Annu Int Conf IEEE Eng Med Biol Soc. 2008.

Abstract

Microarray technology has enabled us to simultaneously measure the expression of thousands of genes. Using this high-throughput data collection, we can examine subtle genetic changes between biological samples and build predictive models for clinical applications. Although microarrays have dramatically increased the rate of data collection, sample size is still a major issue in feature selection. Previous methods show that microarray data combination is successful in improving selection when using z-scores and fold change. We propose a wrapper-based gene selection technique that combines bootstrap-estimated classification errors for individual genes across multiple datasets. The bootstrap is an unbiased estimator of classification error and has been shown to be effective for small sample data. Coupled with data combination across multiple datasets, we show that this meta-analytic approach improves gene selection.
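
As a rough illustration of the idea in the abstract, the sketch below estimates a bootstrap classification error for each gene in each dataset and then combines the per-gene errors across datasets. The abstract does not specify the classifier, the bootstrap variant, or the combination rule, so everything here is an assumption: a one-gene nearest-centroid classifier, an out-of-bag bootstrap error, and a sample-size-weighted mean across datasets. The function names (`bootstrap_error`, `combined_gene_errors`, `make_toy`) and the synthetic data are hypothetical.

```python
import numpy as np

def bootstrap_error(x, y, n_boot=200, rng=None):
    """Out-of-bag bootstrap error of a one-gene nearest-centroid classifier.

    x: 1-D expression values for one gene; y: 0/1 class labels.
    (Assumed classifier and bootstrap variant; not taken from the paper.)
    """
    rng = np.random.default_rng(rng)
    n = len(y)
    errors = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)        # resample with replacement
        oob = np.setdiff1d(np.arange(n), idx)   # samples left out of this resample
        xb, yb = x[idx], y[idx]
        if oob.size == 0 or yb.min() == yb.max():
            continue                            # need a test set and both classes
        c0, c1 = xb[yb == 0].mean(), xb[yb == 1].mean()
        # Predict class 1 when the value is closer to the class-1 centroid
        pred = (np.abs(x[oob] - c1) < np.abs(x[oob] - c0)).astype(int)
        errors.append(np.mean(pred != y[oob]))
    return float(np.mean(errors)) if errors else float("nan")

def combined_gene_errors(datasets, n_boot=200, seed=0):
    """datasets: list of (X, y), X shaped (n_samples, n_genes), genes aligned.

    Returns per-dataset errors (n_datasets, n_genes) and their
    sample-size-weighted mean (an assumed combination rule).
    """
    rng = np.random.default_rng(seed)
    weights = np.array([len(y) for _, y in datasets], dtype=float)
    per_dataset = np.array([
        [bootstrap_error(X[:, g], y, n_boot, rng) for g in range(X.shape[1])]
        for X, y in datasets
    ])
    return per_dataset, np.average(per_dataset, axis=0, weights=weights)

# Toy demo on synthetic data: gene 0 separates the classes, genes 1 and 2 are noise.
def make_toy(n, seed):
    rng = np.random.default_rng(seed)
    y = np.arange(n) % 2
    X = rng.normal(size=(n, 3))
    X[:, 0] += 4.0 * y
    return X, y

per_dataset, combined = combined_gene_errors(
    [make_toy(20, 1), make_toy(24, 2)], n_boot=100
)
ranking = np.argsort(combined)  # genes ordered by lowest combined error first
```

Ranking genes by the combined error is one plausible way to turn the estimates into a selection; a study following the paper more closely would likely normalize the errors against a null distribution first, as the figure captions suggest.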

Figures

Figure 1
Distributions of normalized classification errors for the renal cancer datasets. Solid lines represent individual or combined distributions while dashed lines represent null distributions. Both the Schuetz (top left) and Jones (top right) datasets deviate significantly from the null distributions, indicating a large number of differentially expressed genes. This deviation is reflected in the combined data (bottom).
Figure 2
Distributions of normalized classification errors for the prostate cancer datasets (Singh, top left; Chandran, top right; combined, bottom). Compared to the renal cancer datasets, these datasets do not deviate significantly from the null distribution.
Figure 3
ROC curves for detecting validated reference genes for renal cancer. Red lines are combined data and dashed lines are individual datasets. For all datasets, fold change (top right) tends to detect reference genes more efficiently than the t-test (top left) and bootstrap (bottom). Combining data using fold change and bootstrap slightly improves detection efficiency, which corresponds to an increase in area under the ROC curve.
Figure 4
ROC curves for detecting validated reference genes for prostate cancer. Red lines are combined data and dashed lines are individual datasets. Combined data does not improve efficiency of reference gene detection when using the t-test (top left) or fold change (top right) methods. The bootstrap method (bottom) slightly increases detection performance of the reference genes for combined data.
Figure 5
BSA (AUCs) of individual and combined renal cancer datasets for detecting reference genes. The red bars are AUCs of the combined data. T, FC, and BS correspond to t-test, fold change, and bootstrap, respectively. S, J, and C correspond to Schuetz, Jones, and combined data, respectively. [...] bootstrap (right bars), the relevance of ranking for the combined data is at least as good as, if not better than, that of both individual datasets.
Figure 6
BSA (AUCs) of individual and combined prostate cancer datasets for detecting reference genes. The red bars are AUCs of the combined data. T, FC, and BS correspond to t-test, fold change, and bootstrap, respectively. S, Ch, and C correspond to Singh, Chandran, and combined data, respectively. The bootstrap combination method (right bars) outperforms both the t-test and fold change methods.
Figure 7
Venn diagrams showing the overlap of genes selected by t-test, fold change, and bootstrap error (p<0.01). The left panel shows results for data combination from the renal cancer group, and the right panel shows results from the prostate cancer group.
Figure 8
Venn diagrams showing the overlap of genes selected from the individual datasets and from the combined meta-analysis for the t-test, fold change, and bootstrap methods.
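
The AUCs in Figures 5 and 6 summarize how well a gene ranking places validated reference genes near the top. A minimal sketch of that calculation follows, assuming each gene has a relevance score where higher means more relevant (for example, one minus its classification error) and a boolean flag marking reference genes. The function name `reference_gene_auc` and the numbers are illustrative only and are not taken from the paper.

```python
import numpy as np

def reference_gene_auc(scores, is_reference):
    """Area under the ROC curve for ranking reference genes above the rest.

    Computed as the probability that a randomly chosen reference gene has a
    higher score than a randomly chosen non-reference gene (ties count half).
    """
    scores = np.asarray(scores, dtype=float)
    is_reference = np.asarray(is_reference, dtype=bool)
    pos, neg = scores[is_reference], scores[~is_reference]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (pos.size * neg.size)

# Illustrative values: five genes, two of them flagged as reference genes.
errors = np.array([0.05, 0.40, 0.10, 0.45, 0.30])
is_ref = np.array([True, False, False, False, True])
auc = reference_gene_auc(1.0 - errors, is_ref)
```

An AUC of 0.5 corresponds to a ranking that is no better than chance at surfacing reference genes, and 1.0 to a ranking that places every reference gene above every other gene.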
