Meta-Analysis
BMC Bioinformatics. 2010 Sep 27:11:483.
doi: 10.1186/1471-2105-11-483.

Asymmetric microarray data produces gene lists highly predictive of research literature on multiple cancer types

Noor B Dawany et al. BMC Bioinformatics.

Abstract

Background: Much of the public access cancer microarray data is asymmetric, belonging to datasets containing no samples from normal tissue. Asymmetric data cannot be used in standard meta-analysis approaches (such as the inverse variance method) to obtain large sample sizes for statistical power enrichment. Noting that plenty of normal tissue microarray samples exist in studies not involving cancer, we investigated the viability and accuracy of an integrated microarray analysis approach based on significance analysis of microarrays (merged SAM) using a collection of data from separate diseased and normal samples.
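To see why asymmetric datasets are excluded from the inverse variance method, it helps to look at what the method consumes: one effect estimate (cancer versus normal) and one variance per study. A dataset with no normal samples cannot supply either. The following is a minimal sketch of fixed-effect inverse-variance pooling for a single gene. The function name and the choice of a fixed-effect model are assumptions made for illustration; the abstract does not state which variant or effect-size measure the authors used.

```python
import math


def inverse_variance_pool(effects, variances):
    """Fixed-effect inverse-variance pooling for one gene (illustrative sketch).

    effects   -- per-study effect sizes (e.g. cancer-vs-normal difference)
    variances -- per-study variances of those effect sizes
    Returns (pooled_effect, standard_error, z_score).
    """
    if not effects or len(effects) != len(variances):
        raise ValueError("need one variance per effect, and at least one study")
    if any(v <= 0 for v in variances):
        raise ValueError("variances must be positive")

    # Each study is weighted by the inverse of its variance,
    # so more precise studies count for more.
    weights = [1.0 / v for v in variances]
    total_weight = sum(weights)

    pooled = sum(w * e for w, e in zip(weights, effects)) / total_weight
    std_err = math.sqrt(1.0 / total_weight)
    return pooled, std_err, pooled / std_err


# Two hypothetical studies with equal precision: the pooled effect is their mean.
pooled, std_err, z = inverse_variance_pool([1.0, 2.0], [1.0, 1.0])
print(pooled, std_err, z)
```

A study that contains only cancer samples has no within-study contrast, so it contributes no entry to `effects` or `variances` and is simply left out of the pooled estimate.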

Results: We focused on five solid cancer types (colon, kidney, liver, lung, and pancreas), where available microarray data allowed us to compare meta-analysis and integrated approaches. Our results from the merged SAM significantly overlapped gene lists from the validated inverse-variance method. Both meta-analysis and merged SAM approaches successfully captured the aberrances in the cell cycle that commonly occur in the different cancer types. However, the integrated SAM analysis replicated the known cancer literature (excluding microarray studies) with much more accuracy than the meta-analysis.

Conclusion: The merged SAM test is a powerful, robust approach for combining data from similar platforms and for analyzing asymmetric datasets, including those with only normal or only cancer samples that cannot be utilized by meta-analysis methods. The integrated SAM approach can also be used in comparing global gene expression between various subtypes of cancer arising from the same tissue.
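The idea of merging samples before testing can be sketched with the SAM relative-difference statistic, d = (mean_cancer - mean_normal) / (s + s0), computed for one gene after pooling samples across datasets. This is a simplified illustration under stated assumptions: the helper names are hypothetical, the expression values are assumed to be already normalized onto a common scale, the fudge factor `s0` is passed in by hand rather than estimated, and the permutation step that SAM uses to estimate the false discovery rate is omitted.

```python
import math
from statistics import mean


def sam_d(cancer, normal, s0=0.0):
    """SAM relative difference for one gene (no permutation step)."""
    n1, n2 = len(cancer), len(normal)
    if n1 == 0 or n2 == 0 or n1 + n2 < 3:
        raise ValueError("need samples in both groups and at least 3 in total")

    m1, m2 = mean(cancer), mean(normal)
    # Pooled sum of squared deviations around each group's own mean.
    ss = sum((x - m1) ** 2 for x in cancer) + sum((x - m2) ** 2 for x in normal)
    s = math.sqrt((1.0 / n1 + 1.0 / n2) * ss / (n1 + n2 - 2))

    if s + s0 == 0:
        raise ValueError("zero variance and s0 == 0: statistic is undefined")
    return (m1 - m2) / (s + s0)


def merged_sam_d(cancer_sets, normal_sets, s0=0.0):
    """Pool one gene's values across datasets, then compute the SAM statistic.

    cancer_sets / normal_sets are lists of per-dataset value lists; a dataset
    may appear in only one of them (cancer-only or normal-only).
    """
    cancer = [x for dataset in cancer_sets for x in dataset]
    normal = [x for dataset in normal_sets for x in dataset]
    return sam_d(cancer, normal, s0)


# Two cancer-only datasets and one normal-only dataset (made-up values).
d = merged_sam_d([[3.0], [5.0]], [[1.0, 3.0]])
print(d)
```

Because the contrast is formed after pooling, a dataset no longer needs both sample types to contribute, which is the property the conclusion highlights.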

Figures

Figure 1
Overview of Microarray Datasets Used and Dataset Inclusion Criteria: a) Distribution of all Affymetrix microarray data used based on the number of cancer versus normal samples in each dataset. Datasets used for IV1/SAM1 test are shown inside the ellipse. Additional datasets included in SAM2 only are located on the axes. b) Selection method used for the inclusion of Affymetrix datasets used for the analyses in this study.
Figure 2
Quantile-Quantile Plots: Quantile-quantile plots indicating the distribution of three randomly chosen arrays from different colon datasets based on RMA (left) and refRMA (right) normalization.
Figure 3
Enriched KEGG Pathways: A list of KEGG pathways, shown in pink, that appear to be statistically enriched according to the top 400 genes from IV1, IV2, SAM1 and SAM2 at a p-value cutoff of 0.05. Results are limited to pathways independently enriched in at least two of the tissues or in the combined test including all tissues.
Figure 4
Cell Cycle Pathway: Differentially expressed genes involved in the cell cycle are shown in pink. Genes are ranked among the top 400 genes according to at least one of the statistical approaches used (IV1, IV2, SAM1 and/or SAM2), based on analyses of all five tissues together.
Figure 5
Literature Search Results: Histogram representing p-values of the number of top-ranked genes with at least 1 PubMed abstract relating the genes to cancer research from a non-microarray study according to each of the three test procedures. P-values are calculated based on expected data from a hundred random gene lists obtained from the platform and similarly related to non-microarray cancer literature. IV1 results are shown in gray, IV2 in yellow, SAM1 in blue and SAM2 gene lists are in pink. The horizontal line represents a p-value cutoff of 0.0001. * P-values adjusted to maximum number of available top genes.
Figure 6
Workflow of the Analyses: Flowchart depicting the steps involved in each of the four analyses considered: IV1 (grey), IV2 (yellow), SAM1 (blue) and SAM2 (pink).
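The Figure 3 caption reports pathways enriched among the top 400 genes at a p-value cutoff of 0.05 but does not name the test. One common choice for this kind of over-representation analysis is the hypergeometric tail probability, sketched below. The function name, its parameters, and the example counts are all hypothetical; this is not a description of the authors' actual procedure.

```python
from math import comb


def hypergeom_enrichment_p(universe, pathway, selected, overlap):
    """P(X >= overlap) under the hypergeometric distribution.

    universe -- number of genes on the platform
    pathway  -- number of those genes annotated to the pathway
    selected -- size of the top-ranked gene list (e.g. 400)
    overlap  -- number of selected genes that fall in the pathway
    """
    if not (0 <= pathway <= universe and 0 <= selected <= universe):
        raise ValueError("pathway and selected must lie between 0 and universe")
    if not (0 <= overlap <= min(pathway, selected)):
        raise ValueError("overlap cannot exceed the pathway or the selection")

    total = comb(universe, selected)
    # Sum the probability of every overlap at least as large as the observed one.
    tail = sum(
        comb(pathway, k) * comb(universe - pathway, selected - k)
        for k in range(overlap, min(pathway, selected) + 1)
    )
    return tail / total


# Made-up counts: 12 of the top 400 genes fall in a 100-gene pathway
# on a platform with 20,000 genes.
p = hypergeom_enrichment_p(20000, 100, 400, 12)
print(p, p < 0.05)
```

In practice a correction for testing many pathways at once would usually be applied to such p-values; the caption does not say whether one was used here.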
