Comparative Study
Am J Pathol. 2004 Jan;164(1):9-16.
doi: 10.1016/S0002-9440(10)63090-8.

Multi-platform, multi-site, microarray-based human tumor classification

Greg Bloom et al. Am J Pathol. 2004 Jan.

Abstract

The introduction of gene expression profiling has resulted in the production of rich human data sets with potential for deciphering tumor diagnosis, prognosis, and therapy. Here we demonstrate how artificial neural networks (ANNs) can be applied to two completely different microarray platforms (cDNA and oligonucleotide), or a combination of both, to build tumor classifiers capable of deciphering the identity of most human cancers. First, 78 tumors representing eight different types of histologically similar adenocarcinoma were evaluated with a 32k cDNA microarray and correctly classified by a cDNA-based ANN, using independent training and test sets, with a mean accuracy of 83%. To expand our approach, oligonucleotide data derived from six independent performance sites, representing 463 tumors and 21 tumor types, were assembled, normalized, and scaled. An oligonucleotide-based ANN, trained on a random fraction of the tumors (n = 343), was 88% accurate in predicting known pathological origin of the remaining fraction of tumors (n = 120) not exposed to the training algorithm. Finally, a mixed-platform classifier using a combination of both cDNA and oligonucleotide microarray data from seven performance sites, normalized and scaled from a large and diverse tumor set (n = 539), produced similar results (85% accuracy) on independent test sets. Further validation of our classifiers was achieved by accurately (84%) predicting the known primary site of origin for an independent set of metastatic lesions (n = 50), resected from brain, lung, and liver, potentially addressing the vexing classification problems imposed by unknown primary cancers. These cDNA- and oligonucleotide-based classifiers provide a first proof of principle that data derived from multiple platforms and performance sites can be exploited to build multi-tissue tumor classifiers.
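
The abstract describes training on a random fraction of the 463 oligonucleotide tumors (n = 343) and scoring on the remaining 120. A minimal sketch of that kind of holdout split and accuracy calculation is given here. The function names `holdout_split` and `accuracy`, the fixed seed, and the use of integer sample IDs are assumptions made for illustration; the abstract does not say how the authors drew the split.

```python
import random


def holdout_split(sample_ids, n_train, seed=0):
    """Randomly partition sample IDs into disjoint training and test lists."""
    ids = list(sample_ids)
    if not 0 < n_train < len(ids):
        raise ValueError("n_train must leave at least one sample on each side")
    rng = random.Random(seed)  # fixed seed (an assumption) keeps the split reproducible
    rng.shuffle(ids)
    return ids[:n_train], ids[n_train:]


def accuracy(predicted, actual):
    """Fraction of samples whose predicted label matches the known label."""
    if len(predicted) != len(actual):
        raise ValueError("label lists must have the same length")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


# Sizes taken from the abstract: 463 tumors, 343 for training, 120 held out.
train_ids, test_ids = holdout_split(range(463), n_train=343)
```

The test samples never reach the training step, which is what "not exposed to the training algorithm" requires; any classifier can then be scored with `accuracy` on the held-out labels.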

Figures

Figure 1
Hierarchical clustering of eight different types of adenocarcinoma. The Kruskal-Wallis H-test was used to identify those genes most correlated with each tumor type, selecting ∼700 genes from 30,849 distinct transcripts on the cDNA chip. Average linkage hierarchical clustering of spotted cDNA array expression data using a Pearson correlation coefficient distance matrix illustrates the problems with this approach to classification, which typically weights each gene equally. Even for ovarian cancer samples (yellow boxes), which are generally well classified, there are two outlying samples that are grouped within a set of diverse tumors. For other tissues of origin such as lung (pink boxes), the situation is worse. Similar results are obtained for samples assayed using Affymetrix GeneChips. Although hierarchical clustering can be used with weights for each gene, we have no a priori means of determining the appropriate weights. This is the rationale that underlies the use of the ANN in tumor classification.
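
The gene-selection step named in this caption can be sketched as follows: compute a Kruskal-Wallis H statistic for each gene across tumor types and rank genes by it. This is a plain-Python illustration, not the authors' code. The helper names (`average_ranks`, `kruskal_wallis_h`, `rank_genes`) and the toy expression values are assumptions. Ranking by H instead of by P value is a simplification that gives the same order only when every gene is measured on the same samples.

```python
def average_ranks(values):
    """Rank values from 1..N, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Tie-corrected Kruskal-Wallis H statistic for a list of value groups."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    if len(groups) < 2 or n < 2 or any(len(g) == 0 for g in groups):
        raise ValueError("need at least two non-empty groups")
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    # Tie correction: divide by 1 - sum(t^3 - t) / (N^3 - N) over tied blocks.
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    correction = 1 - sum(c ** 3 - c for c in counts.values()) / (n ** 3 - n)
    return h / correction if correction > 0 else 0.0


def rank_genes(expression, labels):
    """Return gene names sorted from most to least associated with tumor type."""
    classes = sorted(set(labels))
    scores = {}
    for gene, values in expression.items():
        groups = [[v for v, lab in zip(values, labels) if lab == c] for c in classes]
        scores[gene] = kruskal_wallis_h(groups)
    return sorted(scores, key=scores.get, reverse=True)


# Toy data (hypothetical): nine samples, three tumor types, two genes.
labels = ["ovary"] * 3 + ["lung"] * 3 + ["colon"] * 3
expression = {
    "gene_b": [1, 5, 9, 2, 6, 7, 3, 4, 8],  # values interleave across types
    "gene_a": [1, 2, 3, 4, 5, 6, 7, 8, 9],  # values separate the types cleanly
}
ranked = rank_genes(expression, labels)
```

Keeping the top of such a ranking corresponds to the "∼700 genes" selection described above; the cutoff itself is a choice the caption reports but does not derive.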
Figure 2
Graphical depiction of classifier development separated into the four major stages. Data acquisition involves a literature search for suitable published microarray data and the collection of this and newly generated data into a microarray database. Normalization and scaling comprises the three major steps in data preparation: calculation of an average gene expression value across the reference sample for the two Affymetrix chip types, gene-by-gene scaling between Affymetrix chip types, and gene-by-gene scaling between the Affymetrix chip types and the spotted microarray. A nonparametric statistical screen was then used to find a subset of genes correlated with tumor type. This set of genes was used to train and validate an ANN.
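
One way to picture the gene-by-gene scaling step in this caption is to multiply each gene on one platform by the ratio of per-gene means between a reference platform and that platform. The sketch below does exactly that. The ratio-of-means rule, the function names, and the toy values are all assumptions; the caption does not give the formula the authors used.

```python
def gene_means(samples):
    """Mean expression per gene across samples (each sample: dict gene -> value)."""
    genes = set(samples[0])
    for s in samples[1:]:
        genes &= set(s)  # keep only genes measured on every sample
    return {g: sum(s[g] for s in samples) / len(samples) for g in genes}


def scale_to_reference(target_samples, reference_samples):
    """Rescale each shared gene so its target mean matches the reference mean.

    Assumed rule (not from the source): factor = reference mean / target mean.
    Genes missing from either platform, or with a zero target mean, are dropped.
    """
    ref_means = gene_means(reference_samples)
    tgt_means = gene_means(target_samples)
    shared = [g for g in ref_means if g in tgt_means and tgt_means[g] != 0]
    factors = {g: ref_means[g] / tgt_means[g] for g in shared}
    scaled = [{g: s[g] * factors[g] for g in shared} for s in target_samples]
    return scaled, factors


# Toy data (hypothetical): two samples per platform, two shared genes.
reference = [{"gene_a": 100.0, "gene_b": 40.0}, {"gene_a": 300.0, "gene_b": 60.0}]
target = [{"gene_a": 1.0, "gene_b": 10.0}, {"gene_a": 3.0, "gene_b": 30.0}]
scaled, factors = scale_to_reference(target, reference)
```

Applied twice, once between the two Affymetrix chip types and once between the Affymetrix data and the spotted array, this mirrors the order of steps the caption lists.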
Figure 3
Analysis of the effect of removing genes from the oligonucleotide classifier on classifier accuracy. Genes were sequentially removed from the 2000 genes selected by the Kruskal-Wallis test, proceeding from the least significant to the most significant P values.
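
The removal experiment in this caption can be sketched as a loop that drops the least significant genes first and re-scores the classifier at each size. To stay self-contained, the sketch substitutes a nearest-centroid classifier for the ANN. That substitution, the function names, and the toy samples are assumptions and not the authors' method.

```python
def nearest_centroid_accuracy(train, test, genes):
    """Accuracy of a nearest-centroid classifier restricted to `genes`.

    `train` and `test` are lists of (label, {gene: value}) pairs.
    This classifier is a stand-in for the ANN used in the paper.
    """
    sums, counts = {}, {}
    for label, expr in train:
        counts[label] = counts.get(label, 0) + 1
        acc = sums.setdefault(label, {g: 0.0 for g in genes})
        for g in genes:
            acc[g] += expr[g]
    centroids = {
        lab: {g: sums[lab][g] / counts[lab] for g in genes} for lab in sums
    }
    correct = 0
    for label, expr in test:
        best = min(
            centroids,
            key=lambda lab: sum((expr[g] - centroids[lab][g]) ** 2 for g in genes),
        )
        correct += best == label
    return correct / len(test)


def removal_curve(ranked_genes, train, test, step=1):
    """Score the classifier while dropping the least significant genes first.

    `ranked_genes` must be ordered from most to least significant.
    Returns a list of (number_of_genes_kept, accuracy) pairs.
    """
    curve = []
    n = len(ranked_genes)
    while n >= 1:
        kept = ranked_genes[:n]
        curve.append((n, nearest_centroid_accuracy(train, test, kept)))
        n -= step
    return curve


# Toy data (hypothetical): g1 separates the two types, g2 and g3 do not.
train = [
    ("lung", {"g1": 10.0, "g2": 5.0, "g3": 1.0}),
    ("lung", {"g1": 11.0, "g2": 1.0, "g3": 5.0}),
    ("ovary", {"g1": 0.0, "g2": 5.0, "g3": 1.0}),
    ("ovary", {"g1": 1.0, "g2": 1.0, "g3": 5.0}),
]
test = [
    ("lung", {"g1": 9.0, "g2": 3.0, "g3": 3.0}),
    ("ovary", {"g1": 2.0, "g2": 3.0, "g3": 3.0}),
]
curve = removal_curve(["g1", "g2", "g3"], train, test)
```

Plotting the second element of each pair against the first gives a curve of the kind the figure reports for the 2000 selected genes.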
