Classifying gene expression profiles from pairwise mRNA comparisons

Donald Geman¹, Christian d'Avignon, Daniel Q Naiman, Raimond L Winslow

Affiliation

¹ Center for Cardiovascular Bioinformatics and Modeling, Whitaker Biomedical Engineering Institute and Department of Applied Mathematics and Statistics, Johns Hopkins University, USA. ge-man@jhu.edu

PMID: 16646797
PMCID: PMC1989150
DOI: 10.2202/1544-6115.1071

Donald Geman et al. Stat Appl Genet Mol Biol. 2004;3:Article19. doi: 10.2202/1544-6115.1071. Epub 2004 Aug 30.

Abstract

We present a new approach to molecular classification based on mRNA comparisons. Our method, referred to as the top-scoring pair(s) (TSP) classifier, is motivated by current technical and practical limitations in using gene expression microarray data for class prediction, for example to detect disease, identify tumors or predict treatment response. Accurate statistical inference from such data is difficult due to the small number of observations, typically tens, relative to the large number of genes, typically thousands. Moreover, conventional methods from machine learning lead to decisions which are usually very difficult to interpret in simple or biologically meaningful terms. In contrast, the TSP classifier provides decision rules which i) involve very few genes and only relative expression values (e.g., comparing the mRNA counts within a single pair of genes); ii) are both accurate and transparent; and iii) provide specific hypotheses for follow-up studies. In particular, the TSP classifier achieves prediction rates with standard cancer data that are as high as those of previous studies which use considerably more genes and complex procedures. Finally, the TSP classifier is parameter-free, thus avoiding the type of over-fitting and inflated estimates of performance that result when all aspects of learning a predictor are not properly cross-validated.
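The pairwise-comparison idea described above can be illustrated with a minimal sketch. It assumes two classes labeled 0 and 1 and scores each gene pair (i, j) by the absolute difference between the two classes in the observed frequency of the event "gene i is expressed below gene j". The abstract does not spell out the score, so that definition, the function names (`pair_score`, `fit_tsp`, `predict_tsp`), the toy data, and the tie handling (the first pair found is kept) are all illustrative assumptions rather than the authors' implementation.

```python
from itertools import combinations


def pair_score(X, y, i, j):
    """Return (score, p0, p1), where p_c is the fraction of class-c samples
    with X[i] < X[j] and score = |p0 - p1| (assumed score definition)."""
    counts = {0: 0, 1: 0}
    totals = {0: 0, 1: 0}
    for row, label in zip(X, y):
        totals[label] += 1
        if row[i] < row[j]:
            counts[label] += 1
    p0 = counts[0] / totals[0]
    p1 = counts[1] / totals[1]
    return abs(p0 - p1), p0, p1


def fit_tsp(X, y):
    """Search all gene pairs and keep the one with the largest score.
    Ties keep the first pair found, which is a simplification."""
    best = None
    for i, j in combinations(range(len(X[0])), 2):
        score, p0, p1 = pair_score(X, y, i, j)
        if best is None or score > best["score"]:
            # Predict the class in which X[i] < X[j] is more frequent.
            best = {"i": i, "j": j, "score": score,
                    "label_if_less": 1 if p1 > p0 else 0}
    return best


def predict_tsp(model, row):
    """Classify one profile using only the ordering of the selected pair."""
    less = row[model["i"]] < row[model["j"]]
    return model["label_if_less"] if less else 1 - model["label_if_less"]


# Hypothetical toy data: 6 samples (rows) x 3 genes (columns).
X = [
    [1.0, 5.0, 3.0],
    [2.0, 6.0, 1.0],
    [0.5, 4.0, 2.0],
    [7.0, 3.0, 2.5],
    [8.0, 2.0, 9.0],
    [6.0, 1.0, 0.5],
]
y = [0, 0, 0, 1, 1, 1]

model = fit_tsp(X, y)
new_profile = [9.0, 2.0, 5.0]
predicted = predict_tsp(model, new_profile)
```

Because the rule depends only on which of two values is larger within a single sample, it is unchanged by any monotone rescaling of that sample's expression values, and the search itself involves no tuning parameters.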

Figures

**Figure 1**
The distribution of top scores for a random class label permutation analysis. The locations of the top score on the real **Breast** and **Leukemia** data sets are shown in red; the estimated p-values are 0.001 and 0 respectively. The top-score histogram for the **Prostate** data looks qualitatively the same as the one for **Leukemia**, and the maximum score achieved among all of the artificial data sets is 0.586; the score observed on the real data is Δ = 0.902.
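A label-permutation analysis of the kind summarized in the Figure 1 caption could be sketched as follows. The sketch assumes the same two-class pairwise score as a plausible reading of Δ, shuffles the class labels while keeping the class sizes fixed, and estimates the p-value as the plain fraction of permuted data sets whose top score reaches the observed one. The function names, the number of permutations, the seed, and the toy data are hypothetical.

```python
import random
from itertools import combinations


def top_score(X, y):
    """Largest |P(X_i < X_j | class 0) - P(X_i < X_j | class 1)| over all
    gene pairs, using observed frequencies (assumed score definition)."""
    n0 = y.count(0)
    n1 = y.count(1)
    best = 0.0
    for i, j in combinations(range(len(X[0])), 2):
        c0 = sum(1 for row, lab in zip(X, y) if lab == 0 and row[i] < row[j])
        c1 = sum(1 for row, lab in zip(X, y) if lab == 1 and row[i] < row[j])
        best = max(best, abs(c0 / n0 - c1 / n1))
    return best


def permutation_p_value(X, y, n_perm=200, seed=0):
    """Return (observed top score, permuted top scores, estimated p-value)."""
    rng = random.Random(seed)
    observed = top_score(X, y)
    null_scores = []
    for _ in range(n_perm):
        shuffled = list(y)
        rng.shuffle(shuffled)  # keeps class sizes, breaks the label-profile link
        null_scores.append(top_score(X, shuffled))
    # Plain fraction, so the estimate can be exactly 0 when no permuted
    # data set reaches the observed score.
    p_value = sum(s >= observed for s in null_scores) / n_perm
    return observed, null_scores, p_value


# Hypothetical toy data: 6 samples (rows) x 3 genes (columns).
X = [
    [1.0, 5.0, 3.0],
    [2.0, 6.0, 1.0],
    [0.5, 4.0, 2.0],
    [7.0, 3.0, 2.5],
    [8.0, 2.0, 9.0],
    [6.0, 1.0, 0.5],
]
y = [0, 0, 0, 1, 1, 1]

observed, null_scores, p_value = permutation_p_value(X, y)
```

With so few samples and genes the toy p-value is not meaningful; the point is only the shape of the computation, in which the whole pair search is repeated on every permuted data set.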

**Figure 2**
Scatter plots for a top pair of genes for each study. The two classes are represented using red and blue, the axes represent the expression levels of the two genes and the dotted line y = x represents the decision boundary.
