Bioinformatics. 2005 Oct 15;21(20):3896-904.
doi: 10.1093/bioinformatics/bti631. Epub 2005 Aug 16.

Simple decision rules for classifying human cancers from gene expression profiles

Aik Choon Tan et al. Bioinformatics. 2005.

Abstract

Motivation: Various studies have shown that cancer tissue samples can be successfully detected and classified by their gene expression patterns using machine learning approaches. One of the challenges in applying these techniques to gene expression data is extracting accurate, readily interpretable rules that provide biological insight into how the classification is performed. Current methods generate classifiers that are accurate but difficult to interpret; this is the trade-off between the credibility and the comprehensibility of classifiers. Here, we introduce a new classifier to address these problems. Called k-TSP (k-Top Scoring Pairs), it is based on the concept of 'relative expression reversals'. The method generates simple and accurate decision rules that involve only a small number of gene-to-gene expression comparisons, thereby facilitating follow-up studies.
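The abstract does not define how gene pairs are scored, so the following is a minimal Python sketch assuming the standard TSP pair score: for genes i and j, Δ_ij = |P(X_i < X_j | class 0) - P(X_i < X_j | class 1)|, estimated by counting over the training samples, with the k highest-scoring pairs kept only if they share no gene. The function names, the samples-by-genes list layout and the 0/1 labels are illustrative assumptions; ties are broken here by input order, not by any secondary score the published method may use.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]  # samples x genes (assumed layout)


def pair_score(X: Matrix, y: Sequence[int], i: int, j: int) -> float:
    """|P(X_i < X_j | class 0) - P(X_i < X_j | class 1)|, estimated by counting."""
    probs = []
    for c in (0, 1):
        rows = [row for row, label in zip(X, y) if label == c]
        probs.append(sum(row[i] < row[j] for row in rows) / len(rows))
    return abs(probs[0] - probs[1])


def top_k_disjoint_pairs(X: Matrix, y: Sequence[int], k: int) -> List[Tuple[int, int]]:
    """Greedily keep the k best-scoring pairs that do not reuse a gene."""
    n_genes = len(X[0])
    scored = sorted(
        ((pair_score(X, y, i, j), i, j) for i, j in combinations(range(n_genes), 2)),
        key=lambda t: t[0],
        reverse=True,  # stable sort: equal scores keep input order
    )
    chosen: List[Tuple[int, int]] = []
    used = set()
    for _, i, j in scored:
        if i in used or j in used:
            continue  # each gene may appear in at most one selected pair
        chosen.append((i, j))
        used.update((i, j))
        if len(chosen) == k:
            break
    return chosen
```

On a toy matrix where gene 0 is below gene 1 in every class-0 sample and above it in every class-1 sample, pair (0, 1) scores 1.0 and is selected first; a later pair that reuses gene 0 or gene 1 is skipped even if it also scores 1.0.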

Results: In this study, we compared our approach with other machine learning techniques for class prediction on 19 binary and multi-class gene expression datasets involving human cancers. The k-TSP classifier performs as well as Prediction Analysis of Microarrays (PAM) and support vector machines (SVM), and outperforms the other learning methods tested (decision trees, k-nearest neighbour and naïve Bayes). Our approach is easy to interpret because the classifier involves only a small number of informative genes. For these reasons, we consider the k-TSP method a useful tool for cancer classification from microarray gene expression data.

Availability: The software and datasets are available at http://www.ccbm.jhu.edu

Contact: actan@jhu.edu.

Figures

Fig. 1
Description of the k-TSP algorithm.
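Once the pairs are chosen, classifying a new profile reduces to a handful of comparisons. The sketch below assumes each rule is stored as a hypothetical tuple (i, j, label_if_less, label_otherwise) and that the k pairs vote with equal weight; this is an assumption about the decision step, not a transcription of Fig. 1. With two classes and an odd k, the vote cannot tie.

```python
from collections import Counter
from typing import Sequence, Tuple

# Hypothetical rule format: (gene_i, gene_j, label if x[i] < x[j], label otherwise)
Rule = Tuple[int, int, str, str]


def predict_k_tsp(x: Sequence[float], rules: Sequence[Rule]) -> str:
    """Unweighted majority vote over the k pairwise comparisons."""
    votes = Counter(if_less if x[i] < x[j] else otherwise
                    for i, j, if_less, otherwise in rules)
    return votes.most_common(1)[0][0]


# Three made-up pairs (k = 3) over six made-up gene indices.
rules = [(0, 1, "ALL", "AML"), (2, 3, "ALL", "AML"), (4, 5, "AML", "ALL")]
print(predict_k_tsp([1, 2, 5, 3, 0, 9], rules))  # AML (two of three votes)
```

Each rule reads only two expression values from the profile, which is what keeps the resulting decision rules easy to inspect.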
Fig. 2
Genes that distinguish ALL from AML. Each row corresponds to a gene and each column to a sample array. Genes labeled with an asterisk (*) were identified in Golub et al. (1999). This heat map was generated using the matrix2png software (Pavlidis and Noble, 2003). The expression level of each gene is normalized across the samples to a mean of 0 and a standard deviation (SD) of 1. Expression values above the mean are colored red and those below the mean green. The scale indicates the number of SDs above or below the mean. Panels (a–c) show the discriminative genes and decision rules for three classifiers: (a) TSP classifier, (b) k-TSP classifier and (c) decision tree (DT) classifier.
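The per-gene normalization described in this caption can be sketched as a row-wise z-score. The sketch assumes genes are rows, uses the population SD (`statistics.pstdev`) and maps a zero-variance gene to 0; matrix2png's exact conventions may differ. The `heat_color` helper is a hypothetical illustration of the red/green convention only.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def standardize_rows(matrix: Sequence[Sequence[float]]) -> List[List[float]]:
    """Z-score each gene (row) across samples: mean 0, SD 1."""
    out = []
    for row in matrix:
        mu, sd = mean(row), pstdev(row)
        # A constant row has no spread; map it to 0 rather than divide by zero.
        out.append([(v - mu) / sd if sd > 0 else 0.0 for v in row])
    return out


def heat_color(z: float) -> str:
    """Red above the gene's mean, green below (hypothetical helper)."""
    if z > 0:
        return "red"
    if z < 0:
        return "green"
    return "neutral"


z = standardize_rows([[1.0, 2.0, 3.0]])
print([heat_color(v) for v in z[0]])  # ['green', 'neutral', 'red']
```

Because each row is scaled independently, colors compare a sample with other samples for the same gene, not one gene with another.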
Fig. 3
Hierarchical classification of leukemia subtypes ALL, AML and MLL using k-TSP. Rows and columns correspond to genes and samples, respectively. Genes labeled with an asterisk (*) were previously identified as discriminating genes for this problem in Armstrong et al. (2002). The blue panel denotes the independent test samples. HC-k-TSP consists of sequentially applying two k-TSP decision rules: the first classifier, h1, distinguishes ALL from {AML, MLL} using three top-scoring gene pairs, and the second classifier, h2, discriminates MLL from AML using nine pairs. The heat maps were generated in the same way as in Fig. 2.
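The two-stage scheme in this caption can be sketched as a small cascade in which h2 is consulted only when h1 does not predict ALL. Both stages here are hypothetical one-pair majority voters over made-up gene indices (the actual h1 and h2 use three and nine pairs); only the control flow is meant to mirror HC-k-TSP.

```python
from collections import Counter
from typing import Callable, Sequence, Tuple

Rule = Tuple[int, int, str, str]  # (gene_i, gene_j, label if x[i] < x[j], label otherwise)
Classifier = Callable[[Sequence[float]], str]


def make_pair_voter(rules: Sequence[Rule]) -> Classifier:
    """Build a majority-vote classifier from hypothetical pair rules."""
    def classify(x: Sequence[float]) -> str:
        votes = Counter(a if x[i] < x[j] else b for i, j, a, b in rules)
        return votes.most_common(1)[0][0]
    return classify


def hc_k_tsp(x: Sequence[float], h1: Classifier, h2: Classifier) -> str:
    """Stage 1 separates ALL from {AML, MLL}; stage 2 splits MLL from AML."""
    if h1(x) == "ALL":
        return "ALL"
    return h2(x)


# Made-up single-pair stages, for illustration only.
h1 = make_pair_voter([(0, 1, "ALL", "AML/MLL")])
h2 = make_pair_voter([(2, 3, "MLL", "AML")])
print(hc_k_tsp([1.0, 0.0, 0.0, 1.0], h1, h2))  # MLL
```

Splitting the three-class problem into two binary decisions lets each stage keep its own small set of pairs, so every prediction can still be traced to a few gene-to-gene comparisons.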

References

Alizadeh AA, et al. Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature. 2000;403:503–511.
Alon U, et al. Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proc. Natl Acad. Sci. USA. 1999;96:6745–6750.
Amit Y, Geman D. Shape quantization and recognition with randomized trees. IEEE Trans. Pattern Anal. Machine Intell. 1997;19:1300–1305.
Armstrong S, et al. MLL translocations specify a distinct gene expression profile that distinguishes a unique leukemia. Nat. Genet. 2002;30:41–47.
Beer DG, et al. Gene-expression profiles predict survival of patients with lung adenocarcinoma. Nat. Med. 2002;8:816–824.