Comparative Study
Genome Biol. 2002;3(4):RESEARCH0017.
doi: 10.1186/gb-2002-3-4-research0017. Epub 2002 Mar 14.

New feature subset selection procedures for classification of expression profiles

Trond Bø et al. Genome Biol. 2002.

Abstract

Background: Methods for extracting useful information from the datasets produced by microarray experiments are at present of much interest. Here we present new methods for finding gene sets that are well suited for distinguishing experiment classes, such as healthy versus diseased tissues. Our methods are based on evaluating genes in pairs and evaluating how well a pair in combination distinguishes two experiment classes. We tested the ability of our pair-based methods to select gene sets that generalize the differences between experiment classes and compared the performance relative to two standard methods. To assess the ability to generalize class differences, we studied how well the gene sets we select are suited for learning a classifier.
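To make the pair-based idea concrete, here is a minimal sketch of one way a gene pair could be scored: project each sample onto a diagonal-linear-discriminant style axis built from the two genes, then take the absolute two-sample t-statistic of the projections. The function names (`split`, `t_score`, `dld_weight`, `pair_score`), the choice of a Welch-type t-statistic, and the toy data are illustrative assumptions, not the procedure as specified by the authors.

```python
from math import sqrt
from statistics import mean, variance

def split(values, labels):
    """Split one gene's values into class-0 and class-1 lists."""
    a = [v for v, l in zip(values, labels) if l == 0]
    b = [v for v, l in zip(values, labels) if l == 1]
    return a, b

def t_score(a, b):
    """Two-sample t-statistic with unequal variances (assumed scoring choice)."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(b) - mean(a)) / se

def dld_weight(values, labels):
    """Per-gene weight: class-mean difference divided by pooled variance.
    Zero pooled variance is not handled in this sketch."""
    a, b = split(values, labels)
    pooled = ((len(a) - 1) * variance(a) + (len(b) - 1) * variance(b)) / (len(a) + len(b) - 2)
    return (mean(b) - mean(a)) / pooled

def pair_score(g1, g2, labels):
    """Project samples onto the two-gene axis, then score the projections."""
    w1, w2 = dld_weight(g1, labels), dld_weight(g2, labels)
    projected = [w1 * x1 + w2 * x2 for x1, x2 in zip(g1, g2)]
    return abs(t_score(*split(projected, labels)))

# Hypothetical toy data: 3 samples per class, two genes.
labels = [0, 0, 0, 1, 1, 1]
g1 = [1.0, 2.0, 3.0, 2.0, 3.0, 4.0]
g2 = [3.5, 2.0, 1.0, 4.0, 3.5, 2.0]

print("gene 1 alone:", abs(t_score(*split(g1, labels))))
print("gene 2 alone:", abs(t_score(*split(g2, labels))))
print("pair        :", pair_score(g1, g2, labels))
```

In this toy data each gene separates the classes only weakly on its own, while the pair scores higher than either gene alone, which is the kind of information a per-gene ranking would miss.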

Results: We show that the gene sets selected by our methods outperform the standard methods, in some cases by a large margin, in terms of cross-validation prediction accuracy of the learned classifier. We show that on two public datasets, accurate diagnoses can be made using only 15-30 genes. Our results have implications for how to select marker genes and how many gene measurements are needed for diagnostic purposes.

Conclusion: When looking for differential expression between experiment classes, it may not be sufficient to look at each gene in a separate universe. Evaluating combinations of genes reveals interesting information that will not be discovered otherwise. Our results show that class prediction can be improved by taking advantage of this extra information.

Figures

Figure 1
Example of a good pair of genes in the colon dataset. The expression values give almost full separation between normal and tumor tissues. Along the x-axis (horizontal) are the expression values of M63391, and along the y-axis (vertical) are the expression values of H08393. The points marked 'x' are normal tissues, and the tumor tissues are marked by an 'o'. The expression values have been through the preprocessing steps described in the text. Also plotted is the DLD axis and the class-decision boundary for these two genes. Note that the DLD axis and the decision boundary are orthogonal, but as a result of different scaling on the axes it does not appear so in the plot.
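The orthogonality mentioned in this caption follows directly if the DLD rule takes the usual diagonal-covariance form. The notation below (class means $\mu_{0j}$ and $\mu_{1j}$, pooled variance $s_j^2$, weights $w_j$) is assumed for illustration and is not taken from the paper.

```latex
\[
\delta(x) = \sum_{j=1}^{2} w_j \left( x_j - \frac{\mu_{0j} + \mu_{1j}}{2} \right),
\qquad
w_j = \frac{\mu_{1j} - \mu_{0j}}{s_j^{2}}
\]
```

Under this assumption a sample is assigned to one class when $\delta(x) > 0$ and to the other otherwise, so the decision boundary is the line $\delta(x) = 0$, whose normal vector is $w = (w_1, w_2)$. If the DLD axis is taken to point along $w$, it is perpendicular to that boundary by construction.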
Figure 2
Plots of prediction accuracy on the colon and ALL/AML datasets using four different FSS procedures and DLD prediction. (a) Colon dataset: LOOCV and DLD prediction. (b) Colon dataset: L-31-OCV and DLD prediction. (c) ALL/AML dataset: LOOCV and DLD prediction. (d) ALL/AML dataset: L-36-OCV and DLD prediction. The x-axis shows the number of genes in the feature subsets, and the y-axis shows the average prediction accuracy. The FSS procedures individual ranking (IR), all pairs (AP), forward selection (FS) and greedy pairs (GP) are explained in the text.
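As a rough illustration of how an accuracy curve like these could be produced, the sketch below runs leave-one-out cross-validation (LOOCV) with a diagonal-covariance linear rule standing in for DLD prediction. The function names, the optional `select_genes` hook, and the toy data are hypothetical. Running gene selection inside each fold is a common precaution against optimistic bias; the abstract does not say whether the authors did this.

```python
from statistics import mean, variance

def dld_fit(X, y):
    """Estimate per-gene class means and pooled variance from training samples.
    X is a list of samples (each a list of gene values); y holds 0/1 labels."""
    params = []
    for col in zip(*X):
        a = [v for v, l in zip(col, y) if l == 0]
        b = [v for v, l in zip(col, y) if l == 1]
        pooled = ((len(a) - 1) * variance(a) + (len(b) - 1) * variance(b)) / (len(a) + len(b) - 2)
        params.append((mean(a), mean(b), pooled))
    return params

def dld_predict(params, x):
    """Predict class 1 when the variance-weighted discriminant is positive."""
    score = sum((m1 - m0) / s2 * (v - (m0 + m1) / 2)
                for (m0, m1, s2), v in zip(params, x))
    return 1 if score > 0 else 0

def loocv_accuracy(X, y, select_genes=None):
    """Hold out each sample once; optionally pick genes from the training fold only."""
    hits = 0
    for i in range(len(X)):
        X_train, y_train = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        # select_genes is a hypothetical hook returning column indices to keep.
        idx = select_genes(X_train, y_train) if select_genes else range(len(X[0]))
        idx = list(idx)
        params = dld_fit([[row[j] for j in idx] for row in X_train], y_train)
        hits += dld_predict(params, [X[i][j] for j in idx]) == y[i]
    return hits / len(X)

# Hypothetical toy data: gene 0 separates the classes, gene 1 is noise.
X = [[1.0, 5.0], [1.2, 4.8], [0.9, 5.3], [1.1, 5.1],
     [3.0, 5.2], [3.2, 4.9], [2.9, 5.0], [3.1, 5.1]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

print(loocv_accuracy(X, y))
print(loocv_accuracy(X, y, select_genes=lambda X_tr, y_tr: [0]))
```

Repeating the call with feature subsets of increasing size would give one accuracy value per subset size, which is the quantity plotted on the y-axis.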
