Genome Res. 2001 Nov;11(11):1878-87.
doi: 10.1101/gr.190001.

Biomarker identification by feature wrappers

M Xiong et al. Genome Res. 2001 Nov.

Abstract

Gene expression studies bridge the gap between DNA information and trait information by dissecting biochemical pathways into intermediate components between genotype and phenotype. These studies open new avenues for identifying complex disease genes and biomarkers for disease diagnosis and for assessing drug efficacy and toxicity. However, the majority of analytical methods applied to gene expression data are not efficient for biomarker identification and disease diagnosis. In this paper, we propose a general framework that incorporates feature (gene) selection into pattern recognition in the process of identifying biomarkers. Using this framework, we develop three feature wrappers that search through the space of feature subsets, using the classification error as a measure of goodness for a particular feature subset. The three learning methods being "wrapped around" are linear discriminant analysis, logistic regression, and support vector machines. To carry out this computationally intensive search effectively, we employ the sequential forward search and sequential forward floating search algorithms. To evaluate the performance of feature selection for biomarker identification, we have applied the proposed methods to three data sets. The preliminary results demonstrate that very high classification accuracy can be attained by the identified composite classifiers with several biomarkers.
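The wrapper idea in the abstract (score each candidate gene subset by the classification error of a classifier trained on it, and grow the subset greedily) can be illustrated with a minimal sketch of sequential forward search. This is not the authors' implementation. A nearest-centroid classifier stands in for LDA, logistic regression, or SVM so that the example needs only the standard library, the floating variant (SFFS) is not shown, and the function names and the toy expression matrix are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def nearest_centroid_loo_error(X: Matrix, y: Sequence[int],
                               features: List[int]) -> float:
    """Leave-one-out error of a nearest-centroid classifier (an assumed
    stand-in for LDA/LR/SVM) that sees only the given feature columns."""
    n = len(X)
    errors = 0
    for i in range(n):
        # Accumulate per-class sums over every sample except the held-out one.
        sums, counts = {}, {}
        for j in range(n):
            if j == i:
                continue
            s = sums.setdefault(y[j], [0.0] * len(features))
            for k, f in enumerate(features):
                s[k] += X[j][f]
            counts[y[j]] = counts.get(y[j], 0) + 1
        # Predict the class whose centroid is closest in squared distance.
        best_label, best_dist = None, float("inf")
        for label, s in sums.items():
            dist = sum((X[i][f] - s[k] / counts[label]) ** 2
                       for k, f in enumerate(features))
            if dist < best_dist:
                best_label, best_dist = label, dist
        if best_label != y[i]:
            errors += 1
    return errors / n


def sequential_forward_search(
    X: Matrix, y: Sequence[int], max_features: int,
    error_fn: Callable[[Matrix, Sequence[int], List[int]], float]
    = nearest_centroid_loo_error,
) -> List[Tuple[List[int], float]]:
    """Greedy wrapper: at each step add the single feature that gives the
    lowest classification error together with those already selected."""
    selected: List[int] = []
    remaining = list(range(len(X[0])))
    history: List[Tuple[List[int], float]] = []
    while remaining and len(selected) < max_features:
        # Ties on error are broken by the lowest feature index.
        best_err, best_f = min(
            (error_fn(X, y, selected + [f]), f) for f in remaining)
        selected.append(best_f)
        remaining.remove(best_f)
        history.append((list(selected), best_err))
    return history


# Hypothetical toy data: 8 samples x 3 "genes"; only column 1 separates
# the two classes, the other columns are uninformative.
X = [
    [0.5, 1.0, 0.3], [0.1, 1.2, 0.9], [0.9, 0.8, 0.4], [0.4, 1.1, 0.8],
    [0.6, 3.0, 0.5], [0.2, 3.2, 0.7], [0.8, 2.9, 0.2], [0.3, 3.1, 0.6],
]
y = [0, 0, 0, 0, 1, 1, 1, 1]

history = sequential_forward_search(X, y, max_features=2)
for subset, err in history:
    print(subset, err)
```

On this toy matrix the search picks column 1 first, because it is the only column whose leave-one-out error is zero. Any other classifier can be swapped in by passing a different `error_fn` with the same signature, which is the sense in which the search "wraps around" a learning method.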


Figures

Figure 1
Maximum within-sample prediction accuracy as a function of the number of genes for classifying colon tumors, achieved by LDA using the SFS and SFFS search algorithms.
Figure 2
Expression levels of three genes with accession numbers H22579, Z50573, and R67343 in 62 colon tissue samples.
Figure 3
Maximum within-sample prediction accuracy, evaluated on the total collection of 62 colon tissue samples, achieved by LDA, LR, and SVM (with two kernel functions: linear and polynomial of degree P = 3) using the SFFS search algorithm.
Figure 4
Maximum average out-of-sample prediction accuracy over the leave-one-out cross-validation set of colon tissue samples, achieved by LDA, LR, and SVM (with two kernel functions: linear and polynomial of degree P = 3) using the SFFS search algorithm.


