J Clin Bioinforma. 2011 Mar 21;1(1):11.
doi: 10.1186/2043-9113-1-11.

A filter-based feature selection approach for identifying potential biomarkers for lung cancer

In-Hee Lee et al. J Clin Bioinforma.

Abstract

Background: Lung cancer is the leading cause of cancer death worldwide, and its treatment depends on the type and stage of cancer detected in the patient. Molecular biomarkers that can characterize the cancer phenotype are thus a key tool in planning a therapeutic response. A common protocol for identifying such biomarkers is to employ genomic microarray analysis to find genes that show differential expression according to disease state or type. Data-mining techniques such as feature selection are often used to isolate, from among the many differentially expressed genes, those whose expression patterns are most valuable for phenotypic differentiation. One such technique, Biomarker Identifier (BMI), was developed to identify features that can distinguish between two data groups of interest, which makes it highly applicable to such studies.

Results: Microarray data with validated genes were used to evaluate the utility of BMI in identifying markers for lung cancer. The data set contains 129 gene expression profiles from large-airway epithelial cells (60 samples from smokers with lung cancer and 69 from smokers without lung cancer), and 7 genes from these data have been confirmed by quantitative PCR to be differentially expressed. Using this data set, BMI was compared with various well-known feature selection methods and was found to be more successful than the other methods in finding genes useful for classifying cancerous samples. In addition, genes selected by BMI showed better discriminative power than those from the original study, given the same number of genes and the same classification algorithms. Pathway analysis of the BMI-selected genes allowed us to relate them to well-known cancer-related pathways.
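
To make the idea of a filter-based method concrete, the sketch below scores each gene independently of any classifier and ranks genes by that score. The abstract does not give the BMI scoring formula, so the absolute Welch t-statistic is used here purely as a stand-in; it is not the BMI score. The data are synthetic (only the group sizes, 60 and 69, mirror the airway data set), and every function name and parameter is hypothetical.

```python
import math
import random
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t-statistic between two samples (stand-in score, NOT the BMI score)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se if se > 0 else 0.0


def rank_features(cases, controls):
    """Return feature indices ordered from most to least discriminative."""
    n_features = len(cases[0])
    scores = []
    for j in range(n_features):
        a = [row[j] for row in cases]
        b = [row[j] for row in controls]
        scores.append(abs(welch_t(a, b)))
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)


def make_synthetic(n_cases=60, n_controls=69, n_features=200,
                   informative=(0, 1, 2), shift=2.0, seed=0):
    """Synthetic expression matrix: informative genes are shifted in cases only."""
    rng = random.Random(seed)

    def sample(n, shifted):
        return [
            [rng.gauss(shift if (shifted and j in informative) else 0.0, 1.0)
             for j in range(n_features)]
            for _ in range(n)
        ]

    return sample(n_cases, True), sample(n_controls, False)


cases, controls = make_synthetic()
ranking = rank_features(cases, controls)
print("Top-ranked synthetic genes:", ranking[:3])
```

Because the score is computed per gene and does not depend on a downstream classifier, the same ranking can be handed to any classification algorithm, which is the sense in which selected gene sets can be compared "given the same number of genes and the same classification algorithms".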

Conclusions: Our results show that BMI can be used to analyze microarray data and to find useful genes for classifying samples. Pathway analysis suggests that BMI is successful in identifying biomarker-quality cancer-related genes from the data.

Figures

Figure 1. The median ranks of validated genes in the airway data set by various feature selection methods.
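
The quantity in this figure can be illustrated with a short sketch: given the gene ranking produced by each method, look up the position of every validated gene and take the median. The gene identifiers, method names, and rankings below are invented for illustration and are not taken from the study.

```python
from statistics import median


def median_rank(ranking, validated):
    """Median 1-based position of the validated genes within a ranking."""
    position = {gene: i + 1 for i, gene in enumerate(ranking)}
    return median(position[g] for g in validated)


# Hypothetical rankings from two hypothetical methods (best gene first).
rankings = {
    "method_a": ["g3", "g1", "g7", "g2", "g5", "g4", "g6"],
    "method_b": ["g4", "g6", "g2", "g3", "g1", "g7", "g5"],
}
validated = ["g1", "g7", "g5"]  # hypothetical validated genes

median_ranks = {name: median_rank(r, validated) for name, r in rankings.items()}
for name, value in median_ranks.items():
    print(name, value)
```

A lower median rank means the method places the validated genes nearer the top of its list.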
