PLoS One. 2011 Feb 4;6(2):e14579.
doi: 10.1371/journal.pone.0014579.

Optimization based tumor classification from microarray gene expression data

Onur Dagliyan et al. PLoS One. 2011.

Abstract

Background: An important use of microarray data is the classification of tumor types according to genes that are up- or down-regulated in specific cancer types. A number of algorithms have been proposed to obtain such classifications, but they usually require data-dependent parameter optimization to give accurate results. It is also critical to find, among the up- or down-regulated genes, an optimal set of markers that can be used clinically to build assays for diagnosing specific cancer types or following their progression. In this paper, we employ a mixed integer programming based classification algorithm, the hyper-box enclosure (HBE) method, to classify several cancer types with a minimal set of predictor genes. This optimization-based method, a user-friendly and efficient classifier, may allow clinicians to diagnose certain cancer types and follow their progression.

Methodology/principal findings: We apply the HBE algorithm to several well-known data sets, including leukemia, prostate cancer, diffuse large B-cell lymphoma (DLBCL), and small round blue cell tumors (SRBCT), to find predictor genes that can be used for robust, high-accuracy diagnosis and prognosis. Our approach does not require any modification or parameter optimization for each data set. Additionally, the information gain attribute evaluator, the Relief attribute evaluator, and correlation-based feature selection are employed for gene selection. The results are compared with those from other studies, and the biological roles of the selected genes in the corresponding cancer types are described.
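
To illustrate what an information gain ranking of genes involves, the following is a minimal Python sketch and not the evaluator used in the study. It assumes each gene is discretized by a binary split at its median, which is a simplification chosen only for this example. The gene names, expression values, and class labels are hypothetical toy data.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels):
    """Information gain of one gene after a binary split at its median.

    The median split is an assumption made for this sketch only.
    """
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2:
        threshold = ordered[mid]
    else:
        threshold = (ordered[mid - 1] + ordered[mid]) / 2
    low = [y for x, y in zip(values, labels) if x <= threshold]
    high = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    # Weighted entropy of the two partitions (empty partitions are skipped).
    remainder = sum(len(part) / n * entropy(part) for part in (low, high) if part)
    return entropy(labels) - remainder


def rank_genes(expression, labels):
    """Return gene names sorted by decreasing information gain."""
    gains = {gene: information_gain(vals, labels) for gene, vals in expression.items()}
    return sorted(gains, key=gains.get, reverse=True)


# Hypothetical toy data: two genes measured on six samples.
toy_expression = {
    "gene_a": [0.1, 0.2, 0.3, 2.1, 2.4, 2.2],  # separates the two classes
    "gene_b": [1.0, 2.0, 1.1, 2.1, 1.2, 2.2],  # only weakly informative
}
toy_labels = ["ALL", "ALL", "ALL", "AML", "AML", "AML"]

ranking = rank_genes(toy_expression, toy_labels)
print(ranking)
```

In this toy case `gene_a` splits the two classes perfectly and is ranked first. A real analysis would pass the top-ranked genes to the classifier.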

Conclusions/significance: Overall, our algorithm performed better than the other algorithms reported in the literature and the classifiers found in the WEKA data-mining package. Because it requires no parameter optimization and consistently achieves very high prediction rates on different types of data sets, the HBE method is an effective and consistent tool for cancer type prediction with a small number of gene markers.

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1. The flowchart of the algorithm.
Figure 2. The illustrative two dimensional classification problem.
a) The two-dimensional, four-class illustrative example. Each color represents one class. b) The determination of boundaries for the corresponding classes for all samples. c) The determination of problematic samples. d) The identification of representative samples (seeds) from each class using pure IP. e) Construction of hyper-boxes for problematic samples using MILP. f) Construction of hyper-boxes for non-problematic samples.
Figure 3. The final solution after the intersection elimination.
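
Panels b) and c) of Figure 2 can be approximated with a short Python sketch. It is a simplified illustration under stated assumptions, not the authors' IP/MILP formulation: each class boundary is taken to be the axis-aligned bounding box of its samples, and a sample is treated as problematic when it falls inside the bounding box of another class, which is one possible reading of the caption. The function names and toy coordinates are hypothetical.

```python
def class_bounds(samples, labels):
    """Per-class axis-aligned bounding box: {label: (lower, upper)}."""
    bounds = {}
    for point, label in zip(samples, labels):
        if label not in bounds:
            bounds[label] = (list(point), list(point))
        else:
            lower, upper = bounds[label]
            for d, value in enumerate(point):
                lower[d] = min(lower[d], value)
                upper[d] = max(upper[d], value)
    return bounds


def inside(point, box):
    """True if the point lies within the box in every dimension."""
    lower, upper = box
    return all(lo <= v <= hi for v, lo, hi in zip(point, lower, upper))


def problematic_samples(samples, labels):
    """Indices of samples lying inside another class's bounding box.

    This definition of 'problematic' is an assumption made for the sketch.
    """
    bounds = class_bounds(samples, labels)
    flagged = []
    for index, (point, label) in enumerate(zip(samples, labels)):
        if any(inside(point, box) for other, box in bounds.items() if other != label):
            flagged.append(index)
    return flagged


# Hypothetical two-dimensional toy data with two classes.
toy_samples = [(1, 1), (2, 3), (3, 2), (5, 5), (4, 4), (6, 6), (7, 5), (6, 4)]
toy_labels = ["red", "red", "red", "red", "blue", "blue", "blue", "blue"]

flagged = problematic_samples(toy_samples, toy_labels)
print(flagged)
```

Here the red sample (5, 5) and the blue sample (4, 4) each fall inside the other class's box, so they are the ones flagged. In the method described above, such samples would then be handled by the optimization steps in panels d) and e), which this sketch does not attempt.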
