Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2019 May;83(3):141-159.
doi: 10.1111/ahg.12297. Epub 2019 Jan 15.

Graph coloring for extracting discriminative genes in cancer data

Affiliations

Graph coloring for extracting discriminative genes in cancer data

Mohamed A Mahfouz et al. Ann Hum Genet. 2019 May.

Abstract

Background and objective: The major difficulty of the analysis of the input gene expression data in a microarray-based approach for an automated diagnosis of cancer is the large number of genes (high dimensionality) with many irrelevant genes (noise) compared to the very small number of samples. This research study tackles the dimensionality reduction challenge in this area.

Methods: This research study introduces a dimension-reduction technique termed graph coloring approach (GCA) for microarray data-based cancer classification based on analyzing the absolute correlation between gene-gene pairs and partitioning genes into several hubs using graph coloring. GCA starts by a gene-selection step in which top relevant genes are selected using a biserial correlation. Each time, a gene from an ordered list of top relevant genes is selected as the hub gene (representative) and redundant genes are added to its group; the process is repeated recursively for the remaining genes. A gene is considered redundant if its absolute correlation with the hub gene is greater than a controlling threshold. A suitable range for the threshold is estimated by computing a percentage graph for the absolute correlation between gene-gene pairs. Each value in the estimated range for the threshold can efficiently produce a new feature subset.

Results: GCA achieved significant improvement over several existing techniques in terms of higher accuracy and a smaller number of features. Also, genes selected by this method are relevant genes according to the information stored in scientific repositories.

Conclusions: The proposed dimension-reduction technique can help biologists accurately predict cancer in several areas of the body.

Keywords: cancer diagnosis; cancer-specific genetic network; classification; dimension reduction; gene-expression analysis; graph coloring; microarrays.

PubMed Disclaimer

LinkOut - more resources