J Biosci Bioeng. 2013 Sep;116(3):397-407.
doi: 10.1016/j.jbiosc.2013.03.010. Epub 2013 Apr 19.

Robust complementary hierarchical clustering for gene expression data analysis by β-divergence

Md Bahadur Badsha et al. J Biosci Bioeng. 2013 Sep.

Abstract

Hierarchical clustering (HC) is one of the most widely used unsupervised statistical techniques for analyzing microarray gene expression data. When HC is applied to gene expression data to cluster individuals, most HC algorithms form clusters based on highly differentially expressed (DE) genes that have very similar expression patterns. These highly DE genes are sometimes irrelevant to the biological processes of interest. The serious problem is that such irrelevant, highly expressed genes can drown out genes with low expression that have important biological functions. To overcome this problem, Nowak and Tibshirani proposed complementary hierarchical clustering (CHC) (Biostatistics, 9, 467-483, 2008). However, CHC is not robust against outlying expression values and often produces misleading results when the gene expression data are contaminated. We therefore propose the robust CHC (RCHC) method, which robustifies CHC with respect to outliers by maximizing the β-likelihood function for sequential extraction of a gene set with proper groups of individuals. The proposed method reduces to CHC as the tuning parameter β → 0. The value of β plays a key role in the performance of the RCHC method because it controls the tradeoff between the robustness and the efficiency of the estimators. In simulations and in real gene expression analyses, the RCHC method is robust to data contamination, overcomes the problem of the CHC, and predicts critically important genes from breast cancer data.

Keywords: DNA microarray; Gene expression; Maximum β-likelihood; Relative gene importance; Robust complementary hierarchical clustering (RCHC); Robustness; Selection procedure of β.
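
The role of β described in the abstract can be illustrated on a much simpler problem. The sketch below is not the RCHC procedure from the paper; it is a generic β-weighted fit of a univariate Gaussian, assuming the commonly used fixed-point updates in which each observation gets the weight exp(-β z²/2), where z is its standardized distance from the current mean. The function name `beta_weighted_gaussian_fit`, the toy data, and the choice β = 0.3 are all hypothetical. With β = 0 every weight equals 1 and the estimate is the ordinary sample mean, which mirrors the statement that the method reduces to its non-robust counterpart as β → 0.

```python
import numpy as np


def beta_weighted_gaussian_fit(x, beta, n_iter=100, tol=1e-10):
    """Illustrative beta-weighted estimate of a Gaussian mean and variance.

    Assumption: weights w_i = exp(-beta/2 * (x_i - mu)^2 / var), iterated to a
    fixed point. beta = 0 gives w_i = 1, i.e. the usual sample mean/variance.
    """
    x = np.asarray(x, dtype=float)
    mu = np.median(x)          # starting value for the mean
    var = np.var(x)            # starting value for the variance
    w = np.ones_like(x)
    for _ in range(n_iter):
        # Observations far from the current mean receive small weights.
        w = np.exp(-0.5 * beta * (x - mu) ** 2 / var)
        new_mu = np.sum(w * x) / np.sum(w)
        # The factor (1 + beta) offsets the shrinkage caused by down-weighting.
        new_var = (1.0 + beta) * np.sum(w * (x - new_mu) ** 2) / np.sum(w)
        converged = abs(new_mu - mu) < tol and abs(new_var - var) < tol
        mu, var = new_mu, new_var
        if converged:
            break
    return mu, var, w


# Toy data (hypothetical): 200 clean values around 0 plus 10 outliers at 20.
rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(0.0, 1.0, size=200), np.full(10, 20.0)])

mu_plain, var_plain, _ = beta_weighted_gaussian_fit(data, beta=0.0)
mu_robust, var_robust, weights = beta_weighted_gaussian_fit(data, beta=0.3)

print("beta = 0.0 mean:", mu_plain)
print("beta = 0.3 mean:", mu_robust)
```

Larger values of β down-weight outlying observations more aggressively but discard more information from clean data, which is the robustness versus efficiency tradeoff the abstract refers to. How β is selected in the paper is not stated in the abstract, so the value used here should be read as arbitrary.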
