BMC Genomics. 2008;9 Suppl 1(Suppl 1):S4.
doi: 10.1186/1471-2164-9-S1-S4.

Bayesian biclustering of gene expression data

Jiajun Gu et al. BMC Genomics. 2008.

Abstract

Background: Biclustering of gene expression data searches for local patterns of gene expression. A bicluster (or a two-way cluster) is defined as a set of genes whose expression profiles are mutually similar within a subset of experimental conditions/samples. Although several biclustering algorithms have been studied, few are based on rigorous statistical models.

Results: We developed a Bayesian biclustering model (BBC) and implemented a Gibbs sampling procedure for its statistical inference. We showed that the Bayesian biclustering model can correctly identify multiple clusters in gene expression data. Using simulated data, both generated from the model and generated with realistic characteristics, we demonstrated that the BBC algorithm outperforms other methods in both robustness and accuracy. We also showed that the model is stable under two normalization methods, the interquartile range normalization and the smallest quartile range normalization. Applying the BBC algorithm to yeast expression data, we observed that the majority of the biclusters we found are supported by significant biological evidence, such as enrichment of gene functions and of transcription factor binding sites in the corresponding promoter sequences.

Conclusions: The BBC algorithm is shown to be a robust model-based biclustering method that can discover biologically significant gene-condition clusters in microarray data. The BBC model can easily handle missing data via Monte Carlo imputation and has the potential to be extended to integrated study of gene transcription networks.
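
As an illustration of the normalization step mentioned in the Results, the following is a minimal sketch of one common form of interquartile range scaling: each sample is centered at its median and divided by its interquartile range. The abstract does not define the procedure, so the function name `iqr_normalize` and the exact formula are assumptions, and the paper's own definition may differ.

```python
from statistics import median, quantiles


def iqr_normalize(column):
    """Center one sample's expression values at the median and scale by the IQR.

    Assumption: this median/IQR form is an illustrative stand-in, not
    necessarily the normalization defined in the paper.
    """
    # Quartiles computed with the "inclusive" method (data treated as the full population).
    q1, _, q3 = quantiles(column, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        raise ValueError("interquartile range is zero; cannot scale")
    med = median(column)
    return [(x - med) / iqr for x in column]


# Hypothetical expression values for a single sample.
normalized = iqr_normalize([1.0, 2.0, 3.0, 4.0, 5.0])
```

Scaling by quartiles rather than by the full-sample standard deviation keeps a few extreme expression values from dominating the scale of a sample.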

Figures

Figure 1
Simulated data with two biclusters and the results of the BBC analysis. Bayesian biclustering for simulated datasets. (a) A dataset with two non-overlapping clusters. (b)-(c) The two clusters found by the Bayesian biclustering model from (a). (d) A dataset with two clusters with common genes. (e)-(g) The three clusters found by the Bayesian biclustering model from (d). (h) A dataset with two clusters with both common samples and common genes. (i)-(k) The three clusters found by the Bayesian biclustering model from (h).
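
A toy dataset of the kind described in panel (a) can be sketched as follows: Gaussian background noise with two blocks of elevated expression that share neither genes nor samples. The block positions, the constant additive shift, and every parameter value are hypothetical choices made for illustration. This is not the generative model used in the paper.

```python
import random


def simulate_two_biclusters(n_genes=20, n_samples=12, shift=3.0, noise_sd=0.5, seed=0):
    """Return a genes-by-samples matrix containing two non-overlapping biclusters.

    Assumption: a constant shift on top of Gaussian noise is an illustrative
    simplification, not the BBC model itself.
    """
    rng = random.Random(seed)
    # Background: independent Gaussian noise centered at zero.
    data = [[rng.gauss(0.0, noise_sd) for _ in range(n_samples)]
            for _ in range(n_genes)]
    # Each bicluster is a (gene indices, sample indices) pair; the two
    # pairs are disjoint in both genes and samples.
    blocks = [
        (range(0, 6), range(0, 4)),
        (range(10, 16), range(6, 10)),
    ]
    for genes, samples in blocks:
        for i in genes:
            for j in samples:
                data[i][j] += shift
    return data, blocks


data, blocks = simulate_two_biclusters()
```

A biclustering method applied to `data` would be expected to recover the two (gene set, sample set) pairs stored in `blocks`.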
Figure 2
Datasets simulated according to the plaid model. Datasets for comparison. (a) A dataset with a single cluster. (b) A dataset with two clusters, of which both genes and samples overlap.
Figure 3
The simulated dataset with realistic characteristics.

