Bioinformatics. 2013 Oct 15;29(20):2610-6.
doi: 10.1093/bioinformatics/btt425. Epub 2013 Aug 28.

Bayesian consensus clustering

Eric F Lock et al. Bioinformatics.

Abstract

Motivation: In biomedical research, a growing number of platforms and technologies are used to measure diverse but related information, and the task of clustering a set of objects based on multiple sources of data arises in several applications. Most current approaches to multisource clustering either determine a separate clustering for each data source independently or determine a single 'joint' clustering for all data sources. There is a need for more flexible approaches that simultaneously model the dependence and the heterogeneity of the data sources.

Results: We propose an integrative statistical model that permits a separate clustering of the objects for each data source. These separate clusterings adhere loosely to an overall consensus clustering, and hence they are not independent. We describe a computationally scalable Bayesian framework for simultaneous estimation of both the consensus clustering and the source-specific clusterings. We demonstrate that this flexible approach is more robust than joint clustering of all data sources, and is more powerful than clustering each data source independently. We present an application to subtype identification of breast cancer tumor samples using publicly available data from The Cancer Genome Atlas.
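As a rough illustration of what "adhere loosely" could mean, the sketch below assumes that each source-specific label equals the object's consensus label with probability alpha and otherwise falls uniformly on one of the other clusters. The symbol alpha follows the figure captions; the function names, the number of clusters, and the alpha values are illustrative assumptions and are not taken from the authors' R code. The sketch only simulates labels; it does not perform the Bayesian estimation described above.

```python
import random


def sample_source_labels(consensus, alpha, n_clusters, rng):
    """Draw one source-specific label per object (hypothetical helper).

    Assumption: a label equals the consensus label with probability alpha,
    otherwise it is drawn uniformly from the other n_clusters - 1 labels.
    """
    labels = []
    for c in consensus:
        if rng.random() < alpha:
            labels.append(c)
        else:
            others = [k for k in range(n_clusters) if k != c]
            labels.append(rng.choice(others))
    return labels


def agreement(a, b):
    """Fraction of objects assigned the same label in both clusterings."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


rng = random.Random(0)
K = 3                                   # number of clusters (illustrative)
consensus = [rng.randrange(K) for _ in range(2000)]

# Two simulated data sources with different adherence to the consensus.
alphas = [0.95, 0.6]
source_labels = [sample_source_labels(consensus, a, K, rng) for a in alphas]

for a, labels in zip(alphas, source_labels):
    print(f"alpha={a}: agreement with consensus = {agreement(labels, consensus):.3f}")
```

Under this assumption, alpha = 1 makes every source share a single joint clustering, while alpha = 1/K makes a source's labels independent of the consensus, so intermediate values sit between the joint and separate clustering approaches.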

Availability: R code with instructions and examples is available at http://people.duke.edu/%7Eel113/software.html.

Figures

Fig. 1. Estimated α̂ versus true α for 100 randomly generated simulations. For each simulation, the mean value of α̂ is shown with a 95% credible interval.

Fig. 2. Source-specific and overall clustering error for 100 simulations with M = 2 and M = 3 data sources, shown for joint clustering, separate clustering, dependent clustering, BCC and BCC using the true α. A LOESS curve displays clustering error as a function of α for each method.

Fig. 3. PCA plots for each data source. Sample points are colored by overall cluster: cluster 1 is black, cluster 2 is red and cluster 3 is blue. Symbols indicate source-specific cluster: cluster 1 is indicated by filled circles, cluster 2 by plus signs and cluster 3 by asterisks.
