J Comput Biol. 2017 Jun;24(6):472-485.

doi: 10.1089/cmb.2016.0138. Epub 2016 Nov 11.

A MAD-Bayes Algorithm for State-Space Inference and Clustering with Application to Querying Large Collections of ChIP-Seq Data Sets

Chandler Zuo¹, Kailei Chen¹, Sündüz Keleş¹

PMID: 27835030
PMCID: PMC5467113
DOI: 10.1089/cmb.2016.0138

Chandler Zuo et al. J Comput Biol. 2017 Jun.

Affiliation

¹ Department of Statistics, Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, Wisconsin.


Abstract

Current analytic approaches for querying large collections of chromatin immunoprecipitation followed by sequencing (ChIP-seq) data from multiple cell types rely on analyzing each data set (i.e., peak calling) independently. This approach ignores the fact that functional elements are frequently shared among related cell types and leads to overestimation of the extent of divergence between different ChIP-seq samples. Methods geared toward multisample investigations have limited applicability in settings that aim to integrate hundreds to thousands of ChIP-seq data sets for query loci (e.g., thousands of genomic loci with a specific binding site). Recently, Zuo et al. developed a hierarchical framework for state-space matrix inference and clustering, named MBASIC, to enable joint analysis of user-specified loci across multiple ChIP-seq data sets. Although this versatile framework both estimates the underlying state-space (e.g., bound vs. unbound) and groups loci with similar patterns together, its Expectation-Maximization-based estimation structure hinders its applicability to large numbers of loci and samples. We address this limitation by developing a MAP-based asymptotic derivations from Bayes (MAD-Bayes) framework for MBASIC. This results in a K-means-like optimization algorithm that converges rapidly and hence enables exploring multiple initialization schemes and flexibility in tuning. Comparison with MBASIC indicates that this speed comes at a relatively minor loss in estimation accuracy. Although MAD-Bayes MBASIC is specifically designed for the analysis of user-specified loci, it is able to capture overall patterns of histone marks from multiple ChIP-seq data sets similar to those identified by genome-wide segmentation methods such as ChromHMM and Spectacle.

Keywords: ChIP-Seq; MAD-Bayes; small-variance asymptotics; unified state-space inference and clustering.
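To give a feel for what a "K-means-like" algorithm obtained from small-variance asymptotics can look like, the following is a minimal sketch of the generic DP-means-style procedure: each point joins its nearest center unless the squared distance exceeds a penalty, in which case it opens a new cluster. This is an illustration only and is not the MAD-Bayes MBASIC algorithm from the paper, which additionally infers a state-space matrix. The function name `dp_means`, the penalty parameter `lam`, and the toy data are assumptions made for this example.

```python
from typing import List, Sequence, Tuple


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def dp_means(
    points: List[Sequence[float]], lam: float, max_iter: int = 100
) -> Tuple[List[int], List[List[float]]]:
    """Generic DP-means-style clustering (illustrative sketch, not MBASIC).

    `lam` is a hypothetical penalty: a point whose squared distance to every
    existing center exceeds `lam` starts a new cluster.
    """
    n, dim = len(points), len(points[0])
    # Start with a single cluster centered at the global mean.
    centers = [[sum(p[d] for p in points) / n for d in range(dim)]]
    labels = [0] * n
    for _ in range(max_iter):
        changed = False
        # Assignment step: nearest center, or a new cluster if too far.
        for i, p in enumerate(points):
            dists = [sq_dist(p, c) for c in centers]
            j = min(range(len(centers)), key=dists.__getitem__)
            if dists[j] > lam:
                centers.append(list(p))
                j = len(centers) - 1
            if labels[i] != j:
                labels[i] = j
                changed = True
        # Update step: recompute means and drop clusters left empty.
        new_centers, remap = [], {}
        for j in range(len(centers)):
            members = [points[i] for i in range(n) if labels[i] == j]
            if members:
                remap[j] = len(new_centers)
                new_centers.append(
                    [sum(m[d] for m in members) / len(members) for d in range(dim)]
                )
        labels = [remap[label] for label in labels]
        centers = new_centers
        if not changed:
            break
    return labels, centers


if __name__ == "__main__":
    toy = [(0, 0), (0.1, 0), (0, 0.1), (5, 5), (5.1, 5), (5, 5.1)]
    labels, centers = dp_means(toy, lam=1.0)
    print(labels, len(centers))
```

In this sketch a larger `lam` yields fewer clusters, which is one way to see why a fast algorithm makes it practical to explore several penalty values and initializations.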

Conflict of interest statement

No competing financial interests exist.

Figures

**FIG. 1.**
Overview of the MBASIC modeling framework. Curves within each panel depict different replicates under the experimental conditions C1, C2, and C3. Loci A and D are in the same cluster.

**FIG. 2.**
**(a)** Run-time comparisons on a 64-bit machine with an Intel Xeon 3.0 GHz processor, eight cores, and 64 GB of RAM. **(b)** State-space prediction error. **(c)** Clustering accuracy based on the adjusted Rand index. **(d)** Clustering assignments of the singletons when [formula image]

**FIG. 3.**
**(a)** Comparison of clusters and state labels between MAD-Bayes, Spectacle, and ChromHMM. **(b)** Jaccard index between MAD-Bayes clusters and ChromHMM states. **(c)** Jaccard index between MAD-Bayes clusters and Spectacle states. The diagonal blocks indicate agreement between clusters and states; MAD-Bayes clusters and Spectacle states are ordered according to their overlap with the ChromHMM states. MAD-Bayes, MAP-based asymptotic derivations from Bayes.
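The Jaccard index used in panels (b) and (c) is the size of the intersection of two sets divided by the size of their union. A minimal sketch of how such a cluster-by-state table could be computed is given below; the function names and the toy labels are assumptions for illustration and do not come from the paper.

```python
from typing import Dict, Hashable, Iterable, Sequence, Tuple


def jaccard(a: Iterable[Hashable], b: Iterable[Hashable]) -> float:
    """Jaccard index |A ∩ B| / |A ∪ B|; defined here as 0.0 for two empty sets."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def jaccard_table(
    cluster_labels: Sequence[Hashable], state_labels: Sequence[Hashable]
) -> Dict[Tuple[Hashable, Hashable], float]:
    """Jaccard index for every (cluster, state) pair over the same loci.

    Position i in both sequences refers to the same locus (an assumption of
    this sketch).
    """
    if len(cluster_labels) != len(state_labels):
        raise ValueError("label sequences must have the same length")
    table = {}
    for c in set(cluster_labels):
        loci_c = {i for i, lab in enumerate(cluster_labels) if lab == c}
        for s in set(state_labels):
            loci_s = {i for i, lab in enumerate(state_labels) if lab == s}
            table[(c, s)] = jaccard(loci_c, loci_s)
    return table


if __name__ == "__main__":
    clusters = [0, 0, 1, 1, 1]
    states = ["A", "A", "A", "B", "B"]
    for pair, value in sorted(jaccard_table(clusters, states).items()):
        print(pair, round(value, 3))
```

Reordering the rows and columns of such a table by overlap is what produces the diagonal block pattern described in the caption.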

**Appendix FIG. 1.**
A graphical interpretation of the conjugacy between and J. We use the K-means initialization to compute surrogate values for for a large collection of . The value that can yield J clusters in the global solution must satisfy When satisfies this condition, a line with slope passing through on the graph should be tangent to the trace of all values. Although using the surrogate values can lead to the curve connecting the values to be non-convex, making the solution for not hold for some J, we can use a convex approximation to the trace of so that a exists for each J. A simpler approach is to order from largest to smallest and requires the following condition for . . Algorithm 2 essentially applies this idea to select the values. Each J corresponds to a of value that satisfies the conjugacy inequality. The algorithm essentially tries to identify the range of that leads up to number of clusters.

**Appendix FIG. 2.**
Comparison of the clustering accuracy with the adjusted Rand index by excluding the singleton loci.
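For readers unfamiliar with the adjusted Rand index, it measures agreement between two partitions of the same items, corrected for chance: 1 indicates identical partitions, and values near 0 indicate chance-level agreement. The sketch below implements the standard contingency-table formula in plain Python; the function name and the toy labels are assumptions for illustration, not the evaluation code used in the paper.

```python
from collections import Counter
from math import comb
from typing import Hashable, Sequence


def adjusted_rand_index(
    labels_true: Sequence[Hashable], labels_pred: Sequence[Hashable]
) -> float:
    """Adjusted Rand index between two labelings of the same items."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    total_pairs = comb(len(labels_true), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: partitions trivially agree
    # Pairs of items that share a cluster in both, in the first, in the second.
    cells = Counter(zip(labels_true, labels_pred))
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_rows * sum_cols / total_pairs
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings use a single cluster
    return (sum_cells - expected) / (max_index - expected)


if __name__ == "__main__":
    print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
    print(adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1]))
```

The index is invariant to relabeling of clusters, so it can compare an estimated clustering against a reference without first matching cluster names.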


