Cancer Inform. 2017 Apr 12:16:1176935117702389.
doi: 10.1177/1176935117702389. eCollection 2017.

A mixture copula Bayesian network model for multimodal genomic data

Qingyang Zhang et al. Cancer Inform. 2017.

Abstract

Gaussian Bayesian networks have become a widely used framework for estimating directed associations between jointly Gaussian variables, where the network structure encodes the decomposition of the multivariate normal density into local terms. However, the resulting estimates can be inaccurate when the normality assumption is moderately or severely violated, which makes these models unsuitable for recent genomic data such as the Cancer Genome Atlas data. In the present paper, we propose a mixture copula Bayesian network model that provides great flexibility in modeling non-Gaussian and multimodal data for causal inference. The parameters of the mixture copula functions can be efficiently estimated by a routine expectation-maximization algorithm. A heuristic search algorithm based on the Bayesian information criterion is developed to estimate the network structure, and prediction can be further improved by selecting the best-scoring network among multiple predictions obtained from random initial values. Our method outperforms Gaussian Bayesian networks and regular copula Bayesian networks in terms of modeling flexibility and prediction accuracy, as demonstrated using a cell signaling data set. We apply the proposed methods to the Cancer Genome Atlas data to study the genetic and epigenetic pathways that underlie serous ovarian cancer.

Keywords: Bayesian network; copula function; serous ovarian cancer; systems biology; the Cancer Genome Atlas.
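
The abstract mentions two ingredients that are easy to illustrate in isolation: expectation-maximization for mixture parameters, and model comparison by the Bayesian information criterion. The following is a minimal sketch under simplifying assumptions, not the authors' implementation: it fits a univariate two-component Gaussian mixture (the kind of marginal shown in the figure captions) rather than a mixture copula, and the function names `fit_two_component_gmm`, `mixture_loglik`, `gaussian_loglik`, and `bic` are hypothetical. BIC is written in the "lower is better" convention, which may differ from the sign convention used in the paper.

```python
import numpy as np


def normal_pdf(x, mean, sd):
    """Density of N(mean, sd^2); broadcasts over NumPy arrays."""
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))


def mixture_loglik(x, weights, means, sds):
    """Log-likelihood of the data under a univariate Gaussian mixture."""
    dens = weights * normal_pdf(np.asarray(x, dtype=float)[:, None], means, sds)
    return float(np.log(dens.sum(axis=1)).sum())


def fit_two_component_gmm(x, n_iter=500, tol=1e-8):
    """EM for a two-component Gaussian mixture (illustrative only)."""
    x = np.asarray(x, dtype=float)
    # Assumed initialisation: split the sample at its median.
    med = np.median(x)
    lo, hi = x[x <= med], x[x > med]
    weights = np.array([0.5, 0.5])
    means = np.array([lo.mean(), hi.mean()])
    sds = np.maximum(np.array([lo.std(), hi.std()]), 1e-6)
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each point.
        dens = weights * normal_pdf(x[:, None], means, sds)
        total = dens.sum(axis=1)
        resp = dens / total[:, None]
        # Stop once the log-likelihood no longer improves noticeably.
        ll = np.log(total).sum()
        if ll - prev_ll < tol:
            break
        prev_ll = ll
        # M-step: responsibility-weighted updates of the parameters.
        nk = resp.sum(axis=0)
        weights = nk / x.size
        means = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - means) ** 2).sum(axis=0) / nk
        sds = np.maximum(np.sqrt(var), 1e-6)  # floor guards against collapse
    return weights, means, sds


def gaussian_loglik(x):
    """Log-likelihood of a single Gaussian fitted by maximum likelihood."""
    x = np.asarray(x, dtype=float)
    return float(np.log(normal_pdf(x, x.mean(), x.std())).sum())


def bic(loglik, n_params, n_obs):
    """BIC = k * ln(n) - 2 * log-likelihood; lower values are preferred."""
    return n_params * np.log(n_obs) - 2.0 * loglik


if __name__ == "__main__":
    # Synthetic bimodal sample, made up purely for illustration.
    rng = np.random.default_rng(0)
    x = np.concatenate([rng.normal(-2.0, 0.5, 500), rng.normal(3.0, 1.0, 500)])
    w, m, s = fit_two_component_gmm(x)
    print("weights", w, "means", m, "sds", s)
    # The mixture has 5 free parameters, the single Gaussian has 2.
    print("BIC mixture:", bic(mixture_loglik(x, w, m, s), 5, x.size))
    print("BIC single Gaussian:", bic(gaussian_loglik(x), 2, x.size))
```

On clearly bimodal data such as the synthetic sample above, the mixture should obtain the lower BIC despite its three extra parameters, which is the kind of trade-off a BIC-based structure search relies on.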

Conflict of interest statement

DECLARATION OF CONFLICTING INTERESTS: The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Figures

Figure 1
Comparison of three Bayesian network models on Sachs et al.’s data: (a) the benchmark network; (b) network predicted by the GBN model; (c) network predicted by the Gaussian CBN model; (d) network predicted by the two-component Gaussian MCBN model.
Figure 2
Fitted marginals by a two-component Gaussian mixture for the abundance of proteins Akt (left) and Erk (right).
Figure 3
Dependence between proteins Akt and Erk: (a) observations; (b) simulated samples from the GBN; (c) simulated samples from the Gaussian CBN; (d) simulated samples from the two-component Gaussian MCBN.
Figure 4
Comparison of three undirected networks: (a) skeleton of the known network presented in Figure 1(a); (b) network consisting of the top 25 edges based on Pearson’s correlation coefficient; (c) network consisting of the top 25 edges based on Spearman’s correlation coefficient; (d) skeleton of the network predicted by the MCBN model presented in Figure 1(d).
Figure 5
Fitted marginals by a two-component Gaussian mixture for the expression level of genes TP53 (left) and SPARC (right).
Figure 6
Fitted marginals by a two-component Gaussian mixture for the promoter methylation level of genes BRCA1 (left) and NOTCH3 (right).
Figure 7
Network predicted by a two-component Gaussian MCBN model, containing the expression level of 50 genes (in light yellow), methylation level at 8 sites (in light green), and CNV at 15 sites (in light blue).
Figure 8
Dependence between the methylation level and expression level of gene C19orf53: (a) observations; (b) simulated samples from the two-component Gaussian MCBN.
