Selection and estimation for mixed graphical models

Shizhe Chen¹, Daniela M Witten¹, Ali Shojaie¹

PMID: 27625437
PMCID: PMC5018402
DOI: 10.1093/biomet/asu051

Biometrika. 2015 Mar;102(1):47-64. doi: 10.1093/biomet/asu051. Epub 2014 Dec 24.

Affiliation

¹ Department of Biostatistics, University of Washington, Box 357232, Seattle, Washington 98195, U.S.A.

Abstract

We consider the problem of estimating the parameters in a pairwise graphical model in which the distribution of each node, conditioned on the others, may have a different exponential family form. We identify restrictions on the parameter space required for the existence of a well-defined joint density, and establish the consistency of the neighbourhood selection approach for graph reconstruction in high dimensions when the true underlying graph is sparse. Motivated by our theoretical results, we investigate the selection of edges between nodes whose conditional distributions take different parametric forms, and show that efficiency can be gained if edge estimates obtained from the regressions of particular nodes are used to reconstruct the graph. These results are illustrated with examples of Gaussian, Bernoulli, Poisson and exponential distributions. Our theoretical findings are corroborated by evidence from simulation studies.

Keywords: Compatibility; Conditional likelihood; Exponential family; High dimensionality; Model selection consistency; Neighbourhood selection; Pairwise Markov random field.
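As a rough sketch of the model class the abstract describes, a pairwise graphical model with node-specific conditional distributions can be written as follows. The notation is an assumption made for illustration and is not taken from this page: x_s denotes node s, x_{-s} the remaining nodes, f_s a node-specific function that determines the exponential family of node s, θ_st = θ_ts the edge parameters, and D_s a normalizing term.

```latex
% Assumed notation (not from this page): f_s, \theta_{st} = \theta_{ts}, D_s.
% Conditional density of node s given all other nodes:
p(x_s \mid x_{-s}) = \exp\Big\{ f_s(x_s) + \sum_{t \neq s} \theta_{st}\, x_s x_t - D_s(x_{-s}) \Big\}
% A joint density compatible with these conditionals, when it exists:
p(x) \propto \exp\Big\{ \sum_{s} f_s(x_s) + \tfrac{1}{2} \sum_{s} \sum_{t \neq s} \theta_{st}\, x_s x_t \Big\}
```

Under this form, nodes s and t are joined by an edge exactly when θ_st is nonzero, and neighbourhood selection estimates the nonzero θ_st for each node s from a penalized regression of x_s on the other nodes. The right-hand side of the joint density need not be integrable for every parameter value, which is why restrictions on the parameter space are needed.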

Figures

**Fig. 1**
The graph used to generate the data in §§ 6·2–6·4, consisting of m = p/2 Gaussian or Poisson nodes, shown as circles, and m = p/2 Bernoulli nodes, shown as rectangles.

**Fig. 2**
Probability of successful neighbourhood recovery plotted as a function of scaled sample size n/{3 log(p)}, for the set-up of § 6·2. The curves are empirical probabilities of successful neighbourhood recovery for graphs with 60, 120 or 240 nodes, averaged over 100 independent datasets. The tuning parameter is set to 2·6{log(p)/n}^{1/2}. The title above each panel indicates the subgraph for which the recovery probability is displayed, and the first word in the title indicates the node type that was regressed in order to obtain the subgraph estimate. For instance, panel (b) displays probability curves for edges between Gaussian and Bernoulli nodes that are estimated from the ℓ₁-penalized linear regression of Gaussian nodes; panel (c) displays the same quantity, but estimated via an ℓ₁-penalized logistic regression of the Bernoulli nodes.
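The two quantities in this caption, the tuning parameter 2·6{log(p)/n}^{1/2} and the scaled sample size n/{3 log(p)}, can be computed as in the minimal sketch below. The function names are hypothetical and chosen only for illustration; the constants 2·6 and 3 are the ones stated in the caption.

```python
import math


def tuning_parameter(n, p, c=2.6):
    """Return c * sqrt(log(p) / n), the penalty level quoted in the caption.

    n is the sample size and p the number of nodes; c = 2.6 is the
    constant stated in the caption. The function name is hypothetical.
    """
    return c * math.sqrt(math.log(p) / n)


def scaled_sample_size(n, p, c=3.0):
    """Return n / (c * log(p)), the horizontal axis of the figure."""
    return n / (c * math.log(p))


if __name__ == "__main__":
    # Node counts from the caption; the sample size is an arbitrary example.
    for p in (60, 120, 240):
        print(p, tuning_parameter(200, p), scaled_sample_size(200, p))
```

Plotting against the scaled sample size rather than n puts graphs of different sizes on a common axis, so the curves for the three node counts can be compared directly.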

**Fig. 3**
Simulation results for the Gaussian-Bernoulli graph, as described in § 6·3. The number of correctly estimated edges is displayed as a function of the number of estimated edges, for a range of tuning parameter values in a graph with p = 40 and n = 200: (a) edges between nodes of the same type, Bernoulli-Bernoulli or Gaussian-Gaussian; (b) edges between Gaussian and Bernoulli nodes. In each panel the different curves represent the methods of the present paper (solid), Lee & Hastie (2015) (short-dashed), Cheng et al. (long-dashed), Fellinghauer et al. (2013) (dot-dashed), neighbourhood selection in the Gaussian graphical model (grey long-dashed), neighbourhood selection in the Ising model (grey short-dashed), and the graphical lasso (grey dot-dashed). The black triangle shows the average performance of our proposed method with tuning parameter selected by the Bayesian information criterion (see § 3·2).
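The curves in this figure plot the number of correctly estimated edges against the number of estimated edges. A minimal sketch of that bookkeeping is given below, assuming edges are represented as unordered pairs of node labels; the helper names and the toy edge lists are hypothetical and are not the simulation data.

```python
def normalize(edges):
    """Convert (s, t) pairs into unordered edges so (s, t) equals (t, s)."""
    return {frozenset(edge) for edge in edges}


def edge_counts(estimated, truth):
    """Return (number of estimated edges, number of those that are correct)."""
    est = normalize(estimated)
    true_edges = normalize(truth)
    return len(est), len(est & true_edges)


def recovery_curve(estimates_along_path, truth):
    """One (estimated, correct) point per tuning parameter value."""
    return [edge_counts(est, truth) for est in estimates_along_path]


if __name__ == "__main__":
    # Toy example: node labels and edges are made up for illustration.
    truth = [(1, 2), (2, 3), (4, 5)]
    path = [[(2, 1)], [(1, 2), (3, 2), (1, 5)]]
    print(recovery_curve(path, truth))
```

Each tuning parameter value yields one point on a curve; a method whose curve lies higher recovers more true edges for the same number of selected edges.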

**Fig. 4**
Summary of the simulation results for the Poisson-Bernoulli graph, as described in § 6·4. The number of correctly estimated edges is displayed as a function of the number of estimated edges, for a range of tuning parameter values in a graph with p = 80 nodes from n = 200 observations. The different curves represent the selection rule from § 4·2 with the true parameters (grey solid), the selection rule from § 4·2 with estimated parameters (short-dashed), the union rule (dot-dashed), the intersection rule (dotted), and the graphical random forest method of Fellinghauer et al. (2013) (long-dashed).
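Neighbourhood selection produces two candidate estimates for every edge, one from the regression of each endpoint. The union and intersection rules named in this caption combine them as in the sketch below. The data structure (a dictionary mapping each node to its estimated neighbours) and the function name are assumptions made for illustration; the selection rule of § 4·2 is not reproduced here, since its definition does not appear on this page.

```python
from itertools import combinations


def combine_neighbourhoods(neighbourhoods, rule="union"):
    """Combine per-node neighbourhood estimates into a set of edges.

    neighbourhoods maps each node to the set of nodes selected in its own
    regression. Under the union rule an edge is kept if either endpoint
    selects the other; under the intersection rule both must agree.
    """
    if rule not in ("union", "intersection"):
        raise ValueError("rule must be 'union' or 'intersection'")
    edges = set()
    for s, t in combinations(sorted(neighbourhoods), 2):
        s_selects_t = t in neighbourhoods[s]
        t_selects_s = s in neighbourhoods[t]
        if rule == "union":
            keep = s_selects_t or t_selects_s
        else:
            keep = s_selects_t and t_selects_s
        if keep:
            edges.add((s, t))
    return edges


if __name__ == "__main__":
    # Toy estimates: node names are hypothetical.
    nb = {"g1": {"b1"}, "b1": {"g1", "g2"}, "g2": set()}
    print(combine_neighbourhoods(nb, "union"))
    print(combine_neighbourhoods(nb, "intersection"))
```

The intersection rule is the more conservative of the two, because an edge survives only when both regressions select it.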

References

1. Allen GI, Liu Z. A log-linear graphical model for inferring genetic networks from high-throughput sequencing data. Proc. IEEE Int. Conf. Bioinfo. Biomed. 2012. New York: Curran Associates; 2012. pp. 1–6.
2. Besag JE. Spatial interaction and the statistical analysis of lattice systems (with Discussion). J. R. Statist. Soc. B. 1974;36:192–236.
3. Bunea F. Honest variable selection in linear and logistic regression models via ℓ1 and ℓ1 + ℓ2 penalization. Electron. J. Statist. 2008;2:1153–1194.
4. Fellinghauer B, Bühlmann P, Ryffel M, von Rhein M, Reinhardt JD. Stable graphical model estimation with random forests for discrete, continuous, and mixed variables. Comp. Statist. Data Anal. 2013;64:132–142.
5. Finegold M, Drton M. Robust graphical modeling of gene networks using classical and alternative t-distributions. Ann. Appl. Statist. 2011;5:1057–1080.
