Biometrika. 2015 Mar;102(1):47-64.
doi: 10.1093/biomet/asu051. Epub 2014 Dec 24.

Selection and estimation for mixed graphical models

Shizhe Chen et al. Biometrika. 2015 Mar.

Abstract

We consider the problem of estimating the parameters in a pairwise graphical model in which the distribution of each node, conditioned on the others, may have a different exponential family form. We identify restrictions on the parameter space required for the existence of a well-defined joint density, and establish the consistency of the neighbourhood selection approach for graph reconstruction in high dimensions when the true underlying graph is sparse. Motivated by our theoretical results, we investigate the selection of edges between nodes whose conditional distributions take different parametric forms, and show that efficiency can be gained if edge estimates obtained from the regressions of particular nodes are used to reconstruct the graph. These results are illustrated with examples of Gaussian, Bernoulli, Poisson and exponential distributions. Our theoretical findings are corroborated by evidence from simulation studies.

Keywords: Compatibility; Conditional likelihood; Exponential family; High dimensionality; Model selection consistency; Neighbourhood selection; Pairwise Markov random field.
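
The neighbourhood selection approach named in the abstract can be illustrated with a minimal sketch, assuming only Gaussian and Bernoulli nodes: each node is regressed on all other nodes with an ℓ1 penalty (linear regression for Gaussian nodes, logistic regression for Bernoulli nodes), and the nonzero coefficients give that node's estimated neighbourhood. The function names, the proximal-gradient solver, the iteration count and the default constant c = 2.6 (borrowed from the tuning parameter quoted in the caption of Fig. 2) are assumptions made for illustration and are not taken from the paper's implementation.

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t * ||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def l1_glm(X, y, lam, family, n_iter=2000):
    """l1-penalized linear ("gaussian") or logistic ("bernoulli") regression,
    fitted by proximal gradient descent; the intercept is not penalized."""
    n, d = X.shape
    beta, b0 = np.zeros(d), 0.0
    # Upper bound on the curvature of the average loss (logistic: at most 1/4).
    scale = 1.0 if family == "gaussian" else 0.25
    step = 1.0 / (scale * (np.linalg.norm(X, 2) ** 2 / n + 1.0))
    for _ in range(n_iter):
        eta = b0 + X @ beta
        mu = eta if family == "gaussian" else 1.0 / (1.0 + np.exp(-eta))
        resid = mu - y
        b0 -= step * resid.mean()
        beta = soft_threshold(beta - step * (X.T @ resid) / n, step * lam)
    return beta

def neighbourhood_selection(X, types, c=2.6):
    """Return a boolean matrix A where A[j, k] is True if node k is selected
    in the penalized regression of node j on all other nodes."""
    n, p = X.shape
    lam = c * np.sqrt(np.log(p) / n)  # assumed form: c * {log(p)/n}^{1/2}
    B = np.zeros((p, p))
    for j in range(p):
        others = np.delete(np.arange(p), j)
        B[j, others] = l1_glm(X[:, others], X[:, j], lam, types[j])
    return B != 0
```

The returned matrix is generally not symmetric, because node k may be selected when regressing node j but not the other way round. How the two estimates of each edge are combined is a separate choice.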

Figures

Fig. 1
The graph used to generate the data in §§ 6·2–6·4, consisting of m = p/2 Gaussian or Poisson nodes, shown as circles, and m = p/2 Bernoulli nodes, shown as rectangles.
Fig. 2
Probability of successful neighbourhood recovery plotted as a function of scaled sample size n/{3 log(p)}, for the set-up of § 6·2. The curves are empirical probabilities of successful neighbourhood recovery for graphs with 60, 120 or 240 nodes, averaged over 100 independent datasets. The tuning parameter is set to 2·6{log(p)/n}^{1/2}. The title above each panel indicates the subgraph for which the recovery probability is displayed, and the first word in the title indicates the node type that was regressed in order to obtain the subgraph estimate. For instance, panel (b) displays probability curves for edges between Gaussian and Bernoulli nodes that are estimated from the ℓ1-penalized linear regression of Gaussian nodes; panel (c) displays the same quantity, but estimated via an ℓ1-penalized logistic regression of the Bernoulli nodes.
Fig. 3
Simulation results for the Gaussian-Bernoulli graph, as described in § 6·3. The number of correctly estimated edges is displayed as a function of the number of estimated edges, for a range of tuning parameter values in a graph with p = 40 and n = 200: (a) edges between nodes of the same type, Bernoulli-Bernoulli or Gaussian-Gaussian; (b) edges between Gaussian and Bernoulli nodes. In each panel the different curves represent the methods of the present paper (solid), Lee & Hastie (2015) (short-dashed), Cheng et al. (long-dashed), Fellinghauer et al. (2013) (dot-dashed), neighbourhood selection in the Gaussian graphical model (grey long-dashed), neighbourhood selection in the Ising model (grey short-dashed), and the graphical lasso (grey dot-dashed). The black triangle shows the average performance of our proposed method with tuning parameter selected by the Bayesian information criterion (see § 3·2).
Fig. 4
Summary of the simulation results for the Poisson-Bernoulli graph, as described in § 6·4. The number of correctly estimated edges is displayed as a function of the number of estimated edges, for a range of tuning parameter values in a graph with p = 80 nodes from n = 200 observations. The different curves represent the selection rule from § 4·2 with the true parameters (grey solid), the selection rule from § 4·2 with estimated parameters (short-dashed), the union rule (dot-dashed), the intersection rule (dotted), and the graphical random forest method of Fellinghauer et al. (2013) (long-dashed).
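
The union and intersection rules mentioned in the caption of Fig. 4 can be sketched as follows, assuming a boolean matrix A in which A[j, k] is True when node k is selected in the regression of node j. The third function is only a hypothetical illustration of using the estimate from the regression of one particular node type for mixed-type edges; which node type to prefer is determined by the selection rule of § 4·2 of the paper, which is not reproduced here. The function names, the `prefer` argument and the use of the union rule for same-type pairs are all assumptions.

```python
import numpy as np

def union_rule(A):
    """Keep an edge if either of the two node-wise regressions selects it."""
    return A | A.T

def intersection_rule(A):
    """Keep an edge only if both node-wise regressions select it."""
    return A & A.T

def one_sided_rule(A, types, prefer):
    """Hypothetical rule for a graph with exactly two node types.

    Same-type edges use the union rule (an assumption); mixed-type edges
    use only the estimate from the regression of the node of type `prefer`.
    """
    types = np.asarray(types)
    same = types[:, None] == types[None, :]
    # Entries whose row node has the preferred type and whose column node does not.
    row_pref = (types == prefer)[:, None] & ~same
    mixed = row_pref & A
    return (same & (A | A.T)) | mixed | mixed.T
```

For example, with `types = ["gaussian", "gaussian", "bernoulli"]` and `prefer="bernoulli"`, an edge between node 0 and node 2 is kept only if node 0 is selected in the logistic regression of node 2.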

References

    1. Allen GI, Liu Z. A log-linear graphical model for inferring genetic networks from high-throughput sequencing data. In: Proc. IEEE Int. Conf. Bioinfo. Biomed. 2012. New York: Curran Associates; 2012. pp. 1–6.
    1. Besag JE. Spatial interaction and the statistical analysis of lattice systems (with Discussion). J. R. Statist. Soc. B. 1974;36:192–236.
    1. Bunea F. Honest variable selection in linear and logistic regression models via ℓ1 and ℓ1 + ℓ2 penalization. Electron. J. Statist. 2008;2:1153–1194.
    1. Fellinghauer B, Bühlmann P, Ryffel M, von Rhein M, Reinhardt JD. Stable graphical model estimation with random forests for discrete, continuous, and mixed variables. Comp. Statist. Data Anal. 2013;64:132–142.
    1. Finegold M, Drton M. Robust graphical modeling of gene networks using classical and alternative t-distributions. Ann. Appl. Statist. 2011;5:1057–1080.