J Am Stat Assoc. 2021;116(534):605-618.
doi: 10.1080/01621459.2020.1775611. Epub 2020 Jul 24.

Bayesian Structure Learning in Multi-layered Genomic Networks

Min Jin Ha et al. J Am Stat Assoc. 2021.

Abstract

Integrative network modeling of data arising from multiple genomic platforms provides insight into the holistic picture of the interactive system, as well as the flow of information across many disease domains including cancer. The basic data structure consists of a sequence of hierarchically ordered datasets for each individual subject, which facilitates integration of diverse inputs, such as genomic, transcriptomic, and proteomic data. A primary analytical task in such contexts is to model the layered architecture of networks where the vertices can be naturally partitioned into ordered layers, dictated by multiple platforms, and exhibit both undirected and directed relationships. We propose a multi-layered Gaussian graphical model (mlGGM) to investigate conditional independence structures in such multi-level genomic networks in human cancers. We implement a Bayesian node-wise selection (BANS) approach based on variable selection techniques that coherently accounts for the multiple types of dependencies in mlGGM; this flexible strategy exploits edge-specific prior knowledge and selects sparse and interpretable models. Through simulated data generated under various scenarios, we demonstrate that BANS outperforms other existing multivariate regression-based methodologies. Our integrative genomic network analysis for key signaling pathways across multiple cancer types highlights commonalities and differences of p53 integrative networks and epigenetic effects of BRCA2 on p53 and its interaction with T68-phosphorylated CHK2, which may have translational utility for finding biomarkers and therapeutic targets.

Keywords: Bayesian variable selection; Multi-layered Gaussian graphical models; Multi-level data integration.
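
The node-wise idea described in the abstract can be illustrated with a small sketch: each node is regressed on the other nodes in its own layer (candidate undirected edges) and on all nodes in earlier layers (candidate directed edges). Everything named here is an assumption made for illustration: the functions `candidate_predictors` and `nodewise_ols_select`, the `threshold` parameter, and the rule that keeps an undirected edge only when both endpoint regressions select it. Ordinary least squares with a hard threshold is swapped in for the Bayesian variable selection that BANS uses, so this is not the authors' implementation.

```python
import numpy as np


def candidate_predictors(layers):
    """Map each node to its candidate neighbors in an ordered-layer graph.

    Same-layer nodes are candidate undirected neighbors; nodes in earlier
    layers are candidate directed parents.
    """
    out = {}
    earlier = []
    for layer in layers:
        for v in layer:
            out[v] = {
                "undirected": [u for u in layer if u != v],
                "directed": list(earlier),
            }
        earlier.extend(layer)
    return out


def nodewise_ols_select(X, layers, threshold=0.1):
    """Select edges by regressing each node on its candidates.

    ASSUMPTION: OLS plus a hard threshold stands in for Bayesian
    variable selection; this is a toy substitute, not BANS.
    """
    directed = set()
    votes = {}
    for v, cand in candidate_predictors(layers).items():
        preds = cand["undirected"] + cand["directed"]
        if not preds:
            continue
        beta, *_ = np.linalg.lstsq(X[:, preds], X[:, v], rcond=None)
        for u, b in zip(preds, beta):
            if abs(b) <= threshold:
                continue
            if u in cand["directed"]:
                directed.add((u, v))  # edge from earlier layer into v
            else:
                key = frozenset((u, v))
                votes[key] = votes.get(key, 0) + 1
    # ASSUMPTION: keep an undirected edge only if both regressions chose it.
    undirected = {tuple(sorted(k)) for k, c in votes.items() if c == 2}
    return directed, undirected


# Toy data: two layers {0, 1} and {2, 3}; node 0 drives node 2.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 4))
X[:, 2] += 0.8 * X[:, 0]
directed, undirected = nodewise_ols_select(X, [[0, 1], [2, 3]], threshold=0.3)
```

With the two-layer toy layout above, node 2 is regressed on node 3 (same layer) and on nodes 0 and 1 (earlier layer), which mirrors the layered structure the abstract describes without reproducing its priors or posterior inference.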

Figures

Fig. 1
Example network for q ordered layers. In our application, we consider four layers corresponding to DNA methylation, copy number aberration, gene expression, and protein expression (see Section 7).
Fig. 2
An example of a chain graph with layers {1, 2}, {3, 4}. (a) Solid lines represent the data-generating structure and dotted lines display how we parameterize undirected edges in our neighborhood selection framework. (b) Our working model.
Fig. 3
ROC curves and MCC curves for graph structure learning for (p,n,q,pE) = (200,100,10,0.03).
Fig. 4
UpSet plots showing relationships of mlGGMs across all 10 pathways between 7 cancer types. Each column-wise bar corresponds to the number of exclusively intersecting edges that are shared by the cancer types represented by the dark circles, but not by the others, and each row-wise bar displays the total number of edges for the corresponding cancer type.
Fig. 5
Heatmap depicting the connectivity score (CS) of the within- and between-layer sub-networks of the estimated mlGGMs across the 7 cancer types and 10 pathways. The scores are indicated on a low-to-high scale (grey-red-black). The standard deviations of the CS values across cancer types are displayed in the barplots.
Fig. 6
Integrative sub-networks for the p53 protein (red circle), inferred from BANS for LUAD, LUSC, COAD, READ, UCEC, OV, and SKCM. Each connected component includes all nodes that are connected to the p53 protein by paths of any length, including both undirected and directed edges for each cancer. The colors of edges indicate the inferred signs of the edges: negative (red) and positive (blue). The sizes of nodes and the widths of edges are weighted by their degrees and posterior edge inclusion probabilities, respectively.
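
The MCC curves mentioned for Fig. 3 rest on the Matthews correlation coefficient, which scores an estimated edge set against the true one. A minimal sketch of that computation follows, assuming edges are given as hashable pairs and that the caller supplies the number of possible edges; the function name `edge_mcc` and its arguments are hypothetical and do not come from the paper.

```python
import math


def edge_mcc(true_edges, est_edges, n_possible):
    """Matthews correlation coefficient for edge recovery.

    n_possible is the total number of candidate edges, needed to count
    true negatives. Returns 0.0 when the denominator is zero.
    """
    true_edges, est_edges = set(true_edges), set(est_edges)
    tp = len(true_edges & est_edges)   # edges correctly recovered
    fp = len(est_edges - true_edges)   # spurious edges
    fn = len(true_edges - est_edges)   # missed edges
    tn = n_possible - tp - fp - fn     # correctly absent edges
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom > 0 else 0.0


# Four nodes give 6 possible undirected edges.
score = edge_mcc({(0, 1), (1, 2)}, {(0, 1), (2, 3)}, n_possible=6)
print(score)  # 0.25
```

Sweeping a selection cutoff (for example, on posterior edge inclusion probabilities) and recomputing this score at each cutoff would trace out a curve of the kind the caption refers to.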
