Biomolecules. 2022 Jun 28;12(7):906.
doi: 10.3390/biom12070906.

Comparing Bayesian-Based Reconstruction Strategies in Topology-Based Pathway Enrichment Analysis

Yajunzi Wang et al. Biomolecules. 2022.

Abstract

The development of high-throughput omics technologies has enabled the quantification of vast numbers of genes and gene products across the whole genome. Pathway enrichment analysis (PEA) provides an intuitive way to extract biological insights from such massive datasets. Topology-based pathway analysis (TPA) represents the latest generation of PEA methods, which exploit pathway topology in addition to lists of differentially expressed genes and their expression profiles. A subset of these TPA methods, such as BPA, BNrich, and PROPS, reconstruct pathway structures by training Bayesian networks (BNs) from canonical biological pathways, providing superior representations that explain causal relationships between genes. However, these methods have never been compared with respect to their PEA results or their topology reconstruction strategies. In this study, we compare the BN reconstruction strategies of the BPA, BNrich, PROPS, Clipper, and Ensemble methods, together with their PEA results and their performance in classifying tumor and non-tumor samples from gene expression data. Our results indicate that they performed equally well in distinguishing tumor from non-tumor samples (AUC > 0.95), yet they ranked pathways differently, which can be attributed to the different BN structures resulting from their different strategies for removing cyclic structures. This is clearly seen in the JAK-STAT networks reconstructed by the different strategies. In a nutshell, BNrich, which relies on expert intervention to remove loops and cyclic structures, produces BNs that best fit the biological facts. The plausibility of the Clipper strategy can also be partially explained by intuitive biological rules and theorems. Our results may offer an informed reference for choosing the proper method for a given data analysis task.

Keywords: Bayesian network; gene expression; network reconstruction; topology-based pathway analysis.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1
A bump chart summarizing the ranking of the top ten pathways enriched by any of the five BN reconstruction methods in the HCC dataset. Numbers represent the PEA ranks of pathways produced by the corresponding BN reconstruction approach. Lines connecting the rank numbers show how each pathway's rank changes across the five approaches.
Figure 2
Schematic illustration of five BN reconstruction strategies. A hypothetical gene network, comprising 7 nodes (genes) and 9 edges, is synthesized as an example to present the principle of each BN reconstruction method. Three cyclic structures, {A -> B -> D -> A}, {A -> C -> D -> A}, and {D -> E -> D}, are introduced in this network. (A) The PROPS method orders the edges randomly and adds them one by one. Edges that would lead to cyclic structures are abandoned. Here, edge ④, edge ⑤, and edge ⑥ are removed. (B) The Clipper method excludes the cyclic edges with the least favorable p-value from the linear regression significance test. Edge ②, edge ⑤, and edge ⑦ are the lowest-ranked in the three cyclic structures and are thus removed. (C) The BNrich method first removes cyclic structures according to intuitive biological rules, here edge ③ and edge ⑥, because they run opposite to the direction of signal transmission. Additionally, BNrich employs LASSO to further simplify the network. The coefficient of edge ⑨ is shrunk to zero during LASSO regression, so the edge is removed. (D) The Ensemble method breaks cycles while preserving the graph hierarchy as much as possible. Edges in a cyclic structure with the highest voting score are excluded; here, edge ①, edge ④, and edge ⑥ are removed to break the corresponding cycles. (E) The BPA method orders nodes arbitrarily (here in alphabetical order) and adds new edges from smaller-numbered nodes to larger-numbered nodes. Parent nodes of a cyclic structure are also connected to each of the nodes in the cycle. First, edge ③ and edge ⑤ are removed and edge ⑩ is added in cycle 1. Then edge ⑥ is removed, and node A and node C, two parent nodes of node D, are connected to node E, constituting edges ⑪ and ⑫.
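The random-order strategy described for panel (A) can be sketched roughly as follows. This is a minimal illustration, not the PROPS implementation: the function names, the seed parameter, and the edge list are assumptions. In particular, only the three cycles are given in the caption; the edges E -> F and F -> G are invented here to reach 7 nodes and 9 edges, and the circled edge numbers of the figure are not reproduced.

```python
import random


def creates_cycle(adj, src, dst):
    """Return True if adding src -> dst would close a directed cycle,
    i.e. if src is already reachable from dst in the current graph."""
    stack, seen = [dst], set()
    while stack:
        node = stack.pop()
        if node == src:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(adj.get(node, ()))
    return False


def random_order_dag(edges, seed=None):
    """Shuffle the edges, add them one by one, and drop any edge that
    would create a cycle. Returns (kept_edges, dropped_edges)."""
    rng = random.Random(seed)
    order = list(edges)
    rng.shuffle(order)
    adj, kept, dropped = {}, [], []
    for src, dst in order:
        if creates_cycle(adj, src, dst):
            dropped.append((src, dst))
        else:
            adj.setdefault(src, []).append(dst)
            kept.append((src, dst))
    return kept, dropped


# Hypothetical 7-node, 9-edge network containing the three cycles
# A->B->D->A, A->C->D->A, and D->E->D. E->F and F->G are assumed.
EDGES = [
    ("A", "B"), ("B", "D"), ("D", "A"),
    ("A", "C"), ("C", "D"),
    ("D", "E"), ("E", "D"),
    ("E", "F"), ("F", "G"),
]

kept, dropped = random_order_dag(EDGES, seed=0)
print("kept:", kept)
print("dropped:", dropped)
```

Because the edge order is random, different seeds can drop different edges, so the resulting network is not unique for a given pathway.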
Figure 3
JAK-STAT-SOCS cell model. In canonical JAK-STAT signaling, the binding of extracellular cytokines to their transmembrane receptors results in activation of the pathway in a JAK-mediated manner. Both JAKs and cytokine receptors are transphosphorylated by JAKs in close proximity. Activated JAKs phosphorylate STATs, leading to STAT dimerization and eventual translocation into the cell nucleus, where they regulate the expression of genes, including SOCS proteins that act as negative feedback inhibitors of JAK-STAT. In BN reconstruction, both BNrich and Ensemble removed all the negative feedback edges representing SOCS inhibition of JAK-STAT (edges in red), while Clipper retained around 1/3 of them. In addition, Clipper also removed some of the edges representing JAK-induced STAT phosphorylation (edges in purple). Proteins of the STAT family are present as homodimers or heterodimers. The SOCS family can be merged into a single node.
Figure 4
Simplified schematic diagrams of JAK-STAT-SOCS signaling and Clipper-reconstructed BNs from the original and randomly shuffled gene expression data. (A) Core paths of JAK-STAT signal transduction and SOCS negative feedback inhibition; (B) reconstructed BN (edges in red) from gene expression data of paired tumor and non-tumor samples from HCC patients; (C) reconstructed BN (edges in dark blue) from gene expression values shuffled across samples.
Figure 5
Schematic illustration of two superimposed, randomly generated BNs of JAK-STAT-SOCS signaling from PROPS (A) and BPA (B). (A) Edges in red and blue represent those unique to each of the two PROPS-reconstructed BNs, respectively; (B) edges in orange and green represent those unique to each of the two BPA-reconstructed BNs, respectively. The low overlap between the edges of the two networks indicates the randomness of the BN structures produced by both approaches.
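One way to put a number on the edge overlap between two reconstructed networks is the Jaccard index of their directed edge sets. The sketch below is an assumption for illustration, not a metric reported by the authors, and the two edge sets are hypothetical examples rather than networks from the study.

```python
def edge_overlap(edges_a, edges_b):
    """Compare two directed edge sets: shared edges, edges unique to
    each network, and the Jaccard index (|A & B| / |A | B|)."""
    a, b = set(edges_a), set(edges_b)
    union = a | b
    jaccard = len(a & b) / len(union) if union else 1.0
    return {
        "shared": a & b,
        "only_a": a - b,
        "only_b": b - a,
        "jaccard": jaccard,
    }


# Hypothetical edge sets standing in for two randomly generated BNs.
run_1 = {("JAK1", "STAT3"), ("STAT3", "SOCS3"), ("JAK2", "STAT5A")}
run_2 = {("JAK1", "STAT3"), ("SOCS3", "JAK2"),
         ("JAK2", "STAT5A"), ("STAT5A", "SOCS1")}

result = edge_overlap(run_1, run_2)
print(result["jaccard"])
```

A value near 1 would mean the two runs recover almost the same structure; a value near 0 corresponds to the low overlap described in the caption. Edges are treated as directed, so A -> B and B -> A count as different edges.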
