PLoS Comput Biol. 2022 Sep 6;18(9):e1009767.
doi: 10.1371/journal.pcbi.1009767. eCollection 2022 Sep.

Multi-omics subtyping of hepatocellular carcinoma patients using a Bayesian network mixture model

Polina Suter et al. PLoS Comput Biol. 2022.

Abstract

Comprehensive molecular characterization of cancer subtypes is essential for predicting clinical outcomes and searching for personalized treatments. We present bnClustOmics, a statistical model and computational tool for multi-omics unsupervised clustering, which serves a dual purpose: Clustering patient samples based on a Bayesian network mixture model and learning the networks of omics variables representing these clusters. The discovered networks encode interactions among all omics variables and provide a molecular characterization of each patient subgroup. We conducted simulation studies that demonstrated the advantages of our approach compared to other clustering methods in the case where the generative model is a mixture of Bayesian networks. We applied bnClustOmics to a hepatocellular carcinoma (HCC) dataset comprising genome (mutation and copy number), transcriptome, proteome, and phosphoproteome data. We identified three main HCC subtypes together with molecular characteristics, some of which are associated with survival even when adjusting for the clinical stage. Cluster-specific networks shed light on the links between genotypes and molecular phenotypes of samples within their respective clusters and suggest targets for personalized treatments.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. Bayesian network-based clustering workflow.
Multiple omics types, both binary and continuous, are allowed as input data types (left). After feature selection is performed, prior knowledge about interactions between nodes can be included via blacklisting and penalization matrices (middle). bnClustOmics performs unsupervised clustering based on the selected features, blacklist, and penalization matrices. The output (right) includes cluster assignments (encircled patient sample), cluster-specific networks, and posterior probabilities of all individual edges in these graphs. Here, three patient clusters are depicted and labeled ●, ▲, and ■.
Fig 2. Benchmarking of algorithms for unsupervised clustering of multi-omics data.
50 Bayesian network mixtures were generated for each simulation setting. For general clustering approaches, the dimension was reduced by applying PCA and running clustering on the first 5 principal components. All integrative multi-omics approaches were applied to the original data unless specified otherwise. CIMLRco denotes the clustering results obtained by applying CIMLR to the subset of the data consisting of continuous variables only. N_Zk denotes the number of observations in one cluster, K the number of clusters, n_c the number of continuous nodes, and n_b the number of binary nodes in the networks. (A) K = 3, n_c = 100, n_b = 20, N_Zk = 200. (B) K = 3, n_c = 100, n_b = 20, N_Zk = 20. (C) n_c = 100, n_b = 20, N_Zk = 20, K ∈ {3, 5, 7, 9}; distance between centers set to medium. (D) K = 3, n_c = 1000, n_b = 100, N_Zk = 20; algorithms were applied to the full data and to a subset of the data consisting of all binary nodes with non-zero standard deviation and 150 selected continuous nodes; distance between centers set to medium.
Fig 3. Structure fit.
50 datasets were generated from Bayesian network mixtures consisting of K = 4 components with numbers of observations N_Zk ∈ {150, 100, 50, 20} corresponding to cluster 1 (red), cluster 2 (green), cluster 3 (turquoise), and cluster 4 (violet). To construct the penalization matrix (prior), we first defined the edges representing interactions from databases by taking the union of all edges in the ground truth structures. Afterward, we removed 10% of these edges, modeling false-negative interactions in databases (b = 0.1), and added 10% of false positives (a = 0.1). The entries of the penalization matrix corresponding to the defined set were not penalized; all other edges were penalized by a factor of two. The simulated datasets were clustered using bnClustOmics with and without the penalization matrix. Resulting MAP and consensus models corresponding to posterior thresholds of p ∈ {0.3, 0.5, 0.7, 0.9, 0.95, 0.99} were assessed using TPR and FDR. (B) Additional curves were added for cluster 4, visualizing results for simulated databases constructed using various levels of FDR (a) and FNR (b): a = 0.1, b = 0.1 (violet solid), a = 0.5, b = 0.1 (yellow), and a = 0.1, b = 0.5 (grey). (C) a = 0.8, b = 0.1 (yellow) and a = 0.1, b = 0.8 (grey). (D) Clustering accuracy: no database (white), a = 0.1, b = 0.1 (violet), a = 0.5, b = 0.1 (yellow), a = 0.1, b = 0.5 (grey).
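The database simulation and the TPR/FDR assessment described in this caption can be sketched as follows. This is only an illustration under stated assumptions, not the authors' code: the function names are hypothetical, the number of false positives is taken relative to the number of true edges (the caption does not say what the 10% refers to), edges are treated as directed pairs, and a consensus edge is kept when its posterior exceeds the threshold.

```python
import random


def build_penalization_matrix(nodes, true_edges, a=0.1, b=0.1, penalty=2.0, seed=0):
    """Simulate a noisy interaction database and derive a penalization matrix.

    true_edges: (parent, child) pairs, the union over all ground-truth graphs.
    b: fraction of true edges removed (false negatives in the database).
    a: false positives added, as a fraction of the true edge count (assumption).
    """
    rng = random.Random(seed)
    truth = set(true_edges)
    ordered_truth = sorted(truth)  # fixed order keeps the sampling reproducible

    # Remove a fraction b of the true edges.
    n_drop = round(b * len(ordered_truth))
    kept = truth - set(rng.sample(ordered_truth, n_drop))

    # Add edges that are absent from every ground-truth graph.
    candidates = [(u, v) for u in nodes for v in nodes
                  if u != v and (u, v) not in truth]
    n_add = min(round(a * len(ordered_truth)), len(candidates))
    added = set(rng.sample(candidates, n_add))

    database = kept | added
    # Entries in the database get factor 1 (no penalty); all others get `penalty`.
    matrix = [[1.0 if (u, v) in database else penalty for v in nodes]
              for u in nodes]
    return matrix, database


def consensus_edges(edge_posteriors, threshold):
    """Keep edges whose posterior probability exceeds the threshold (assumption: strict >)."""
    return {edge for edge, p in edge_posteriors.items() if p > threshold}


def tpr_fdr(estimated_edges, true_edges):
    """True positive rate and false discovery rate of an estimated edge set."""
    estimated, truth = set(estimated_edges), set(true_edges)
    tp = len(estimated & truth)
    fp = len(estimated - truth)
    tpr = tp / len(truth) if truth else 0.0
    fdr = fp / len(estimated) if estimated else 0.0
    return tpr, fdr
```

Sweeping `threshold` over the values listed in the caption and calling `tpr_fdr` on each resulting edge set would trace one curve of the kind shown in the figure.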
Fig 4. Multi-omics clustering of the HCC dataset with bnClustOmics.
(A) BIC and AIC scores of models with different numbers of clusters. (B) Kaplan-Meier survival curves for patients in discovered clusters. (C) Mutational frequencies in discovered clusters. Only mutations with frequency ≥15% in at least one of the clusters are shown. (D) Pathway enrichment differences between clusters. (E) Venn diagrams showing the number of common and cluster-specific edges in the discovered MAP and consensus networks learned for cluster 1 (red), cluster 2 (green), cluster 3 (blue); edge directions were disregarded.
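For readers unfamiliar with the scores in panel (A), the sketch below uses the textbook definitions AIC = 2k − 2 ln L and BIC = k ln n − 2 ln L, where lower is better. It is an assumption that the figure follows this sign convention; the helper names and all numbers in the usage note are hypothetical and are not taken from the paper.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_samples):
    """Bayesian information criterion (lower is better)."""
    return n_params * math.log(n_samples) - 2 * log_likelihood


def best_number_of_clusters(fits, n_samples):
    """Pick the K with the lowest BIC.

    fits: dict mapping K to (log_likelihood, n_params) of the fitted mixture.
    """
    return min(fits, key=lambda k: bic(fits[k][0], fits[k][1], n_samples))
```

With made-up fits such as `{1: (-6000.0, 150), 2: (-5200.0, 300), 3: (-5100.0, 450)}` and `n_samples=100`, the function returns the K whose gain in likelihood is not outweighed by the penalty on the additional parameters.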
Fig 5. Mutated genes and their most common interaction partners in HCC networks learned by bnClustOmics.
Only those T, P, and PP nodes are shown that are differentially expressed/phosphorylated in at least one cluster or the whole dataset. Edges are shown based on their posterior probability: either if they have a high total posterior probability (sum across clusters is at least 1.2), or if they have a high posterior probability in at least one of the clusters (p > 0.9). Edge colors indicate in which cluster-specific networks the edges are present with a posterior probability p > 0.4: red (G1), green (G2), blue (G3), brown (G1 and G2), violet (G1 and G3), turquoise (G2 and G3), black (G1 and G2 and G3). Border colors of T, P, and PP nodes represent the differential expression status (color scheme is the same as edge colors). Solid edges denote either connections between two omics types of the same gene or interactions found in the STRING and Omnipath databases.
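The edge display and coloring rules in this caption amount to a small decision procedure, sketched below. The thresholds and color names come from the caption; the function names and the dictionary layout of the posteriors are hypothetical.

```python
# Color assigned to each combination of cluster-specific networks (from the caption).
EDGE_COLORS = {
    frozenset({"G1"}): "red",
    frozenset({"G2"}): "green",
    frozenset({"G3"}): "blue",
    frozenset({"G1", "G2"}): "brown",
    frozenset({"G1", "G3"}): "violet",
    frozenset({"G2", "G3"}): "turquoise",
    frozenset({"G1", "G2", "G3"}): "black",
}


def is_shown(posteriors, total_min=1.2, single_min=0.9):
    """An edge is drawn if its summed posterior is at least total_min,
    or if its posterior exceeds single_min in at least one cluster."""
    values = list(posteriors.values())
    return sum(values) >= total_min or max(values) > single_min


def edge_color(posteriors, present_min=0.4):
    """Color by the set of clusters in which the posterior exceeds present_min.

    Returns None if the edge passes the threshold in no cluster.
    """
    present = frozenset(g for g, p in posteriors.items() if p > present_min)
    return EDGE_COLORS.get(present)
```

For example, an edge with posteriors `{"G1": 0.5, "G2": 0.45, "G3": 0.3}` is drawn because the sum reaches 1.2, and it is colored brown because it passes 0.4 only in G1 and G2.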
Fig 6. Neighborhoods of individual nodes in the networks learned by bnClustOmics.
Direct neighbors of nodes (A) GLUL-T, (B) TERT-T, (C) RB1-S37, (D) RB1_T356, (E) RB1-S249, and (F) MAPK1_T185 in multi-omics networks discovered by bnClustOmics. Interactions are shown only between the central node and its direct neighbors, with the exception of (A), where we also show the connection between CTNNB1-M and AXIN2_S70.
