PLoS Comput Biol. 2022 Sep 6;18(9):e1009767.

doi: 10.1371/journal.pcbi.1009767. eCollection 2022 Sep.

Multi-omics subtyping of hepatocellular carcinoma patients using a Bayesian network mixture model

Polina Suter^{1,2}, Eva Dazert³, Jack Kuipers^{1,2}, Charlotte K Y Ng^{2,4,5,6}, Tuyana Boldanova⁵, Michael N Hall³, Markus H Heim^{5,7}, Niko Beerenwinkel^{1,2}

Affiliations

¹ Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland.
² SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland.
³ Biozentrum, University of Basel, Basel, Switzerland.
⁴ Department for BioMedical Research (DBMR), University of Bern, Bern, Switzerland.
⁵ Department of Biomedicine, University Hospital Basel, University of Basel, Basel, Switzerland.
⁶ Institute of Medical Genetics and Pathology, University Hospital Basel, University of Basel, Basel, Switzerland.
⁷ Department of Gastroenterology and Hepatology, Clarunis, University Center for Gastrointestinal and Liver Diseases, Basel, Switzerland.

PMID: 36067230
PMCID: PMC9481159
DOI: 10.1371/journal.pcbi.1009767

Polina Suter et al. PLoS Comput Biol. 2022.

Abstract

Comprehensive molecular characterization of cancer subtypes is essential for predicting clinical outcomes and searching for personalized treatments. We present bnClustOmics, a statistical model and computational tool for multi-omics unsupervised clustering, which serves a dual purpose: Clustering patient samples based on a Bayesian network mixture model and learning the networks of omics variables representing these clusters. The discovered networks encode interactions among all omics variables and provide a molecular characterization of each patient subgroup. We conducted simulation studies that demonstrated the advantages of our approach compared to other clustering methods in the case where the generative model is a mixture of Bayesian networks. We applied bnClustOmics to a hepatocellular carcinoma (HCC) dataset comprising genome (mutation and copy number), transcriptome, proteome, and phosphoproteome data. We identified three main HCC subtypes together with molecular characteristics, some of which are associated with survival even when adjusting for the clinical stage. Cluster-specific networks shed light on the links between genotypes and molecular phenotypes of samples within their respective clusters and suggest targets for personalized treatments.
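
The clustering half of the dual purpose described above can be illustrated with the membership step of a generic mixture model. The sketch below is not the bnClustOmics implementation. It assumes that the log-likelihood of one sample under each cluster-specific network is already available. The function name `responsibilities` and all numbers are hypothetical.

```python
import math

def responsibilities(log_lik, weights):
    """Posterior cluster membership probabilities for one sample.

    log_lik[k]: log-likelihood of the sample under cluster k's network
                (hypothetical input; obtaining it requires a fitted model)
    weights[k]: mixture weight of cluster k (weights sum to 1)
    """
    scores = [math.log(w) + ll for w, ll in zip(weights, log_lik)]
    m = max(scores)  # subtract the maximum for numerical stability
    unnorm = [math.exp(s - m) for s in scores]
    total = sum(unnorm)
    return [u / total for u in unnorm]

# Made-up log-likelihoods of one sample under three cluster-specific networks.
probs = responsibilities([-120.0, -118.5, -131.2], [0.5, 0.3, 0.2])

# Hard assignment: the cluster with the largest membership probability.
assigned = max(range(len(probs)), key=probs.__getitem__)
```

In a full mixture fit, such membership probabilities would typically alternate with re-estimating the cluster-specific networks; that second step is omitted here.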

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

**Fig 1. Bayesian network-based clustering workflow.**
Multiple omics types, both binary and continuous, are allowed as input data types (left). After feature selection is performed, prior knowledge about interactions between nodes can be included via blacklisting and penalization matrices (middle). bnClustOmics performs unsupervised clustering based on the selected features, blacklist, and penalization matrices. The output (right) includes cluster assignments (encircled patient sample), cluster-specific networks, and posterior probabilities of all individual edges in these graphs. Here, three patient clusters are depicted and labeled ●, ▲, and ■.

**Fig 2. Benchmarking of algorithms for unsupervised clustering of multi-omics data.**
50 Bayesian network mixtures were generated for each simulation setting. For general clustering approaches, the dimension was reduced with PCA, and clustering was run on the first 5 principal components. All integrative multi-omics approaches were applied to the original data unless specified otherwise. CIMLRco denotes the results of applying CIMLR to the subset of the data containing only the continuous variables. $N_{Z_{k}}$ denotes the number of observations in one cluster, $K$ the number of clusters, $n_c$ the number of continuous nodes, and $n_b$ the number of binary nodes in the networks. (A) $K = 3$, $n_c = 100$, $n_b = 20$, $N_{Z_{k}} = 200$. (B) $K = 3$, $n_c = 100$, $n_b = 20$, $N_{Z_{k}} = 20$. (C) $n_c = 100$, $n_b = 20$, $N_{Z_{k}} = 20$, $K \in \{3, 5, 7, 9\}$; distance between centers set to medium. (D) $K = 3$, $n_c = 1000$, $n_b = 100$, $N_{Z_{k}} = 20$; algorithms were applied to the full data and to a subset of the data consisting of all binary nodes with non-zero standard deviation and 150 selected continuous nodes; distance between centers set to medium.

**Fig 3. Structure fit.**
50 datasets were generated from Bayesian network mixtures consisting of K = 4 components with number of observations $N_{Z_{k}} \in \{150, 100, 50, 20\}$ corresponding to cluster 1 (red), cluster 2 (green), cluster 3 (turquoise) and cluster 4 (violet). To construct the penalization matrix (prior), we first defined the edges representing interactions from databases by taking the union of all edges in the ground truth structures. Afterward, we removed 10% of these edges, modeling false-negative interactions in databases (b = 0.1), and added 10% of false positives (a = 0.1). The entries of the penalization matrix corresponding to the defined set were not penalized; all other edges were penalized by a factor of two. The simulated datasets were clustered using bnClustOmics with and without the penalization matrix. Resulting MAP and consensus models corresponding to posterior thresholds of p ∈ {0.3, 0.5, 0.7, 0.9, 0.95, 0.99} were assessed using TPR and FDR. (B) Additional curves were added for cluster 4 visualizing results for simulated databases constructed using various levels of FDR (a) and FNR (b): a = 0.1, b = 0.1 (violet solid), a = 0.5, b = 0.1 (yellow) and a = 0.1, b = 0.5 (grey). (C) a = 0.8, b = 0.1 (yellow) and a = 0.1, b = 0.8 (grey). (D) Clustering accuracy: no database (white), a = 0.1, b = 0.1 (violet), a = 0.5, b = 0.1 (yellow), a = 0.1, b = 0.5 (grey).
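
The penalization matrix and the TPR/FDR assessment of consensus models described in this caption can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: edges are treated as directed `(parent, child)` index pairs, "not penalized" is encoded as a factor of 1.0, an edge enters a consensus model when its posterior strictly exceeds the threshold, and all function names and example values are hypothetical.

```python
def penalization_matrix(n_nodes, database_edges, penalty=2.0):
    """Factor 1.0 (assumed encoding of 'not penalized') for database edges,
    `penalty` for every other entry."""
    mat = [[penalty] * n_nodes for _ in range(n_nodes)]
    for i, j in database_edges:
        mat[i][j] = 1.0
    return mat

def consensus_edges(edge_posteriors, threshold):
    """Edges whose posterior probability exceeds the threshold
    (strict inequality is an assumption)."""
    return {edge for edge, p in edge_posteriors.items() if p > threshold}

def tpr_fdr(predicted, truth):
    """True positive rate and false discovery rate of a predicted edge set."""
    tp = len(predicted & truth)
    fp = len(predicted - truth)
    tpr = tp / len(truth) if truth else 0.0
    fdr = fp / len(predicted) if predicted else 0.0
    return tpr, fdr

# Made-up ground truth and edge posteriors for a four-node network.
truth = {(0, 1), (1, 2), (2, 3), (0, 3)}
posteriors = {(0, 1): 0.98, (1, 2): 0.75, (2, 3): 0.40, (1, 3): 0.55, (0, 2): 0.20}

# One (TPR, FDR) pair per posterior threshold used in the caption.
curve = {p: tpr_fdr(consensus_edges(posteriors, p), truth)
         for p in (0.3, 0.5, 0.7, 0.9, 0.95, 0.99)}
```

Raising the threshold trades recall for precision: in this toy example the lowest threshold recovers three of the four true edges along with one false edge, while the highest threshold returns no edges at all.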

**Fig 4. Multi-omics clustering of the HCC dataset with bnClustOmics.**
(A) BIC and AIC scores of models with different numbers of clusters. (B) Kaplan-Meier survival curves for patients in discovered clusters. (C) Mutational frequencies in discovered clusters. Only mutations with frequency ≥15% in at least one of the clusters are shown. (D) Pathway enrichment differences between clusters. (E) Venn diagrams showing the number of common and cluster-specific edges in the discovered MAP and consensus networks learned for cluster 1 (red), cluster 2 (green), cluster 3 (blue); edge directions were disregarded.
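
Panel (A) relies on the standard information criteria, AIC = 2k − 2 ln L and BIC = k ln n − 2 ln L, where k is the number of free parameters, n the number of samples, and L the maximized likelihood. The sketch below shows how they can be used to compare numbers of clusters. The log-likelihoods, parameter counts, and sample size are invented for illustration and do not correspond to the HCC dataset; how bnClustOmics counts parameters is not stated here.

```python
import math

def aic(log_lik, n_params):
    """Akaike information criterion (lower is better)."""
    return 2 * n_params - 2 * log_lik

def bic(log_lik, n_params, n_samples):
    """Bayesian information criterion (lower is better)."""
    return n_params * math.log(n_samples) - 2 * log_lik

# Hypothetical fits: number of clusters K -> (maximized log-likelihood, free parameters).
fits = {2: (-5200.0, 300), 3: (-5000.0, 450), 4: (-4950.0, 600)}
n_samples = 100  # made-up sample size

best_k_aic = min(fits, key=lambda k: aic(*fits[k]))
best_k_bic = min(fits, key=lambda k: bic(*fits[k], n_samples))
```

With these invented numbers the two criteria disagree, because BIC charges ln n per parameter instead of 2 and therefore favors the smaller model whenever ln n > 2.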

**Fig 5. Mutated genes and their most common interaction partners in HCC networks learned by bnClustOmics.**
Only those T, P, and PP nodes are shown that are differentially expressed/phosphorylated in at least one cluster or the whole dataset. Edges are shown based on their posterior probability: either if they have a high total posterior probability (sum across clusters is at least 1.2), or if they have a high posterior probability in at least one of the clusters (p > 0.9). Edge colors indicate in which cluster-specific networks the edges are present with a posterior probability p > 0.4: red (G₁), green (G₂), blue (G₃), brown (G₁ and G₂), violet (G₁ and G₃), turquoise (G₂ and G₃), black (G₁ and G₂ and G₃). Border colors of T, P, and PP nodes represent the differential expression status (color scheme is the same as edge colors). Solid edges denote either connections between two omics types of the same gene or interactions found in the STRING and Omnipath databases.
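
The edge display and coloring rules stated in this caption can be written down directly. The sketch below assumes that each edge comes with a tuple of three posterior probabilities, one per cluster-specific network (G₁, G₂, G₃). The function names and example values are hypothetical.

```python
# Color for each combination of clusters in which an edge is present (p > 0.4).
COLORS = {
    frozenset({1}): "red",
    frozenset({2}): "green",
    frozenset({3}): "blue",
    frozenset({1, 2}): "brown",
    frozenset({1, 3}): "violet",
    frozenset({2, 3}): "turquoise",
    frozenset({1, 2, 3}): "black",
}

def is_shown(posteriors, total_min=1.2, single_min=0.9):
    """Shown if the posteriors sum to at least 1.2 across clusters,
    or if the posterior exceeds 0.9 in at least one cluster."""
    return sum(posteriors) >= total_min or max(posteriors) > single_min

def edge_color(posteriors, present_min=0.4):
    """Color by the set of clusters where the posterior exceeds 0.4;
    returns None if the edge is present in no cluster."""
    present = frozenset(k for k, p in enumerate(posteriors, start=1)
                        if p > present_min)
    return COLORS.get(present)

# Made-up posteriors (G1, G2, G3) for three edges.
strong_in_one = (0.95, 0.05, 0.10)   # shown via the p > 0.9 rule
shared = (0.50, 0.45, 0.30)          # shown via the sum >= 1.2 rule
weak = (0.30, 0.30, 0.30)            # not shown
```

For example, `is_shown(shared)` holds because the posteriors sum to 1.25, and `edge_color(shared)` gives brown because only G₁ and G₂ exceed 0.4.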

**Fig 6. Neighborhoods of individual nodes in the networks learned by bnClustOmics.**
Direct neighbors of nodes (A) *GLUL*-T (B) *TERT*-T (C) RB1-S37 (D) RB1_T356 (E) RB1-S249 (F) MAPK1_T185 in multi-omics networks discovered by bnClustOmics. Interactions are only shown between the central node and all of its direct neighbors with exception of (A) where we also show the connection between *CTNNB1*-M and AXIN2_S70.
