Heterogeneous data integration methods for patient similarity networks

Jessica Gliozzo¹ ² ³, Marco Mesiti¹ ³, Marco Notaro¹ ³, Alessandro Petrini¹ ³, Alex Patak², Antonio Puertas-Gallardo², Alberto Paccanaro⁴ ⁵, Giorgio Valentini¹ ³ ⁶ ⁷, Elena Casiraghi¹ ³

Affiliations

¹ AnacletoLab - Computer Science Department, Università degli Studi di Milano, Via Celoria 18, 20135, Milan, Italy.
² European Commission, Joint Research Centre (JRC), Ispra (VA), Italy.
³ CINI, Infolife National Laboratory, Roma, Italy.
⁴ Department of Computer Science, Royal Holloway, University of London, Egham, TW20 0EX, UK.
⁵ School of Applied Mathematics (EMAp), Fundação Getúlio Vargas, Rio de Janeiro, Brazil.
⁶ DSRC UNIMI, Data Science Research Center, Milano, 20135, Italy.
⁷ ELLIS, European Laboratory for Learning and Intelligent Systems, Berlin, Germany.

PMID: 35679533
PMCID: PMC9294435
DOI: 10.1093/bib/bbac207

Review

Brief Bioinform. 2022 Jul 18;23(4):bbac207.


Abstract

Patient similarity networks (PSNs), where patients are represented as nodes and their similarities as weighted edges, are being increasingly used in clinical research. These networks provide an insightful summary of the relationships among patients and can be exploited by inductive or transductive learning algorithms for the prediction of patient outcome, phenotype and disease risk. PSNs can also be easily visualized, thus offering a natural way to inspect complex heterogeneous patient data and providing some level of explainability of the predictions obtained by machine learning algorithms. The advent of high-throughput technologies, enabling us to acquire high-dimensional views of the same patients (e.g. omics data, laboratory data, imaging data), calls for the development of data fusion techniques for PSNs in order to leverage this rich heterogeneous information. In this article, we review existing methods for integrating multiple biomedical data views to construct PSNs, together with the different patient similarity measures that have been proposed. We also review methods that have appeared in the machine learning literature but have not yet been applied to PSNs, thus providing a resource to navigate the vast machine learning literature existing on this topic. In particular, we focus on methods that could be used to integrate very heterogeneous datasets, including multi-omics data as well as data derived from clinical information and medical imaging.

Keywords: biomedical applications; data fusion; multimodal data; patient similarity networks.

Figures

**Figure 1**
Schema of the main taxonomies proposed in the literature for categorizing multimodal integration methods. Considering the data integration flow, the literature identifies two broad classes: ‘horizontal integration approaches’ and ‘vertical integration approaches’. ‘Horizontal integration’ approaches fuse ‘multisets’ (i.e. datasets where each view is acquired by the same source under different conditions) by independently applying the same process to each view and then pooling the individual results. On the other hand, ‘vertical integration approaches’ fuse ‘multimodal datasets’ (i.e. datasets composed of semantically different views) through more complex techniques, further categorized as ‘hierarchical–vertical integration’ methods and ‘parallel–vertical integration’ techniques. The former fuse data views following a ‘hierarchy’ driven by a priori biological knowledge, whereas the latter do not exploit knowledge-based dependencies between views. ‘Parallel–vertical integration’ methods are the most widely used integration methods; they are further classified by the phase in which the data ‘integration step’ is performed with respect to model construction (red dashed box). Thus, methods are divided into (1) ‘early approaches’, which integrate the data types before model construction, (2) ‘late approaches’, which integrate the results of models independently built on each data view and (3) ‘intermediate approaches’, where intermediate models are obtained from each view and subsequently integrated. Of note, this last class of approaches depends more heavily on the learning model used, which is why such methods have also been classified as ‘model-dependent’ methods, as opposed to ‘model-agnostic’ methods (blue dashed boxes). We refer interested readers to Appendix A.

**Figure 2**
High-level representation of PSN-fusion methods. (A) Given a set of matrices, each containing the patient vectors acquired from one source, suitable similarity measures or kernel functions are used to build a set of unimodal PSNs (one PSN per data source or data type); (B) all the PSNs are then fused through MKL methods, SNF methods or other PSN-fusion approaches; (C) the integrated PSN is processed either by unsupervised clustering algorithms, e.g. for patient subtype prediction, or by supervised classifiers, e.g. for patient outcome prediction.
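
Steps (A) and (B) of this pipeline can be illustrated with a minimal sketch. It is not SNF or MKL: it assumes a Gaussian kernel for the unimodal PSNs and an unweighted average of edge weights as the fusion step, which is one of the simplest possible "other" PSN-fusion approaches. The function names, the `sigma` parameter and the toy data are hypothetical and chosen only for illustration.

```python
import math

def gaussian_psn(view, sigma=1.0):
    """Step (A): build a unimodal PSN (n x n similarity matrix) from patient vectors."""
    n = len(view)
    psn = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(view[i], view[j]))
            psn[i][j] = math.exp(-sq_dist / (2.0 * sigma ** 2))
    return psn

def average_fusion(psns):
    """Step (B): fuse PSNs by averaging edge weights (a simple baseline, not SNF or MKL)."""
    n = len(psns[0])
    return [[sum(p[i][j] for p in psns) / len(psns) for j in range(n)]
            for i in range(n)]

# Toy data (made up): 4 patients described by two views.
expression = [[0.0, 0.1], [0.1, 0.0], [3.0, 3.1], [3.1, 3.0]]
clinical = [[1.0], [1.2], [5.0], [5.1]]

unimodal_psns = [gaussian_psn(expression), gaussian_psn(clinical)]
fused_psn = average_fusion(unimodal_psns)
```

In this toy setting, patients 0 and 1 end up strongly connected in `fused_psn`, while the edge between patients 0 and 2 is close to zero. The fused matrix could then be passed to a clustering or classification step, as in (C).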

**Figure 3**
Input data-fusion. (A) During the preprocessing phase the data are integrated by either a PCA-based integrative model or an MF-based model. These models estimate a shared latent space where the integrated, normalized point representations express the joint structure underlying all the data blocks plus, possibly, the individual structures characterizing each data block (e.g. JIVE [18], aJIVE [97], iNMF [98]); (B) a PSN model is then constructed on the integrated profiles by using a classic similarity measure; (C) a clustering or supervised classification model is applied to the computed PSN.
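
The overall flow of input data-fusion can be sketched in a few lines. The sketch below deliberately replaces the PCA-based or MF-based latent-space model of step (A) with a much simpler stand-in: each feature is standardized and the views are concatenated into one integrated profile per patient. It is not JIVE, aJIVE or iNMF. Step (B) then uses cosine similarity as the classic similarity measure. All function names and the toy data are hypothetical.

```python
import math

def zscore_columns(view):
    """Standardize each feature (column) to zero mean and unit variance."""
    n = len(view)
    scaled_cols = []
    for col in zip(*view):
        mean = sum(col) / n
        std = math.sqrt(sum((x - mean) ** 2 for x in col) / n)
        scaled_cols.append([(x - mean) / std if std > 0 else 0.0 for x in col])
    return [list(row) for row in zip(*scaled_cols)]

def concatenate_views(views):
    """Step (A), simplified: one integrated profile per patient (no latent space)."""
    scaled = [zscore_columns(v) for v in views]
    return [sum((v[i] for v in scaled), []) for i in range(len(views[0]))]

def cosine_psn(profiles):
    """Step (B): PSN built on the integrated profiles with cosine similarity."""
    norms = [math.sqrt(sum(x * x for x in p)) for p in profiles]
    n = len(profiles)
    return [[sum(a * b for a, b in zip(profiles[i], profiles[j])) / (norms[i] * norms[j])
             if norms[i] > 0 and norms[j] > 0 else 0.0
             for j in range(n)] for i in range(n)]

# Toy data (made up): 4 patients, an omics view and an imaging view on different scales.
omics = [[1.0, 200.0], [1.2, 210.0], [5.0, 900.0], [5.1, 950.0]]
imaging = [[0.2], [0.1], [0.9], [0.8]]

integrated = concatenate_views([omics, imaging])
psn = cosine_psn(integrated)
```

Standardizing before concatenation keeps the large-valued omics feature from dominating the similarity. A latent-space model would additionally separate the joint structure from the view-specific structure, which this stand-in does not attempt.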

**Figure 4**
Output-fusion. (A) Unimodal PSNs are constructed for each data type or data source and (B) each one is individually processed to identify clusters or to classify unknown samples; subsequently, (C) a simple aggregation technique or a meta-model is used to obtain the fused/consensus clustering/classification result.
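
A minimal sketch of output-fusion for classification follows. It assumes a very simple per-view predictor (the class with the largest summed similarity to the labelled patients) and majority voting as the aggregation technique in step (C). The similarity function `1 / (1 + |a - b|)`, the function names and the toy data are assumptions made only for illustration.

```python
from collections import Counter

def psn_from_1d(values):
    """Step (A): unimodal PSN from a single numeric feature (toy similarity)."""
    return [[1.0 / (1.0 + abs(a - b)) for b in values] for a in values]

def predict_from_psn(psn, labels, query):
    """Step (B): score each class by summed similarity to its labelled patients."""
    scores = {}
    for j, label in enumerate(labels):
        if label is not None and j != query:
            scores[label] = scores.get(label, 0.0) + psn[query][j]
    return max(scores, key=scores.get)

def majority_vote(predictions):
    """Step (C): simple aggregation of the per-view predictions."""
    return Counter(predictions).most_common(1)[0][0]

# Toy data (made up): patients 0-3 are labelled, patient 4 is unknown.
labels = ["A", "A", "B", "B", None]
views = {
    "omics": [0.1, 0.2, 0.9, 1.0, 0.15],
    "clinical": [1.0, 1.1, 3.0, 3.2, 1.2],
    "imaging": [5.0, 5.5, 5.2, 5.1, 5.15],  # an uninformative view
}

per_view = [predict_from_psn(psn_from_1d(v), labels, query=4) for v in views.values()]
consensus = majority_vote(per_view)
```

Here the omics and clinical views vote for class A and the uninformative imaging view votes for class B, so the consensus is A. A meta-model trained on the per-view outputs would replace `majority_vote` in a more elaborate variant.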

References

1. Koenig IR, Fuchs O, Hansen G, et al. What is precision medicine? Eur Respir J 2017;50(4).
2. Aronson SJ, Rehm HL. Building the foundation for genomics in precision medicine. Nature 2015;526(7573):336–42.
3. Wang B, Mezlini AM, Demir F, et al. Similarity network fusion for aggregating data types on a genomic scale. Nat Methods 2014;11(3):333.
4. Kim D, Joung J-G, Sohn K-A, et al. Knowledge boosting: a graph-based integration approach with multi-omics data and genomic knowledge for cancer clinical outcome prediction. J Am Med Inform Assoc 2015;22(1):109–20.
5. Li L, Cheng W-Y, Glicksberg BS, et al. Identification of type 2 diabetes subgroups through topological analysis of patient similarity. Sci Transl Med 2015;7(311):311ra174–4.
