Review
Brief Bioinform. 2022 Jul 18;23(4):bbac207.
doi: 10.1093/bib/bbac207.

Heterogeneous data integration methods for patient similarity networks

Jessica Gliozzo et al. Brief Bioinform.

Abstract

Patient similarity networks (PSNs), where patients are represented as nodes and their similarities as weighted edges, are being increasingly used in clinical research. These networks provide an insightful summary of the relationships among patients and can be exploited by inductive or transductive learning algorithms for the prediction of patient outcome, phenotype and disease risk. PSNs can also be easily visualized, thus offering a natural way to inspect complex heterogeneous patient data and providing some level of explainability of the predictions obtained by machine learning algorithms. The advent of high-throughput technologies, enabling us to acquire high-dimensional views of the same patients (e.g. omics data, laboratory data, imaging data), calls for the development of data fusion techniques for PSNs in order to leverage this rich heterogeneous information. In this article, we review existing methods for integrating multiple biomedical data views to construct PSNs, together with the different patient similarity measures that have been proposed. We also review methods that have appeared in the machine learning literature but have not yet been applied to PSNs, thus providing a resource to navigate the vast machine learning literature existing on this topic. In particular, we focus on methods that could be used to integrate very heterogeneous datasets, including multi-omics data as well as data derived from clinical information and medical imaging.

Keywords: biomedical applications; data fusion; multimodal data; patient similarity networks.

Figures

Figure 1
Schema of the main taxonomies proposed in the literature for categorizing multimodal integration methods. With respect to the data integration flow, the literature identifies two broad classes: ‘horizontal integration approaches’ and ‘vertical integration approaches’. ‘Horizontal integration’ approaches fuse ‘multisets’ (i.e. datasets where each view is acquired from the same source under different conditions) by independently applying the same process to each view and then pooling the individual results. In contrast, ‘vertical integration approaches’ fuse ‘multimodal datasets’ (i.e. datasets composed of semantically different views) through more complex techniques, further categorized as ‘hierarchical–vertical integration’ methods and ‘parallel–vertical integration’ methods. The former fuse data views following a ‘hierarchy’ driven by a priori biological knowledge, whereas the latter do not exploit knowledge-based dependencies between views. ‘Parallel–vertical integration’ methods are the most widespread; they are further classified according to when the data ‘integration step’ is performed relative to model construction (red dashed box). Methods are thus divided into (1) ‘early approaches’, which integrate the data types before model construction, (2) ‘late approaches’, which integrate the results of models built independently on each data view and (3) ‘intermediate approaches’, in which intermediate models are obtained from each view and subsequently integrated. Of note, this last class depends more heavily on the learning model used, which is why such methods have also been classified as ‘model-dependent’, as opposed to ‘model-agnostic’ (blue dashed boxes). We refer interested readers to Appendix A.
Figure 2
High-level representation of PSN-fusion methods. (A) Given a set of matrices, each containing the patient vectors acquired from one source, suitable similarity measures or kernel functions are used to build a set of unimodal PSNs (one PSN per data source or data type); (B) all the PSNs are then fused through MKL methods, SNF methods or other PSN-fusion approaches; (C) the integrated PSN is processed either by unsupervised clustering algorithms, e.g. for patient subtype prediction, or by supervised classification models, e.g. for patient outcome prediction.
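Steps (A) and (B) of this caption can be illustrated with a minimal Python sketch. It is not the method of any reviewed paper: the Gaussian kernel, the `sigma` parameter, the helper names and the toy data are all assumptions, and the fusion step is a plain unweighted average of the unimodal PSNs, used here only as a simple stand-in for MKL or SNF.

```python
import math

def gaussian_psn(view, sigma=1.0):
    """Build a unimodal PSN: entry [i][j] is the similarity of patients i and j."""
    n = len(view)
    psn = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(view[i], view[j]))
            psn[i][j] = math.exp(-sq_dist / (2 * sigma ** 2))
    return psn

def fuse_by_mean(psns):
    """Fuse PSNs by element-wise averaging (assumed stand-in for MKL/SNF)."""
    n = len(psns[0])
    return [[sum(p[i][j] for p in psns) / len(psns) for j in range(n)]
            for i in range(n)]

# Hypothetical toy data: 3 patients, two views with different feature counts.
expression = [[0.0, 0.0], [0.1, 0.0], [3.0, 3.0]]
clinical = [[1.0], [1.2], [5.0]]

# (A) one PSN per view, (B) a single fused PSN over the same patients.
fused = fuse_by_mean([gaussian_psn(expression), gaussian_psn(clinical)])
```

The fused matrix has one row and one column per patient, so any clustering or classification method that accepts a similarity matrix could be applied to it in step (C).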
Figure 3
Input data-fusion. (A) During the preprocessing phase, the data are integrated by either a PCA-based integrative model or an MF-based model. These models estimate a shared latent space in which the integrated, normalized point representations express the joint structure underlying all the data blocks and, possibly, the individual structures characterizing each data block (e.g. JIVE [18], aJIVE [97], iNMF [98]); (B) a PSN is then constructed on the integrated profiles using a classic similarity measure; (C) a clustering or supervised classification model is applied to the computed PSN.
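The flow in this caption (integrate first, then build one PSN) can be sketched in a few lines of Python. This is a deliberately simplified illustration: instead of a PCA-based or MF-based latent space such as JIVE or iNMF, it standardizes each feature and concatenates the views, which is an assumed substitute chosen for brevity. The helper names, the cosine similarity and the toy data are likewise hypothetical.

```python
import math
import statistics

def zscore_columns(view):
    """Standardize each feature (column) to zero mean and unit variance."""
    scaled_cols = []
    for col in zip(*view):
        mean = statistics.fmean(col)
        std = statistics.pstdev(col) or 1.0  # guard against constant features
        scaled_cols.append([(x - mean) / std for x in col])
    return [list(row) for row in zip(*scaled_cols)]

def concatenate_views(views):
    """(A) Integrate views by concatenating standardized patient profiles."""
    scaled = [zscore_columns(v) for v in views]
    return [sum((v[i] for v in scaled), []) for i in range(len(views[0]))]

def cosine_psn(profiles):
    """(B) Build a single PSN on the integrated profiles."""
    norms = [math.sqrt(sum(x * x for x in p)) or 1.0 for p in profiles]
    return [[sum(a * b for a, b in zip(p, q)) / (norms[i] * norms[j])
             for j, q in enumerate(profiles)]
            for i, p in enumerate(profiles)]

# Hypothetical toy data: 3 patients, an omics view and an imaging view.
omics = [[1.0, 10.0], [2.0, 20.0], [9.0, 90.0]]
imaging = [[0.5], [0.6], [0.1]]

integrated = concatenate_views([omics, imaging])
psn = cosine_psn(integrated)
```

Standardizing before concatenation keeps a view with large numeric ranges from dominating the similarity; a latent-space method would additionally separate joint from view-specific structure, which this sketch does not attempt.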
Figure 4
Output-fusion. (A) Unimodal PSNs are constructed for each data type or data source and (B) each one is individually processed to identify clusters or to classify unknown samples; subsequently, (C) a simple aggregation technique or a meta-model is used to obtain the fused/consensus clustering/classification result.
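As a rough illustration of the output-fusion flow in this caption, the Python sketch below classifies one unlabeled patient separately on each unimodal PSN and then aggregates the per-view predictions by majority vote. The nearest-neighbour rule, the function names, the similarity values and the labels are all assumptions made for the example; a meta-model could replace the vote.

```python
from collections import Counter

def predict_from_psn(psn, labels, query):
    """(B) Label `query` with the label of its most similar labeled patient."""
    candidates = [j for j in labels if j != query]
    nearest = max(candidates, key=lambda j: psn[query][j])
    return labels[nearest]

def majority_vote(predictions):
    """(C) Simple aggregation: return the most frequent per-view prediction."""
    return Counter(predictions).most_common(1)[0][0]

# (A) Hypothetical unimodal PSNs over 3 patients; patient 2 is unlabeled.
psn_omics = [[1.0, 0.2, 0.9], [0.2, 1.0, 0.1], [0.9, 0.1, 1.0]]
psn_clinical = [[1.0, 0.3, 0.8], [0.3, 1.0, 0.4], [0.8, 0.4, 1.0]]
psn_imaging = [[1.0, 0.5, 0.2], [0.5, 1.0, 0.7], [0.2, 0.7, 1.0]]
labels = {0: "responder", 1: "non-responder"}

per_view = [predict_from_psn(p, labels, query=2)
            for p in (psn_omics, psn_clinical, psn_imaging)]
consensus = majority_vote(per_view)
```

With an odd number of views and two classes the vote cannot tie; with other configurations a tie-breaking rule or view weights would have to be chosen explicitly.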

References

    1. Koenig IR, Fuchs O, Hansen G, et al. What is precision medicine? Eur Respir J 2017;50(4).
    2. Aronson SJ, Rehm HL. Building the foundation for genomics in precision medicine. Nature 2015;526(7573):336–42.
    3. Wang B, Mezlini AM, Demir F, et al. Similarity network fusion for aggregating data types on a genomic scale. Nat Methods 2014;11(3):333.
    4. Kim D, Joung J-G, Sohn K-A, et al. Knowledge boosting: a graph-based integration approach with multi-omics data and genomic knowledge for cancer clinical outcome prediction. J Am Med Inform Assoc 2015;22(1):109–20.
    5. Li L, Cheng W-Y, Glicksberg BS, et al. Identification of type 2 diabetes subgroups through topological analysis of patient similarity. Sci Transl Med 2015;7(311):311ra174–4.