Nat Comput Sci. 2021 Feb;1(2):143-152.
doi: 10.1038/s43588-021-00029-8. Epub 2021 Feb 22.

Similarity-driven multi-view embeddings from high-dimensional biomedical data

Brian B Avants et al. Nat Comput Sci. 2021 Feb.

Erratum in

Abstract

Diverse, high-dimensional modalities collected in large cohorts present new opportunities for the formulation and testing of integrative scientific hypotheses. Similarity-driven multi-view linear reconstruction (SiMLR) is an algorithm that exploits inter-modality relationships to transform large scientific datasets into smaller, more well-powered and interpretable low-dimensional spaces. SiMLR contributes an objective function for identifying joint signal, regularization based on sparse matrices representing prior within-modality relationships and an implementation that permits application to joint reduction of large data matrices. We demonstrate that SiMLR outperforms closely related methods on supervised learning problems in simulation data, a multi-omics cancer survival prediction dataset and multiple modality neuroimaging datasets. Taken together, this collection of results shows that SiMLR may be applied to joint signal estimation from disparate modalities and may yield practically useful results in a variety of application domains.

Keywords: ANTs; ANTsR; SiMLR; brain; code:R; depression; genotype; imaging genetics; multi-modality embedding.

Conflict of interest statement

The authors declare no competing interests.

Figures

Figure 1:
An overview of SiMLR’s workflow: (a) Two statistically independent signals are shown here to represent the hidden latent signal, potentially two components of a disease process. (b) The latent signal is manifested across three different modalities (each represented by an oval), where the joint component of the signal is represented in the overlap. (c) This three-view data is converted to three matrices X_A, X_B, X_C; in this effort we focus on matrices with a common number of subjects, here denoted by n, and a variable number of predictors (p_A, p_B, p_C). (d) Sparse regularization matrices (G_A, G_B, G_C) are constructed with user input of domain knowledge or via helper functions. (e) SiMLR iteratively optimizes the ability of the modalities to predict each other in leave-one-out fashion. (f) Sparse feature vectors emerge which can be interpreted as weighted averages over selected columns of the input matrices that maintain the original units of the data. These are used to compute embeddings in (g), which are passed to downstream analyses. Alternatively, one could permute the SiMLR solution to gain empirical statistics on its solutions.
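Steps (f) and (g) of the caption can be illustrated with a minimal sketch: an embedding is obtained by multiplying a modality's data matrix by a sparse feature vector. The toy matrix, the weights, and the `embed` helper are hypothetical and are not taken from the SiMLR implementation. The weights are also chosen to be nonnegative and to sum to 1 (an assumption made here for illustration) so that each embedding value is literally a weighted average of the selected columns.

```python
# Hypothetical toy data: 3 subjects (rows) x 4 predictors (columns) for one modality.
X_A = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 0.0, 1.0, 6.0],
    [0.0, 4.0, 5.0, 2.0],
]

# Hypothetical sparse feature vector: only the 2nd and 4th columns are selected.
# Nonnegative weights summing to 1 make each embedding value a weighted
# average of the selected columns, so it stays in the original data units.
v_A = [0.0, 0.25, 0.0, 0.75]


def embed(X, v):
    """Project each subject (row of X) onto the feature vector v."""
    return [sum(x_ij * v_j for x_ij, v_j in zip(row, v)) for row in X]


embedding_A = embed(X_A, v_A)
print(embedding_A)  # [3.5, 4.5, 2.5]
```

Each subject ends up with one low-dimensional score per feature vector; with several feature vectors per modality, the scores would form the columns of that modality's embedding matrix.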
Figure 2:
SiMLR simulation study results: sensitivity to noise and ability to recover signal. In each of panels (a-c), the SiMLR signal recovery performance (120 simulations) in terms of R squared is plotted against RGCCA and SGCCA performance. (a) Demonstrates performance of signal recovery of SiMLR with the CCA energy and ICA source separation method. (b) Demonstrates performance of signal recovery of SiMLR with the CCA energy and SVD source separation method. (c) Demonstrates performance of signal recovery of SiMLR with the regression energy and ICA source separation method. Plots in (d) show how well signal recovery (R squared) can be predicted from the amount of matrix corruption. In this case, ideally, matrix corruption would minimally impact performance; therefore, lower scores are better. The best fit line (computed by a generalized additive model (GAM)) is shaded with 95% confidence intervals.
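The caption scores signal recovery by R squared but does not say how it was computed. The following is a minimal sketch of one common definition, assuming a simple linear fit with an intercept, in which case R squared equals the squared Pearson correlation between the true signal and its estimate. The function name and the toy vectors are hypothetical.

```python
from math import fsum


def r_squared(truth, estimate):
    """Squared Pearson correlation between a true signal and its estimate.

    For a simple linear regression with an intercept, this equals the
    R squared of the fit. Assumes neither input is constant.
    """
    n = len(truth)
    mean_t = fsum(truth) / n
    mean_e = fsum(estimate) / n
    cov = fsum((t - mean_t) * (e - mean_e) for t, e in zip(truth, estimate))
    var_t = fsum((t - mean_t) ** 2 for t in truth)
    var_e = fsum((e - mean_e) ** 2 for e in estimate)
    return cov * cov / (var_t * var_e)


# Hypothetical signals: a perfectly recovered one and a poorly recovered one.
true_signal = [1.0, 2.0, 3.0, 4.0]
good_recovery = [2.0, 4.0, 6.0, 8.0]    # rescaled copy of the truth
poor_recovery = [1.0, -1.0, 1.0, -1.0]  # mostly unrelated to the truth

print(r_squared(true_signal, good_recovery))
print(r_squared(true_signal, poor_recovery))
```

Because the measure ignores scale and sign, a recovered signal that is a rescaled or flipped copy of the truth still scores near 1, which suits latent signals whose scale is arbitrary.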
Figure 3:
PTBP fully supervised brain age prediction: comparison to SGCCA. In each panel, we show the ability to predict chronological age from the brain. Confidence intervals are shown as gray shaded regions around a best-fit linear regression line. R squared for the predicted model fit is also shown. Performance ranking is provided on the figure’s right and is based on the mean absolute error between the predicted and real age.
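The ranking described in the caption can be sketched as follows: compute the mean absolute error between predicted and real age for each method, then sort the methods by that error. The method names and the ages are hypothetical placeholders, not results from the paper.

```python
def mean_absolute_error(predicted, actual):
    """Average absolute difference between predicted and actual values."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical chronological ages and per-method predictions (in years).
real_age = [10.0, 12.0, 15.0, 18.0]
predictions = {
    "method_1": [11.0, 12.0, 14.0, 19.0],
    "method_2": [13.0, 10.0, 17.0, 15.0],
}

# Compute one error per method, then rank from lowest error (best) upward.
errors = {
    name: mean_absolute_error(pred, real_age)
    for name, pred in predictions.items()
}
ranking = sorted(errors, key=errors.get)

for rank, name in enumerate(ranking, start=1):
    print(rank, name, errors[name])
```

Mean absolute error is expressed in the same units as the target (years here), so the ranking reads directly as the typical size of the age prediction error.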
