Biostatistics. 2024 Oct 1;25(4):1122-1139.
doi: 10.1093/biostatistics/kxad033.

Similarity-based multimodal regression

Andrew A Chen et al. Biostatistics. 2024.

Erratum in

  • Correction. [No authors listed]. Biostatistics. 2024 Dec 31;26(1):kxae029. doi: 10.1093/biostatistics/kxae029. PMID: 39186534. Free PMC article. No abstract available.

Abstract

To better understand complex human phenotypes, large-scale studies have increasingly collected multiple data modalities across domains such as imaging, mobile health, and physical activity. The properties of each data type often differ substantially and require either separate analyses or extensive processing to obtain comparable features for a combined analysis. Multimodal data fusion enables certain analyses on matrix-valued and vector-valued data, but it generally cannot integrate modalities of different dimensions and data structures. For a single data modality, multivariate distance matrix regression provides a distance-based framework for regression accommodating a wide range of data types. However, no distance-based method exists to handle multiple complementary types of data. We propose a novel distance-based regression model, which we refer to as Similarity-based Multimodal Regression (SiMMR), that enables simultaneous regression of multiple modalities through their distance profiles. We demonstrate through simulation, imaging studies, and longitudinal mobile health analyses that our proposed method can detect associations between clinical variables and multimodal data of differing properties and dimensionalities, even with modest sample sizes. We perform experiments to evaluate several different test statistics and provide recommendations for applying our method across a broad range of scenarios.

Keywords: distance statistics; mobile health; multimodal; neuroimaging.

Figures

Fig. 1
Illustration of similarity-based multimodal regression. In SiMMR, distance matrices are computed separately on each modality, followed by representation in Euclidean space via classical multidimensional scaling (cMDS). SiMMR then concatenates these cMDS coordinates and performs inference using either Dempster’s trace (SiMMR-D) or Pillai’s trace after dimension reduction using principal components (SiMMR-PC).
Fig. 2
Power results in simulations with exchangeable and AR(1) correlation structures for a sample size of 25. Each trace represents a different test statistic. Different simulation settings are distinguished by correlation structure across rows and by rank of the binary covariate effect across columns. Exchangeable refers to an exchangeable correlation structure with low or high correlation and AR(1) refers to a first-order autoregressive structure. MDMR, multivariate distance matrix regression; MC-MDMR, multiple MDMR statistics after Bonferroni correction; MMR, multivariate multiple regression using Pillai’s trace.
Fig. 3
Rejection rates across resamples in applications of SiMMR to imaging and mobile health data. Each trace represents a different test statistic. Power curves for individual modalities are obtained through multivariate distance matrix regression (MDMR). PNC, Philadelphia Neurodevelopmental Cohort; FC, functional connectivity; rsFC, resting-state functional connectivity; SC, structural connectivity; EMA, ecological momentary assessment; MC-MDMR, multiple MDMR statistics after Bonferroni correction; MMR, multivariate multiple regression using Pillai’s trace.
Fig. 4
SiMMR-PC results across number of PCs and related exploratory analyses in real data applications. (a) shows the rejection rate across resamples for SiMMR-PC test statistics across number of PCs compared to SiMMR-D (dashed line). (b) displays the distance correlation (DistCor) among modalities in each application using the full sample. (c) shows the percent of variation explained by PCs across the 1,000 resamplings of size 50 in each application. PNC, Philadelphia Neurodevelopmental Cohort; EMA, ecological momentary assessment.
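
The pipeline described in the Fig. 1 caption (per-modality distance matrices, cMDS embedding, concatenation, then a trace statistic) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `cmds_coordinates`, `trace_ratio`, and `simmr_d_sketch` are hypothetical, the statistic is assumed to take the Dempster-style form tr(H)/tr(E), and the p-value is obtained by a simple row permutation, which may differ from the inference procedure used in the paper. The SiMMR-PC variant (Pillai's trace after principal components) is not shown.

```python
import numpy as np


def cmds_coordinates(dist, tol=1e-8):
    """Classical MDS: embed an n x n distance matrix in Euclidean space."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    # Double-centered Gram matrix recovered from squared distances.
    gram = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    # Keep only clearly positive eigenvalues (assumption: negative or
    # near-zero ones are discarded rather than corrected).
    keep = eigvals > tol * np.abs(eigvals).max()
    return eigvecs[:, keep] * np.sqrt(eigvals[keep])


def trace_ratio(coords, design, tested_cols):
    """Assumed Dempster-style statistic tr(H) / tr(E) for the tested columns."""
    reduced = np.delete(design, tested_cols, axis=1)
    hat_full = design @ np.linalg.pinv(design)
    hat_reduced = reduced @ np.linalg.pinv(reduced)
    sse_full = np.sum((coords - hat_full @ coords) ** 2)        # tr(E)
    sse_reduced = np.sum((coords - hat_reduced @ coords) ** 2)
    return (sse_reduced - sse_full) / sse_full                  # tr(H) / tr(E)


def simmr_d_sketch(distance_matrices, design, tested_cols, n_perm=999, seed=0):
    """Concatenate per-modality cMDS coordinates and run a permutation test."""
    coords = np.hstack([cmds_coordinates(d) for d in distance_matrices])
    observed = trace_ratio(coords, design, tested_cols)
    rng = np.random.default_rng(seed)
    exceed = 0
    for _ in range(n_perm):
        # Shuffle subjects jointly across all modalities (illustrative choice).
        perm = rng.permutation(coords.shape[0])
        if trace_ratio(coords[perm], design, tested_cols) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)
```

Here `distance_matrices` is a list of n x n distance matrices, one per modality, computed on the same n subjects in the same order; `design` is an n x q covariate matrix including an intercept column; and `tested_cols` lists the column indices of the covariates under test. Because each modality enters only through its distance matrix, the modalities may have different dimensions and data structures.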
