Int J Biostat. 2020 Aug 13;17(1):7-21.
doi: 10.1515/ijb-2019-0061.

A machine learning-based approach for estimating and testing associations with multivariate outcomes

Free article

David Benkeser et al. Int J Biostat.

Abstract

We propose a method for summarizing the strength of association between a set of variables and a multivariate outcome. Classical summary measures are appropriate when linear relationships exist between covariates and outcomes, while our approach provides an alternative that is useful in situations where complex relationships may be present. We utilize machine learning to detect nonlinear relationships and covariate interactions and propose a measure of association that captures these relationships. A hypothesis test about the proposed associative measure can be used to test the strong null hypothesis of no association between a set of variables and a multivariate outcome. Simulations demonstrate that this hypothesis test has greater power than existing methods against alternatives where covariates have nonlinear relationships with outcomes. We additionally propose measures of variable importance for groups of variables, which summarize each group's association with the outcome. We demonstrate our methodology using data from a birth cohort study on childhood health and nutrition in the Philippines.

Keywords: canonical correlation; epidemiology; machine learning; multivariate outcomes; variable importance.
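
The general idea in the abstract (a flexible learner, a prediction-based summary of association, and a test of no association) can be illustrated with a toy sketch. This is not the authors' estimator or test: the k-nearest-neighbour learner, the "largest cross-validated R^2 across outcomes" summary, the permutation test, the simulated data, and every function name below are assumptions chosen only to keep the example short and dependency-free.

```python
import random

def knn_predict(train_x, train_y, x0, k=5):
    """Average the outcomes of the k training points closest to x0."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x0))[:k]
    return sum(train_y[i] for i in nearest) / k

def cv_r2(x, y, folds=5, k=5):
    """Cross-validated R^2 of the k-NN predictor for a single outcome."""
    n = len(x)
    sq_err = 0.0
    sq_tot = 0.0
    for f in range(folds):
        test = [i for i in range(n) if i % folds == f]
        train = [i for i in range(n) if i % folds != f]
        tx = [x[i] for i in train]
        ty = [y[i] for i in train]
        mean_train = sum(ty) / len(ty)
        for i in test:
            sq_err += (y[i] - knn_predict(tx, ty, x[i], k)) ** 2
            sq_tot += (y[i] - mean_train) ** 2
    return 1.0 - sq_err / sq_tot

def association(x, outcomes):
    """Assumed summary: the largest cross-validated R^2 across outcomes."""
    return max(cv_r2(x, y) for y in outcomes)

def permutation_p_value(x, outcomes, n_perm=100, seed=0):
    """Shuffle the covariate to mimic 'no association' and compare summaries."""
    rng = random.Random(seed)
    observed = association(x, outcomes)
    exceed = 0
    for _ in range(n_perm):
        shuffled = x[:]
        rng.shuffle(shuffled)
        if association(shuffled, outcomes) >= observed:
            exceed += 1
    # Add-one correction keeps the p-value strictly positive.
    return observed, (exceed + 1) / (n_perm + 1)

# Simulated data (hypothetical): the first outcome depends on x only through
# x^2, a relationship a purely linear summary would tend to miss; the second
# outcome is pure noise.
rng = random.Random(1)
x = [rng.uniform(-2.0, 2.0) for _ in range(80)]
y1 = [xi ** 2 + rng.gauss(0.0, 0.3) for xi in x]
y2 = [rng.gauss(0.0, 1.0) for _ in x]

observed, p_value = permutation_p_value(x, [y1, y2])
print(f"association summary: {observed:.3f}, permutation p-value: {p_value:.3f}")
```

A permutation test is used here only because it needs no distributional theory; the abstract does not say how the authors construct their hypothesis test, so the two should not be equated.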
