Distributed Statistical Analyses: A Scoping Review and Examples of Operational Frameworks Adapted to Health Analytics
- PMID: 39773481
- PMCID: PMC11617597
- DOI: 10.2196/53622
Erratum in
- Correction: Distributed Statistical Analyses: A Scoping Review and Examples of Operational Frameworks Adapted to Health Analytics. JMIR Med Inform. 2025 Feb 11;13:e71249. doi: 10.2196/71249. PMID: 39935013.
Abstract
Background: Data from multiple organizations are crucial for advancing learning health systems. However, ethical, legal, and social concerns may restrict the use of standard statistical methods that rely on pooling data. Although distributed algorithms offer alternatives, they may not always be suitable for health frameworks.
Objective: This study aims to support researchers and data custodians in three ways: (1) providing a concise overview of the literature on statistical inference methods for horizontally partitioned data, (2) describing the methods applicable to generalized linear models (GLMs) and assessing their underlying distributional assumptions, and (3) adapting existing methods to make them fully usable in health settings.
Methods: A scoping review methodology was used for the literature mapping, from which methods presenting a methodological framework for GLM analyses with horizontally partitioned data were identified and assessed from the perspective of applicability in health settings. Statistical theory was used to adapt methods and derive the properties of the resulting estimators.
Results: From the review, 41 articles were selected and 6 approaches were extracted to conduct standard GLM-based statistical analysis. However, these approaches assumed evenly and identically distributed data across nodes. Consequently, statistical procedures were derived to accommodate uneven node sample sizes and heterogeneous data distributions across nodes. Workflows and detailed algorithms were developed to highlight information sharing requirements and operational complexity.
Conclusions: This study contributes to the field of health analytics in three ways: it provides an overview of the methods that can be used with horizontally partitioned data, adapts these methods to the context of heterogeneous health data, and clarifies the workflows and quantities exchanged by the methods discussed. Further analysis of the confidentiality preserved by these methods is needed to fully understand the risk associated with the sharing of summary statistics.
Keywords: GLMs; algorithms; data custodians; data science; distributed algorithms; distributed analysis; federated analysis; generalized linear models; horizontally partitioned data; learning health systems; review methods; scoping; searches; statistics; synthesis.
© Félix Camirand Lemyre, Simon Lévesque, Marie-Pier Domingue, Klaus Herrmann, Jean-François Ethier. Originally published in JMIR Medical Informatics (https://medinform.jmir.org).
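To make the idea of fitting a GLM on horizontally partitioned data concrete, the sketch below fits a logistic regression by Newton-Raphson while each node shares only its local score vector and information matrix at every iteration. It is a generic textbook-style illustration, not necessarily one of the 6 approaches extracted in the review. The function names, the simulated nodes, their sample sizes, and the covariate shifts are all assumptions made for this example.

```python
import numpy as np

def local_score_and_information(X, y, beta):
    """Run at one node: return only aggregate quantities, never row-level data."""
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))               # fitted probabilities
    score = X.T @ (y - p)                                # local gradient of the log-likelihood
    information = X.T @ (X * (p * (1.0 - p))[:, None])   # local Fisher information
    return score, information

def distributed_logistic_fit(nodes, n_iter=25, tol=1e-10):
    """Coordinator: sum the node contributions and take Newton-Raphson steps."""
    beta = np.zeros(nodes[0][0].shape[1])
    for _ in range(n_iter):
        parts = [local_score_and_information(X, y, beta) for X, y in nodes]
        score = sum(s for s, _ in parts)
        information = sum(i for _, i in parts)
        step = np.linalg.solve(information, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Hypothetical nodes with uneven sample sizes and shifted covariate distributions.
rng = np.random.default_rng(0)
beta_true = np.array([-0.5, 1.0, -1.0])
nodes = []
for n_k, shift in [(50, 0.0), (400, 1.0), (1200, -0.5)]:
    X = np.column_stack([np.ones(n_k), rng.normal(shift, 1.0, size=(n_k, 2))])
    prob = 1.0 / (1.0 + np.exp(-(X @ beta_true)))
    y = rng.binomial(1, prob).astype(float)
    nodes.append((X, y))

beta_distributed = distributed_logistic_fit(nodes)

# Reference fit on pooled data, which would not be available in a real distributed setting.
X_all = np.vstack([X for X, _ in nodes])
y_all = np.concatenate([y for _, y in nodes])
beta_pooled = distributed_logistic_fit([(X_all, y_all)])
```

Because the log-likelihood is a sum over observations, the summed node contributions equal the pooled score and information, so the two fits agree up to floating-point error regardless of how unevenly the rows are split. The price is one round of communication per iteration, and, as the Conclusions note, the confidentiality of the shared summary statistics still has to be assessed.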
