Statistical tests and identifiability conditions for pooling and analyzing multisite datasets
- PMID: 29386387
- PMCID: PMC5816202
- DOI: 10.1073/pnas.1719747115
Abstract
When sample sizes are small, pooling existing datasets may enhance the ability to identify weak (but scientifically interesting) associations between a set of predictors and a response. However, differences between datasets in acquisition methods and in the distribution of participants or observations, especially distributional shifts in some predictors, may obscure real effects when the datasets are combined. We present a rigorous statistical treatment of this problem and identify conditions under which the distributional shift can be corrected. We also provide an algorithm for the situation where the correction is identifiable. We analyze various properties of the framework for testing model fit, constructing confidence intervals, and evaluating consistency characteristics. Our technical development is motivated by Alzheimer's disease (AD) studies, and we present empirical results showing that our framework enables harmonization of protein biomarkers, even when the assays differ across sites. Our contribution may, in part, mitigate a bottleneck that clinical researchers face when pooling smaller datasets, and may offer benefits when the subjects of interest are difficult to recruit or when limited resources prohibit large single-site studies.
Keywords: causal model; maximum mean discrepancy; meta-analysis; multisite analysis; multisource.
Copyright © 2018 the Author(s). Published by PNAS.
Conflict of interest statement
The authors declare no conflict of interest.
