Training replicable predictors in multiple studies
- PMID: 29531060
- PMCID: PMC5856504
- DOI: 10.1073/pnas.1708283115
Abstract
This article considers replicability of the performance of predictors across studies. We suggest a general approach to investigating this issue, based on ensembles of prediction models trained on different studies. We quantify how the common practice of training on a single study accounts in part for the observed challenges in replicability of prediction performance. We also investigate whether ensembles of predictors trained on multiple studies can be combined, using unique criteria, to design robust ensemble learners trained upfront to incorporate replicability into different contexts and populations.
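The multistudy ensembling idea described above can be sketched in a few lines: fit one learner per study, weight each learner by how well it predicts on the *other* studies (a cross-study validation criterion), and combine the weighted predictions. The per-study learner (a one-parameter least-squares fit), the inverse-error weighting, and the simulated data below are illustrative assumptions for this sketch, not the paper's actual models or criteria.

```python
import random

def fit(study):
    # Least-squares slope through the origin: b = sum(x*y) / sum(x*x).
    sxy = sum(x * y for x, y in study)
    sxx = sum(x * x for x, _ in study)
    return sxy / sxx

def mse(slope, study):
    # Mean squared prediction error of a single-study learner on one study.
    return sum((y - slope * x) ** 2 for x, y in study) / len(study)

def cross_study_weights(slopes, studies):
    # Weight each learner by inverse mean error on the studies it was NOT
    # trained on, then normalize the weights to sum to 1.
    raw = []
    for i, b in enumerate(slopes):
        errs = [mse(b, s) for j, s in enumerate(studies) if j != i]
        raw.append(1.0 / (sum(errs) / len(errs)))
    total = sum(raw)
    return [w / total for w in raw]

def ensemble_predict(slopes, weights, x):
    # Weighted average of the single-study predictions.
    return sum(w * b * x for w, b in zip(weights, slopes))

random.seed(0)
# Three simulated "studies" whose true slopes differ slightly,
# mimicking study-specific effects that hurt replicability.
studies = [
    [(x, (2.0 + d) * x + random.gauss(0, 0.1)) for x in [1, 2, 3, 4, 5]]
    for d in (-0.2, 0.0, 0.2)
]
slopes = [fit(s) for s in studies]
weights = cross_study_weights(slopes, studies)
print(ensemble_predict(slopes, weights, 10.0))  # close to 20, since slopes cluster near 2
```

A learner trained only on the first study would systematically under-predict on the third; the cross-study weighted ensemble hedges against such study-specific effects, which is the intuition behind training for replicability upfront.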
Keywords: cross-study validation; ensemble learning; machine learning; replicability; validation.
Conflict of interest statement
The authors declare no conflict of interest.