Proc Natl Acad Sci U S A. 2018 Mar 13;115(11):2578-2583.
doi: 10.1073/pnas.1708283115. Epub 2018 Mar 12.

Training replicable predictors in multiple studies

Prasad Patil et al. Proc Natl Acad Sci U S A. 2018.

Abstract

This article considers replicability of the performance of predictors across studies. We suggest a general approach to investigating this issue, based on ensembles of prediction models trained on different studies. We quantify how the common practice of training on a single study accounts in part for the observed challenges in replicability of prediction performance. We also investigate whether ensembles of predictors trained on multiple studies can be combined, using unique criteria, to design robust ensemble learners trained upfront to incorporate replicability into different contexts and populations.

Keywords: cross-study validation; ensemble learning; machine learning; replicability; validation.
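The core idea in the abstract (train one predictor per study, then combine them with weights chosen for cross-study performance) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: each single-study learner (SSL) is a one-predictor least-squares fit, the studies are simulated with a hypothetical random perturbation of the true slope, and the combining weights are either equal (a plain average) or proportional to the inverse of each SSL's mean RMSE on the *other* training studies, a simple stand-in for the paper's weighting schemes. The functions `simulate_study`, `fit_ssl`, `cross_study_weights`, and `csl_predict` are hypothetical names.

```python
import random
import statistics

def simulate_study(n, slope, rng):
    """Hypothetical data-generating model for one study: y = slope * x + noise."""
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [slope * x + rng.gauss(0.0, 0.5) for x in xs]
    return xs, ys

def fit_ssl(xs, ys):
    """Single-study learner (assumed here): ordinary least squares, one predictor."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return (my - b * mx, b)  # (intercept, slope)

def predict(model, xs):
    a, b = model
    return [a + b * x for x in xs]

def rmse(pred, ys):
    return (sum((p - y) ** 2 for p, y in zip(pred, ys)) / len(ys)) ** 0.5

def cross_study_weights(models, studies):
    """Weight each SSL by 1 / (mean RMSE on the other training studies), normalized."""
    raw = []
    for k, m in enumerate(models):
        errs = [rmse(predict(m, xs), ys)
                for j, (xs, ys) in enumerate(studies) if j != k]
        raw.append(1.0 / statistics.fmean(errs))
    total = sum(raw)
    return [w / total for w in raw]

def csl_predict(models, weights, xs):
    """Cross-study learner: weighted average of the SSL predictions."""
    preds = [predict(m, xs) for m in models]
    return [sum(w * p[i] for w, p in zip(weights, preds)) for i in range(len(xs))]

rng = random.Random(0)
# Each study perturbs the true slope 1.0 by up to +/-0.5 (hypothetical heterogeneity).
train = [simulate_study(100, 1.0 + rng.uniform(-0.5, 0.5), rng) for _ in range(5)]
test = simulate_study(200, 1.0 + rng.uniform(-0.5, 0.5), rng)

models = [fit_ssl(xs, ys) for xs, ys in train]
equal = [1.0 / len(models)] * len(models)
weighted = cross_study_weights(models, train)

print("SSL RMSEs:", [round(rmse(predict(m, test[0]), test[1]), 3) for m in models])
print("equal-weight CSL RMSE:", round(rmse(csl_predict(models, equal, test[0]), test[1]), 3))
print("cross-study-weighted CSL RMSE:",
      round(rmse(csl_predict(models, weighted, test[0]), test[1]), 3))
```

Because RMSE is a Euclidean norm scaled by a constant, the triangle inequality guarantees that a weighted average of SSL predictions never has a higher validation RMSE than the same weighted average of the individual SSL RMSEs; whether the combination also beats the best single SSL depends on the data.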

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Fig. 1.
The architecture of a cross-study learner (CSL), illustrated with six studies divided into three subsets, two single-study learners (SSLs), and general weights.
Fig. 2.
Ratios of validation root-mean-square errors (RMSEs) to the RMSE of the Reg-a weighting strategy, averaged over 100 simulation iterations, as the coefficient perturbation window varies. The top seven panels correspond to different choices of SSL, and colors correspond to different weighting schemes. The bottom panel displays the average validation RMSE of the best-performing scheme (indicated by color) for each SSL (indicated by letter) at each perturbation window.
Fig. 3.
Differential discrimination of alternative classifiers. For each classifier we compute the hazard ratio associated with a one-unit change in the score vector, as evaluated in the validation datasets. The vertical scale is the ratio of this performance measure to that of the Reg-a CSL. Colors indicate classes of learning strategies: white is weighted CSLs whose weights address cross-study prediction, purple is CSLs with fixed weights, orange is merging and meta-analysis, and blue is an SSL trained on the TCGA dataset. Horizontal lines are drawn at y = 1 and at the median performance of CS-Avg.
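The Fig. 2 summary (each scheme's validation RMSE relative to Reg-a, averaged over simulation iterations) is simple arithmetic, sketched below. The caption does not state whether the ratio is taken per iteration and then averaged or computed from averaged RMSEs; this sketch assumes the former. The RMSE values and the "Merged" scheme label are hypothetical placeholders, not results from the paper. A ratio below 1 means the scheme had lower validation error than Reg-a.

```python
import statistics

def mean_rmse_ratio(rmse_by_scheme, reference):
    """For each scheme, average over iterations of its validation RMSE divided by
    the reference scheme's RMSE in the same iteration."""
    ref = rmse_by_scheme[reference]
    return {
        scheme: statistics.fmean(r / b for r, b in zip(values, ref))
        for scheme, values in rmse_by_scheme.items()
    }

# Hypothetical RMSEs from three simulation iterations (not results from the paper).
rmse_by_scheme = {
    "Reg-a": [1.00, 0.80, 1.20],
    "CS-Avg": [1.10, 0.88, 1.26],
    "Merged": [1.30, 0.80, 1.44],
}
ratios = mean_rmse_ratio(rmse_by_scheme, "Reg-a")
print({k: round(v, 3) for k, v in ratios.items()})
```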