Biometrics. 2024 Jul 1;80(3):ujae072.
doi: 10.1093/biomtc/ujae072.

Improving prediction of linear regression models by integrating external information from heterogeneous populations: James-Stein estimators

Peisong Han et al. Biometrics.

Abstract

We consider the setting where (1) an internal study builds a linear regression model for prediction based on individual-level data, (2) some external studies have fitted similar linear regression models that use only subsets of the covariates and provide coefficient estimates for the reduced models without individual-level data, and (3) there is heterogeneity across these study populations. The goal is to integrate the external model summary information into fitting the internal model to improve prediction accuracy. We adapt the James-Stein shrinkage method to propose estimators that are no worse and are oftentimes better in the prediction mean squared error after information integration, regardless of the degree of study population heterogeneity. We conduct comprehensive simulation studies to investigate the numerical performance of the proposed estimators. We also apply the method to enhance a prediction model for patella bone lead level in terms of blood lead level and other covariates by integrating summary information from published literature.
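To make the shrinkage idea concrete, here is a minimal sketch of the textbook positive-part James-Stein estimator, which pulls a noisy estimate toward a fixed target. It is not the estimator proposed in the paper: the function name `positive_part_james_stein`, the use of a fixed `target` vector as a stand-in for externally supplied information, and the known common variance `sigma2` are all assumptions made for illustration.

```python
def positive_part_james_stein(x, target, sigma2=1.0):
    """Shrink x toward target with the textbook positive-part James-Stein rule.

    Assumes x is one draw from N(theta, sigma2 * I) with known sigma2 and
    dimension p >= 3. `target` is a hypothetical stand-in for external
    information; it is treated as fixed, not estimated.
    """
    p = len(x)
    if p != len(target):
        raise ValueError("x and target must have the same length")
    if p < 3:
        raise ValueError("James-Stein shrinkage needs dimension p >= 3")

    # Difference between the internal estimate and the shrinkage target.
    diff = [xi - ti for xi, ti in zip(x, target)]
    dist2 = sum(d * d for d in diff)
    if dist2 == 0.0:
        # x already equals the target; nothing to shrink.
        return list(target)

    # Shrinkage weight on the internal estimate, truncated at zero
    # (the "positive part") so the estimate never overshoots the target.
    weight = max(0.0, 1.0 - (p - 2) * sigma2 / dist2)
    return [ti + weight * d for ti, d in zip(target, diff)]


# Example: p = 5, squared distance to the target is 9, so weight = 1 - 3/9.
estimate = positive_part_james_stein([3.0, 0.0, 0.0, 0.0, 0.0], [0.0] * 5)
```

When the internal estimate lies far from the target, the weight approaches 1 and the estimate is left almost unchanged; when it lies close, the weight drops toward 0 and the target dominates. This data-driven weighting is the intuition behind shrinkage that does not hurt when the target is poorly matched to the internal population.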

Keywords: James–Stein shrinkage; data integration; external summary information; meta-analysis; population heterogeneity; prediction mean squared error.

Conflict of interest statement

None declared.

Figures

FIGURE 1
Comparison of different estimators in prediction mean squared error (PMSE) relative to the PMSE of the OLS estimator when using different external studies, varying the value of formula image of the external studies while holding the other parameters at the internal study values. formula image for the internal study and formula image for all 3 external studies. OLS: ordinary least squares estimator; JS, JS+: James-Stein (JS) estimator and the positive-part JS estimator; CLS: constrained least squares estimator; EB: empirical Bayes estimator; RR: ridge regression estimator; MJS, MJS+: multiple shrinkage JS estimator and the positive-part multiple shrinkage JS estimator; OCW-CLS: optimal-covariance-weighted CLS estimator; OCW-EB: optimal-covariance-weighted EB estimator.
FIGURE 2
Comparison of different estimators in prediction mean squared error (PMSE) relative to the PMSE of the OLS estimator when using all 3 external studies together, varying the value of each parameter of the external studies while holding the other parameters at the internal study values. formula image for the internal study and formula image for all 3 external studies. OLS: ordinary least squares estimator; MJS, MJS+: multiple shrinkage James-Stein (JS) estimator and the positive-part multiple shrinkage JS estimator; OCW-CLS: optimal-covariance-weighted constrained least squares (CLS) estimator; OCW-EB: optimal-covariance-weighted empirical Bayes (EB) estimator.
FIGURE 3
Performance of formula image using external study 3 under 4 combinations of sample sizes, by varying the value of formula image of external study 3 while holding other parameters at the internal study values. For each value of formula image, the lower curve and higher curve represent the minimum and maximum of the prediction mean squared error ratios corresponding to 100 regenerated external study datasets, respectively, and the dot represents the mean.
FIGURE 4
The left plot is for formula image when using external study 3 only with formula image and varying formula image. The right plot is for formula image when using all 3 external studies with formula image, formula image and varying formula image. All external studies have the same parameter values as the internal study. For each value of formula image, the lower curve and higher curve represent the minimum and maximum of the prediction mean squared error (PMSE) ratios corresponding to 100 regenerated external study datasets, respectively, and the dot represents the mean. In the right plot, the bar positioned at formula image represents the minimum and maximum of the PMSE ratios for formula image using only external studies 1 and 2, and the dot represents the mean.

