Improving prediction of linear regression models by integrating external information from heterogeneous populations: James-Stein estimators
- PMID: 39101548
- PMCID: PMC11299067
- DOI: 10.1093/biomtc/ujae072
Abstract
We consider the setting where (1) an internal study builds a linear regression model for prediction based on individual-level data, (2) some external studies have fitted similar linear regression models that use only subsets of the covariates and provide coefficient estimates for the reduced models without individual-level data, and (3) there is heterogeneity across these study populations. The goal is to integrate the external model summary information into fitting the internal model to improve prediction accuracy. We adapt the James-Stein shrinkage method to propose estimators that, after information integration, are no worse and often better in terms of prediction mean squared error, regardless of the degree of study population heterogeneity. We conduct comprehensive simulation studies to investigate the numerical performance of the proposed estimators. We also apply the method to enhance a prediction model for patella bone lead level, based on blood lead level and other covariates, by integrating summary information from published literature.
Keywords: James–Stein shrinkage; data integration; external summary information; meta-analysis; population heterogeneity; prediction mean squared error.
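As a rough illustration of the shrinkage idea named in the abstract, the sketch below implements the textbook positive-part James-Stein rule, which pulls a noisy estimate toward a fixed target. It is not the estimator proposed in the article. It assumes the estimate is distributed as N(theta, sigma2 * I_p) with known sigma2 and p >= 3, and the function name `james_stein_toward` and its arguments are hypothetical names chosen for this example. In the setting of the abstract, the estimate would loosely correspond to the internal coefficient estimate and the target to a value informed by external summary information.

```python
import numpy as np


def james_stein_toward(x, target, sigma2):
    """Positive-part James-Stein shrinkage of x toward a fixed target.

    Assumes x ~ N(theta, sigma2 * I_p) with known sigma2 and p >= 3.
    This is the classical rule, not the article's proposed estimator.
    """
    x = np.asarray(x, dtype=float)
    target = np.asarray(target, dtype=float)
    p = x.size
    if p < 3:
        raise ValueError("James-Stein shrinkage needs at least 3 dimensions")

    diff = x - target
    dist2 = float(diff @ diff)
    if dist2 == 0.0:
        # The estimate already equals the target; nothing to shrink.
        return target.copy()

    # Weight kept on the raw estimate, clipped at 0 (positive-part rule).
    # A small distance to the target relative to the noise gives a small
    # weight, so the result moves close to the target.
    weight = max(0.0, 1.0 - (p - 2) * sigma2 / dist2)
    return target + weight * diff
```

When the estimate lies far from the target relative to the noise level, the weight approaches 1 and the estimate is left almost unchanged. This is the intuition behind shrinkage that does little harm when the target is badly misspecified, for example under strong population heterogeneity.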
© The Author(s) 2024. Published by Oxford University Press on behalf of The International Biometric Society.
Conflict of interest statement
None declared.