Imputation and variable selection in linear regression models with missing covariates
- PMID: 16011697
- DOI: 10.1111/j.1541-0420.2005.00317.x
Abstract
Across multiply imputed data sets, variable selection methods such as stepwise regression and other criterion-based strategies that include or exclude particular variables typically result in models with different selected predictors, thus presenting a problem for combining the results from separate complete-data analyses. Here, drawing on a Bayesian framework, we propose two alternative strategies to address the problem of choosing among linear regression models when there are missing covariates. One approach, which we call "impute, then select" (ITS) involves initially performing multiple imputation and then applying Bayesian variable selection to the multiply imputed data sets. A second strategy is to conduct Bayesian variable selection and missing data imputation simultaneously within one Gibbs sampling process, which we call "simultaneously impute and select" (SIAS). The methods are implemented and evaluated using the Bayesian procedure known as stochastic search variable selection for multivariate normal data sets, but both strategies offer general frameworks within which different Bayesian variable selection algorithms could be used for other types of data sets. A study of mental health services utilization among children in foster care programs is used to illustrate the techniques. Simulation studies show that both ITS and SIAS outperform complete-case analysis with stepwise variable selection and that SIAS slightly outperforms ITS.
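The "impute, then select" workflow described in the abstract can be sketched in a few lines. This is only an illustration of the overall pattern, not the authors' implementation, and it rests on several assumptions: imputations are drawn from a multivariate normal model whose mean and covariance are estimated once from the complete cases (a proper multiple imputation would also draw those parameters from their posterior); stochastic search variable selection is replaced by a simpler stand-in that enumerates all submodels and weights them by BIC; and inclusion probabilities are pooled by plain averaging across imputations. The function names `impute_once`, `inclusion_probs`, and `impute_then_select`, and the simulated data, are hypothetical.

```python
import itertools
import numpy as np

def impute_once(Z, rng):
    """Fill NaNs with one draw from the conditional normal given observed entries.
    Simplification: mean/covariance come from complete cases and are held fixed."""
    complete = Z[~np.isnan(Z).any(axis=1)]
    mu = complete.mean(axis=0)
    sigma = np.cov(complete, rowvar=False)
    Z_imp = Z.copy()
    for i in range(Z.shape[0]):
        miss = np.isnan(Z[i])
        if not miss.any():
            continue
        obs = ~miss
        if obs.any():
            s_mo = sigma[np.ix_(miss, obs)]
            w = s_mo @ np.linalg.inv(sigma[np.ix_(obs, obs)])
            cond_mu = mu[miss] + w @ (Z[i, obs] - mu[obs])
            cond_cov = sigma[np.ix_(miss, miss)] - w @ s_mo.T
        else:
            cond_mu, cond_cov = mu[miss], sigma[np.ix_(miss, miss)]
        Z_imp[i, miss] = rng.multivariate_normal(cond_mu, cond_cov)
    return Z_imp

def inclusion_probs(X, y):
    """Stand-in for Bayesian variable selection (NOT SSVS): enumerate all
    submodels and convert BIC values into approximate posterior model weights."""
    n, p = X.shape
    models, bics = [], []
    for k in range(p + 1):
        for subset in itertools.combinations(range(p), k):
            design = np.column_stack([np.ones(n)] + [X[:, j] for j in subset])
            beta = np.linalg.lstsq(design, y, rcond=None)[0]
            resid = y - design @ beta
            rss = resid @ resid
            bics.append(n * np.log(rss / n) + design.shape[1] * np.log(n))
            models.append(subset)
    bics = np.array(bics)
    weights = np.exp(-0.5 * (bics - bics.min()))
    weights /= weights.sum()
    probs = np.zeros(p)
    for w, subset in zip(weights, models):
        for j in subset:
            probs[j] += w
    return probs

def impute_then_select(X, y, m=5, seed=0):
    """ITS pattern: create m imputed data sets, select on each, then pool."""
    rng = np.random.default_rng(seed)
    Z = np.column_stack([y, X])  # the outcome is part of the imputation model
    per_imputation = []
    for _ in range(m):
        Z_imp = impute_once(Z, rng)
        per_imputation.append(inclusion_probs(Z_imp[:, 1:], y))
    return np.mean(per_imputation, axis=0)

# Simulated example (hypothetical data): only the first two covariates matter.
rng = np.random.default_rng(1)
n, p = 200, 4
X_full = rng.normal(size=(n, p))
y = 2.0 * X_full[:, 0] - 1.5 * X_full[:, 1] + rng.normal(size=n)
X_obs = X_full.copy()
X_obs[rng.random(n) < 0.2, 1] = np.nan  # about 20% missing, completely at random
X_obs[rng.random(n) < 0.2, 2] = np.nan
probs = impute_then_select(X_obs, y, m=5, seed=0)
print(np.round(probs, 3))
```

Because selection is summarized as a per-variable inclusion probability rather than a single chosen model, the results from the separate imputed data sets can be pooled even when the best-fitting model differs between them, which is the combining problem the abstract raises for stepwise selection. The simultaneous strategy (SIAS) would instead alternate the imputation draw and the selection draw inside one Gibbs sampler, which this sketch does not attempt.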