Missing observations in regression: a conditional approach
- PMID: 36778961
- PMCID: PMC9905973
- DOI: 10.1098/rsos.220267
Abstract
This note presents an alternative to multiple imputation and other approaches to regression analysis in the presence of missing covariate data. Our recommendation, based on factorial and fractional factorial arrangements, is more faithful to ancillarity considerations of regression analysis and involves assessing the sensitivity of inference on each regression parameter to missingness in each of the explanatory variables. The ideas are illustrated on a medical example concerned with the success of hematopoietic stem cell transplantation in children, and on a sociological example concerned with socio-economic inequalities in educational attainment.
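The abstract does not spell out how the factorial arrangement is built, so the following is only a minimal sketch under stated assumptions, not the authors' procedure. It assumes that each of k explanatory variables with missing values is treated as a two-level factor. The two levels stand for two alternative ways of handling that variable's missingness, which is a hypothetical coding. It also assumes that each run of the design yields one vector of fitted regression estimates. Two-level fractional factorial plans can be read off the columns of a Hadamard matrix. Sylvester's construction is used here, which is an assumption because the paper may use other Hadamard matrices. The main effect of each factor on each estimate is taken as the mean estimate at level +1 minus the mean at level -1. The names `sylvester_hadamard`, `fractional_design` and `main_effects` are hypothetical.

```python
import numpy as np


def sylvester_hadamard(order: int) -> np.ndarray:
    """Hadamard matrix of the given order (a power of 2), built by Sylvester's doubling."""
    if order < 1 or order & (order - 1):
        raise ValueError("order must be a power of 2")
    h = np.array([[1]])
    while h.shape[0] < order:
        # H_{2n} = [[H_n, H_n], [H_n, -H_n]]
        h = np.block([[h, h], [h, -h]])
    return h


def fractional_design(n_factors: int) -> np.ndarray:
    """Two-level main-effect plan: one row per run, one +/-1 column per factor.

    Uses the smallest Sylvester Hadamard matrix with at least n_factors columns
    besides its all-ones first column, and drops that first column. Every kept
    column is balanced (half +1, half -1) and orthogonal to the others.
    """
    order = 1
    while order - 1 < n_factors:
        order *= 2
    return sylvester_hadamard(order)[:, 1:n_factors + 1]


def main_effects(design: np.ndarray, estimates: np.ndarray) -> np.ndarray:
    """Main effect of each factor on each regression estimate.

    design:    (runs, factors) array of +/-1 levels.
    estimates: (runs, params) array, one row of fitted coefficients per run
               (hypothetical input, produced by whatever fitting step is used).
    Returns a (factors, params) array of (mean at +1) - (mean at -1).
    """
    runs = design.shape[0]
    # For a balanced column d, mean(y | +1) - mean(y | -1) = d @ y / (runs / 2).
    return design.T @ estimates / (runs / 2)


if __name__ == "__main__":
    # Synthetic illustration only: 3 factors in a 4-run half fraction.
    design = fractional_design(3)
    synthetic = np.column_stack([1.0 + 0.5 * design[:, 0], 2.0 - 0.25 * design[:, 2]])
    print(main_effects(design, synthetic))
```

Read this way, a large main effect of factor j on the estimate of parameter r would flag that inference on that parameter is sensitive to how missingness in variable j is handled. In a saturated two-level plan, main effects are aliased with interactions. They can therefore be interpreted directly only if those interactions are negligible.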
Keywords: EM algorithm; Hadamard matrix; ancillarity; fractional factorial; missing data; regression.
© 2023 The Authors.
Conflict of interest statement
We declare we have no competing interests.