Much ado about nothing: A comparison of missing data methods and software to fit incomplete data regression models

Nicholas J Horton¹, Ken P Kleinman

PMID: 17401454
PMCID: PMC1839993
DOI: 10.1198/000313007X172556

Am Stat. 2007 Feb;61(1):79-90.

Affiliation

¹ Department of Mathematics and Statistics Smith College, Northampton, MA.

Abstract

Missing data are a recurring problem that can cause bias or lead to inefficient analyses. The development of statistical methods to address missingness, including imputation, likelihood, and weighting approaches, has been actively pursued in recent years. Each approach is more complicated when there are many patterns of missing values, or when both categorical and continuous random variables are involved. Implementations of routines to incorporate observations with incomplete variables in regression models are now widely available. We review these routines in the context of a motivating example from a large health services research dataset. While there are still limitations to the current implementations, and additional efforts are required of the analyst, it is feasible to incorporate partially observed values, and these methods should be utilized in practice.

Figures

**Figure 1**
Monotone and non-monotone patterns of missingness (Obs=observed, M=missing)

**Figure 2**
Use of Likelihood based approach with EM algorithm to incorporate partially

**Figure 3**
Proposed guidelines for reporting missing covariate data (Burton and Altman 2004)

**Figure 4**
Description of missing data (using Stata misschk function)

Grants and funding

R01 MH054693/MH/NIMH NIH HHS/United States
