BMC Med Res Methodol. 2024 Sep 4;24(1):193.

doi: 10.1186/s12874-024-02302-6.

Gaps in the usage and reporting of multiple imputation for incomplete data: findings from a scoping review of observational studies addressing causal questions

Rheanna M Mainzer^{1,2}, Margarita Moreno-Betancur^{3,4}, Cattram D Nguyen^{3,4}, Julie A Simpson^{5,6}, John B Carlin^{3,5}, Katherine J Lee^{3,4}

Affiliations

¹ Clinical Epidemiology and Biostatistics Unit, Murdoch Children's Research Institute, Parkville, Victoria, 3052, Australia. rheanna.mainzer@unimelb.edu.au.
² Department of Paediatrics, The University of Melbourne, Parkville, Victoria, 3052, Australia. rheanna.mainzer@unimelb.edu.au.
³ Clinical Epidemiology and Biostatistics Unit, Murdoch Children's Research Institute, Parkville, Victoria, 3052, Australia.
⁴ Department of Paediatrics, The University of Melbourne, Parkville, Victoria, 3052, Australia.
⁵ Centre for Epidemiology and Biostatistics, Melbourne School of Population and Global Health, The University of Melbourne, Parkville, Victoria, 3052, Australia.
⁶ Nuffield Department of Medicine, University of Oxford, Oxford, UK.

PMID: 39232661
PMCID: PMC11373423
DOI: 10.1186/s12874-024-02302-6

Rheanna M Mainzer et al. BMC Med Res Methodol. 2024.

Abstract

Background: Missing data are common in observational studies and often occur in several of the variables required when estimating a causal effect, i.e. the exposure, outcome and/or variables used to control for confounding. Analyses involving multiple incomplete variables are not as straightforward as analyses with a single incomplete variable. For example, in the context of multivariable missingness, the standard missing data assumptions ("missing completely at random", "missing at random" [MAR], "missing not at random") are difficult to interpret and assess. It is not clear how the complexities that arise due to multivariable missingness are being addressed in practice. The aim of this study was to review how missing data are managed and reported in observational studies that use multiple imputation (MI) for causal effect estimation, with a particular focus on missing data summaries, missing data assumptions, primary and sensitivity analyses, and MI implementation.

Methods: We searched five top general epidemiology journals for observational studies that aimed to answer a causal research question and used MI, published between January 2019 and December 2021. Article screening and data extraction were performed systematically.

Results: Of the 130 studies included in this review, 108 (83%) derived an analysis sample by excluding individuals with missing data in specific variables (e.g., outcome) and 114 (88%) had multivariable missingness within the analysis sample. Forty-four (34%) studies provided a statement about missing data assumptions, 35 of which stated the MAR assumption, but only 11/44 (25%) studies provided a justification for these assumptions. The number of imputations, MI method and MI software were generally well-reported (71%, 75% and 88% of studies, respectively), while aspects of the imputation model specification were not clear for more than half of the studies. A secondary analysis that used a different approach to handle the missing data was conducted in 69/130 (53%) studies. Of these 69 studies, 68 (99%) lacked a clear justification for the secondary analysis.
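As a quick arithmetic check, the percentages in the Results can be recomputed from the reported counts. The following is a minimal sketch in Python; the dictionary labels and the helper names `pct` and `check_reported` are illustrative and do not come from the paper.

```python
# Counts and stated percentages copied from the Results paragraph.
# Labels are illustrative shorthand, not the authors' wording.
REPORTED = {
    "analysis sample derived by exclusion": (108, 130, 83),
    "multivariable missingness in analysis sample": (114, 130, 88),
    "statement about missing data assumptions": (44, 130, 34),
    "justification given for assumptions": (11, 44, 25),
    "secondary analysis with a different approach": (69, 130, 53),
    "secondary analysis lacking clear justification": (68, 69, 99),
}


def pct(numerator: int, denominator: int) -> int:
    """Return the percentage rounded to the nearest whole number."""
    return round(100 * numerator / denominator)


def check_reported(reported: dict) -> dict:
    """Map each label to True if the recomputed percentage matches the stated one."""
    return {
        label: pct(n, d) == stated
        for label, (n, d, stated) in reported.items()
    }


if __name__ == "__main__":
    for label, (n, d, stated) in REPORTED.items():
        print(f"{label}: {n}/{d} = {pct(n, d)}% (reported {stated}%)")
```

Under whole-number rounding, all six stated percentages agree with their counts. The same helper gives 35/44 as roughly 80% for the share of assumption statements that invoked MAR, a figure the abstract does not state explicitly.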

Conclusion: Effort is needed to clarify the rationale for and improve the reporting of MI for estimation of causal effects from observational data. We encourage greater transparency in making and reporting analytical decisions related to missing data.

Keywords: Causal inference; Missing data; Missingness mechanism.

Conflict of interest statement

The authors declare no competing interests.

Figures

**Fig. 2**
Dot plots and histograms showing the extent of missing data as a proportion of all participants in the analysis sample (see text for definition). A) no missing data (complete cases); B) missing values in the exposure; C) missing values in the outcome. Left panels: restricted to studies where the percentage could be established; right panels: restricted to studies where the exact percentage could not be established but a conservative bound on the percentage could be established
