J Am Med Inform Assoc. 2018 Jun 1;25(6):645-653.

doi: 10.1093/jamia/ocx133.

3D-MICE: integration of cross-sectional and longitudinal imputation for multi-analyte longitudinal clinical data

Yuan Luo¹, Peter Szolovits², Anand S Dighe³,⁴, Jason M Baron³,⁴

Affiliations

¹ Department of Preventive Medicine, Northwestern University, Chicago, IL, USA.
² Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology, Cambridge, MA, USA.
³ Department of Pathology, Massachusetts General Hospital, Boston, MA, USA.
⁴ Harvard Medical School, Boston, MA, USA.

PMID: 29202205
PMCID: PMC7646951
DOI: 10.1093/jamia/ocx133

Yuan Luo et al. J Am Med Inform Assoc. 2018.

Abstract

Objective: A key challenge in clinical data mining is that most clinical datasets contain missing data. Since many commonly used machine learning algorithms require complete datasets (no missing data), clinical analytic approaches often entail an imputation procedure to "fill in" missing data. However, although most clinical datasets contain a temporal component, most commonly used imputation methods do not adequately accommodate longitudinal time-based data. We sought to develop a new imputation algorithm, 3-dimensional multiple imputation with chained equations (3D-MICE), that can perform accurate imputation of missing clinical time series data.

Methods: We extracted clinical laboratory test results for 13 commonly measured analytes (clinical laboratory tests). We imputed missing test results for the 13 analytes using 3 imputation methods: multiple imputation with chained equations (MICE), Gaussian process (GP), and 3D-MICE. 3D-MICE utilizes both MICE and GP imputation to integrate cross-sectional and longitudinal information. To evaluate imputation method performance, we randomly masked selected test results and imputed these masked results alongside results missing from our original data. We compared predicted results to measured results for masked data points.
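The mask-and-compare evaluation described in the Methods can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the 20% masking fraction, the synthetic data, and the column-mean imputer (a stand-in for MICE, GP, or 3D-MICE) are all assumptions. The error shown is a plain root-mean-square error; the abstract reports a normalized variant whose normalization is not specified here.

```python
import numpy as np

def mask_random_entries(values, fraction, rng):
    """Hide a random fraction of the observed entries.

    Returns the masked copy and a boolean array marking the hidden entries.
    """
    observed_idx = np.flatnonzero(~np.isnan(values))
    n_mask = int(round(fraction * observed_idx.size))
    chosen = rng.choice(observed_idx, size=n_mask, replace=False)
    flat_mask = np.zeros(values.size, dtype=bool)
    flat_mask[chosen] = True
    mask = flat_mask.reshape(values.shape)
    masked = values.copy()
    masked[mask] = np.nan
    return masked, mask

def impute_column_mean(values):
    """Placeholder imputer (assumption): fill gaps with each column's observed mean."""
    col_means = np.nanmean(values, axis=0)
    filled = values.copy()
    rows, cols = np.nonzero(np.isnan(filled))
    filled[rows, cols] = col_means[cols]
    return filled

def masked_rmse(truth, imputed, mask):
    """Root-mean-square error restricted to the deliberately masked entries."""
    diff = imputed[mask] - truth[mask]
    return float(np.sqrt(np.mean(diff ** 2)))

# Synthetic stand-in for a results matrix (rows: time points, columns: analytes).
rng = np.random.default_rng(0)
data = rng.normal(loc=100.0, scale=10.0, size=(200, 3))
data[rng.random(data.shape) < 0.1] = np.nan  # entries missing from the "original" data

masked, mask = mask_random_entries(data, fraction=0.2, rng=rng)
imputed = impute_column_mean(masked)  # fills masked and originally missing entries alike
rmse = masked_rmse(data, imputed, mask)
print(f"RMSE on masked entries: {rmse:.3f}")
```

Only the deliberately masked entries contribute to the error, because those are the only imputed points for which a measured value is available for comparison.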

Results: 3D-MICE performed significantly better than MICE and GP-based imputation in a composite of all 13 analytes, predicting missing results with a normalized root-mean-square error of 0.342, compared to 0.373 for MICE alone and 0.358 for GP alone.

Conclusions: 3D-MICE offers a novel and practical approach to imputing clinical laboratory time series data. 3D-MICE may provide an additional tool for use as a foundation in clinical predictive analytics and intelligent clinical decision support.

Figures

**Figure 1.**
Construction of the primary dataset. Shown are the exclusion criteria used to construct our dataset and the impact of each criterion.

**Figure 2.**
Schematic of 3D-MICE for modeling temporal clinical laboratory data.

**Figure 3.**
Comparison of mean, MICE, GP, and 3D-MICE imputation. Shown is the normalized percentile absolute deviation (nPAD) for MICE, GP, and 3D-MICE. Mean imputation is also shown for comparison with a trivial imputation method. Bars represent the 25th through 75th percentile nPAD, and horizontal lines in bars represent the 50th percentile nPAD. In 3 cases, the 75th percentile for mean imputation exceeded the range of the graph, as denoted by the ellipsis and the actual numerical value. “+” and “−” symbols denote cases where 3D-MICE performed better or worse than comparison methods, as described in the legend.

**Figure 4.**
Heatmap of cross-sectional and longitudinal correlation. Shown is the correlation between various test results in our dataset when measured at the same time for the same patient and when measured at successive time points for the same patient. Analytes with the suffix “_prior” represent results from the analyte one measurement prior (within the same patient-admission) to analytes shown without this suffix. The dendrogram to the left of the heatmap represents the relative similarity between variables.
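The "_prior" variables in this caption can be illustrated with a short pandas sketch. It is a hedged example under stated assumptions: the column names (`admission_id`, `draw_time`), the toy values, and the helper `add_prior_columns` are hypothetical and not taken from the paper, and the dendrogram clustering is not reproduced.

```python
import pandas as pd

def add_prior_columns(df, group_col, time_col, analytes):
    """Append '<analyte>_prior' columns holding the previous measurement
    within the same group (here, the same patient-admission)."""
    ordered = df.sort_values([group_col, time_col])
    prior = ordered.groupby(group_col)[analytes].shift(1)
    prior.columns = [f"{name}_prior" for name in analytes]
    return pd.concat([ordered, prior], axis=1)

# Toy data (assumption): two admissions, three draws each.
toy = pd.DataFrame({
    "admission_id": [1, 1, 1, 2, 2, 2],
    "draw_time": [0, 1, 2, 0, 1, 2],
    "chloride": [100.0, 102.0, 101.0, 98.0, 97.0, 99.0],
    "platelets": [250.0, 240.0, 245.0, 180.0, 190.0, 185.0],
})

wide = add_prior_columns(toy, "admission_id", "draw_time", ["chloride", "platelets"])

# Pairwise Pearson correlation across current and prior values; rows with a
# missing prior value (the first draw of each admission) are skipped pairwise.
corr = wide[["chloride", "platelets", "chloride_prior", "platelets_prior"]].corr()
print(corr.round(2))
```

Entries pairing two current analytes correspond to the cross-sectional correlation, while entries pairing an analyte with a "_prior" column correspond to the longitudinal correlation.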

**Figure 5.**
Accuracy of chloride and platelet predictions and confidence intervals, box plots. Shown is the distribution of predicted results (vertical axis) corresponding to each range of measured results (horizontal axis). Horizontal lines within each box represent median values, boxes represent interpercentile ranges, and dots represent outliers. N’s represent the number of measured values falling within each range. **(A)** Chloride and **(B)** platelets are presented as 2 representative analytes.

**Figure 6.**
Accuracy of chloride and platelet predictions and confidence intervals, scatter plots. Predicted values for **(A)** chloride and **(B)** platelets are plotted as a function of measured values. Point colors represent prediction interval width. Horizontal and vertical lines represent the upper and lower normal reference limits. Note that less accurate predictions (points farther from the dashed 45-degree line) tend to be less confident, as indicated by the wider prediction interval and redder color.

Cited by

Imputation of Missing Data in Electronic Health Records Based on Patients' Similarities.
Jazayeri A, Liang OS, Yang CC. J Healthc Inform Res. 2020 May 7;4(3):295-307. doi: 10.1007/s41666-020-00073-5. eCollection 2020 Sep. PMID: 35415446. Free PMC article.
Missing Value Estimation using Clustering and Deep Learning within Multiple Imputation Framework.
Samad MD, Abrar S, Diawara N. Knowl Based Syst. 2022 Aug 5;249:108968. doi: 10.1016/j.knosys.2022.108968. Epub 2022 May 10. PMID: 36159738. Free PMC article.
Exploiting mutual information for the imputation of static and dynamic mixed-type clinical data with an adaptive k-nearest neighbours approach.
Tavazzi E, Daberdaku S, Vasta R, Calvo A, Chiò A, Di Camillo B. BMC Med Inform Decis Mak. 2020 Aug 20;20(Suppl 5):174. doi: 10.1186/s12911-020-01166-2. PMID: 32819346. Free PMC article.
A Combined Interpolation and Weighted K-Nearest Neighbours Approach for the Imputation of Longitudinal ICU Laboratory Data.
Daberdaku S, Tavazzi E, Di Camillo B. J Healthc Inform Res. 2020 Mar 2;4(2):174-188. doi: 10.1007/s41666-020-00069-1. eCollection 2020 Jun. PMID: 35415441. Free PMC article.
Has the Flood Entered the Basement? A Systematic Literature Review about Machine Learning in Laboratory Medicine.
Ronzio L, Cabitza F, Barbaro A, Banfi G. Diagnostics (Basel). 2021 Feb 22;11(2):372. doi: 10.3390/diagnostics11020372. PMID: 33671623. Free PMC article. Review.

