J Am Med Inform Assoc. 2018 Jun 1;25(6):645-653.
doi: 10.1093/jamia/ocx133.

3D-MICE: integration of cross-sectional and longitudinal imputation for multi-analyte longitudinal clinical data

Yuan Luo et al. J Am Med Inform Assoc. 2018.

Abstract

Objective: A key challenge in clinical data mining is that most clinical datasets contain missing data. Since many commonly used machine learning algorithms require complete datasets (no missing data), clinical analytic approaches often entail an imputation procedure to "fill in" missing data. However, although most clinical datasets contain a temporal component, most commonly used imputation methods do not adequately accommodate longitudinal time-based data. We sought to develop a new imputation algorithm, 3-dimensional multiple imputation with chained equations (3D-MICE), that can perform accurate imputation of missing clinical time series data.

Methods: We extracted clinical laboratory test results for 13 commonly measured analytes (clinical laboratory tests). We imputed missing test results for the 13 analytes using 3 imputation methods: multiple imputation with chained equations (MICE), Gaussian process (GP), and 3D-MICE. 3D-MICE utilizes both MICE and GP imputation to integrate cross-sectional and longitudinal information. To evaluate imputation method performance, we randomly masked selected test results and imputed these masked results alongside results missing from our original data. We compared predicted results to measured results for masked data points.
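The masking-based evaluation described above can be sketched as follows. This is a minimal illustration, not the authors' code: the synthetic data, the 20% masking fraction, the function names, and the use of column-mean imputation as a stand-in for MICE, GP, or 3D-MICE are all assumptions. The normalization (dividing each analyte's error by its standard deviation) is also an assumption, since the abstract does not define it.

```python
import numpy as np

def mask_random_entries(values, frac, rng):
    """Hide a random fraction of the observed entries; return (masked copy, mask)."""
    observed = np.argwhere(~np.isnan(values))
    n_mask = int(round(frac * len(observed)))
    chosen = observed[rng.choice(len(observed), size=n_mask, replace=False)]
    mask = np.zeros(values.shape, dtype=bool)
    mask[chosen[:, 0], chosen[:, 1]] = True
    masked = values.copy()
    masked[mask] = np.nan
    return masked, mask

def mean_impute(values):
    """Trivial baseline: fill each column's gaps with that column's observed mean."""
    col_means = np.nanmean(values, axis=0)
    return np.where(np.isnan(values), col_means, values)

def normalized_rmse(truth, imputed, mask):
    """RMSE over masked entries only, each column scaled by its std (assumed)."""
    scale = np.nanstd(truth, axis=0)
    err = (imputed - truth) / scale
    return float(np.sqrt(np.mean(err[mask] ** 2)))

# Synthetic stand-in for a table of lab results (rows: draws, columns: analytes).
rng = np.random.default_rng(0)
data = rng.normal(loc=[100.0, 250.0, 4.0], scale=[5.0, 60.0, 0.5], size=(200, 3))
data[rng.random(data.shape) < 0.1] = np.nan  # values missing in the original data

# Mask known values, impute masked and originally missing values together,
# then score only the masked entries, where the measured value is known.
masked, mask = mask_random_entries(data, frac=0.2, rng=rng)
imputed = mean_impute(masked)
score = normalized_rmse(data, imputed, mask)
print(f"normalized RMSE on masked entries: {score:.3f}")
```

Swapping `mean_impute` for another imputer while keeping the same `mask` gives a like-for-like comparison, because every method is scored on the same held-out measurements.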

Results: 3D-MICE performed significantly better than MICE and GP-based imputation in a composite of all 13 analytes, predicting missing results with a normalized root-mean-square error of 0.342, compared to 0.373 for MICE alone and 0.358 for GP alone.

Conclusions: 3D-MICE offers a novel and practical approach to imputing clinical laboratory time series data. 3D-MICE may provide an additional tool for use as a foundation in clinical predictive analytics and intelligent clinical decision support.

Figures

Figure 1.
Construction of the primary dataset. Shown are the exclusion criteria used to construct our dataset and the impact of each criterion.
Figure 2.
Schematic of 3D-MICE in modeling temporal clinical laboratory data.
Figure 3.
Comparison of mean, MICE, GP, and 3D-MICE imputation. Shown is the normalized percentile absolute deviation (nPAD) for MICE, GP, and 3D-MICE. Mean imputation is also shown for comparison with a trivial imputation method. Bars represent the 25th through 75th percentile nPAD, and horizontal lines in bars represent the 50th percentile nPAD. In 3 cases, the 75th percentile for mean imputation exceeded the range of the graph, as denoted by the ellipsis and the actual numerical value. “+” and “−” symbols denote cases where 3D-MICE performed better or worse than comparison methods, as described in the legend.
Figure 4.
Heatmap of cross-sectional and longitudinal correlation. Shown is the correlation between various test results in our dataset when measured at the same time for the same patient and when measured at successive time points for the same patient. Analytes with the suffix “_prior” represent results from the analyte one measurement prior (within the same patient-admission) to analytes shown without this suffix. The dendrogram to the left of the heatmap represents the relative similarity between variables.
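The "_prior" variables described in this caption can be built by shifting each analyte by one measurement within an admission. The sketch below is illustrative only: the column names, the toy values, and the assumption that rows are already sorted by time within each admission are hypothetical and not taken from the paper.

```python
import pandas as pd

# Toy lab table: one row per draw, ordered by time within each admission.
labs = pd.DataFrame({
    "admission_id": [1, 1, 1, 2, 2, 2, 2],
    "chloride": [101.0, 103.0, 104.0, 98.0, 97.0, 99.0, 100.0],
    "sodium": [138.0, 140.0, 141.0, 135.0, 134.0, 136.0, 137.0],
})
analytes = ["chloride", "sodium"]

# Shift by one draw inside each admission so the first draw has no prior value
# and values never leak across admissions.
prior = labs.groupby("admission_id")[analytes].shift(1).add_suffix("_prior")

# Same-time columns next to their "_prior" counterparts; corr() uses
# pairwise-complete observations, so rows without a prior draw are skipped.
combined = pd.concat([labs[analytes], prior], axis=1)
corr = combined.corr()
print(corr.round(2))
```

Entries such as `corr.loc["chloride", "sodium"]` are cross-sectional correlations, while entries such as `corr.loc["chloride", "chloride_prior"]` are longitudinal ones.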
Figure 5.
Accuracy of chloride and platelet predictions and confidence intervals, box plots. Shown is the distribution of predicted results (vertical axis) corresponding to each range of measured results (horizontal axis). Horizontal lines within each box represent median values, boxes represent interpercentile ranges, and dots represent outliers. N’s represent the number of measured values falling within each range. (A) Chloride and (B) platelets are presented as 2 representative analytes.
Figure 6.
Accuracy of chloride and platelet predictions and confidence intervals, scatter plots. Predicted values for (A) chloride and (B) platelets are plotted as a function of measured values. Point colors represent prediction interval width. Horizontal and vertical lines represent the upper and lower normal reference limits. Note that less accurate predictions (points farther from the dashed 45-degree line) tend to be less confident, as indicated by the wider prediction interval and redder color.
