Review
NPJ Digit Med. 2023 Feb 23;6(1):28.
doi: 10.1038/s41746-023-00770-6.

Predictive models in emergency medicine and their missing data strategies: a systematic review

Emilien Arnaud et al. NPJ Digit Med.

Abstract

In the field of emergency medicine (EM), the use of decision support tools based on artificial intelligence has increased markedly in recent years. In some cases, data are omitted deliberately and thus constitute "data not purposely collected" (DNPC). This accepted information bias can be managed in various ways: dropping patients with missing data, imputing with the mean, or using automatic techniques (e.g., machine learning) to handle or impute the data. Here, we systematically reviewed the methods used to handle missing data in EM research. A systematic review was performed after searching PubMed with the query "(emergency medicine OR emergency service) AND (artificial intelligence OR machine learning)". Seventy-two studies were included in the review. The trained models variously predicted diagnosis in 25 (35%) publications, mortality in 21 (29%) publications, and probability of admission in 21 (29%) publications. Eight publications (11%) predicted two outcomes. Only 15 (21%) publications described their missing data. DNPC constitute the "missing data" in EM machine learning studies. Although DNPC have been described more rigorously since 2020, the descriptions in the literature are not exhaustive, systematic or homogeneous. Imputation appears to be the best strategy but requires more time and computational resources. To increase the quality and the comparability of studies, we recommend including the TRIPOD checklist in each new publication, summarizing the machine learning process in an explicit methodological diagram, and always publishing the area under the receiver operating characteristic curve, even when it is not the primary outcome.
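
The two simplest strategies named in the abstract (dropping patients with missing data and imputing with the mean), together with a per-variable description of missingness, can be sketched as follows. This is a minimal illustration only, not the method of any reviewed study: the records, the variable names (`age`, `heart_rate`, `lactate`) and the function names are all hypothetical, and machine-learning-based imputation is not shown.

```python
from statistics import mean

# Hypothetical toy records; None marks a value that was not collected.
patients = [
    {"age": 70, "heart_rate": 88, "lactate": 2.0},
    {"age": 45, "heart_rate": None, "lactate": None},
    {"age": 60, "heart_rate": 110, "lactate": 4.0},
    {"age": 30, "heart_rate": 72, "lactate": None},
]


def missing_rate(records):
    """Describe missingness: fraction of records lacking each variable."""
    return {
        key: sum(r[key] is None for r in records) / len(records)
        for key in records[0]
    }


def drop_incomplete(records):
    """Strategy 1: keep only records with no missing value."""
    return [r for r in records if all(v is not None for v in r.values())]


def impute_mean(records):
    """Strategy 2: replace each missing value with the mean of the
    observed values of the same variable.

    Raises statistics.StatisticsError if a variable has no observed value.
    """
    observed_means = {
        key: mean(r[key] for r in records if r[key] is not None)
        for key in records[0]
    }
    return [
        {k: (observed_means[k] if v is None else v) for k, v in r.items()}
        for r in records
    ]


rates = missing_rate(patients)
complete_cases = drop_incomplete(patients)
imputed = impute_mean(patients)
```

On this toy data, dropping incomplete records halves the sample, whereas mean imputation keeps every patient at the cost of flattening the variance of the imputed variables. In a real predictive pipeline the means would presumably be computed on the training set only and then reused on the test set, to avoid leaking information between the two.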

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1. Missing data strategies applied before the final (predictive) algorithm.
A Missing data strategies using autonomous missing data preprocessing algorithms. B Missing data strategies using non-autonomous preprocessing algorithms. C Missing data strategies using an additional process.
Fig. 2. Identification, screening and inclusion of studies via databases and registers.
Fig. 3. Change over time in the frequency of use of missing data strategies.
NB: some publications used more than one strategy, and so the total can exceed 100%.
