BMC Med Inform Decis Mak. 2020 Oct 2;20(1):252.
doi: 10.1186/s12911-020-01268-x.

Prediction of incident myocardial infarction using machine learning applied to harmonized electronic health record data

Divneet Mandair et al. BMC Med Inform Decis Mak. 2020.

Abstract

Background: With cardiovascular disease increasing, substantial research has focused on the development of prediction tools. We compare deep learning and machine learning models with a baseline logistic regression that uses only 'known' risk factors for predicting incident myocardial infarction (MI) from harmonized electronic health record (EHR) data.

Methods: We conducted a large-scale case-control study of 2.27 million patients in the UCHealth system, with 6-month incident MI as the outcome. Predictors were the top 800 of an initial 52,000 procedures, diagnoses, and medications, harmonized to the Observational Medical Outcomes Partnership common data model. We compared several over- and under-sampling techniques to address the class imbalance in the dataset. The models compared were regularized logistic regression, random forest, gradient boosting machines, and shallow and deep neural networks. The baseline for comparison was a logistic regression using a limited set of 'known' risk factors for MI. Hyper-parameters were selected using 10-fold cross-validation.
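The abstract names random undersampling among the resampling techniques but does not describe how it was implemented. The following is a minimal sketch of the general technique only, assuming a 1:1 control-to-case target; the function name `random_undersample` and its `ratio` and `seed` parameters are illustrative and not taken from the study. The toy labels mimic the roughly 0.9% case prevalence implied by the reported counts.

```python
import random


def random_undersample(labels, ratio=1.0, seed=0):
    """Return sorted indices that keep every case (label 1) and a random
    subset of controls (label 0).

    `ratio` is the desired number of controls per case. It is a
    hypothetical parameter for this sketch, not a value from the study.
    """
    rng = random.Random(seed)
    cases = [i for i, y in enumerate(labels) if y == 1]
    controls = [i for i, y in enumerate(labels) if y == 0]
    # Never ask for more controls than exist.
    n_keep = min(len(controls), int(round(ratio * len(cases))))
    kept_controls = rng.sample(controls, n_keep)
    return sorted(cases + kept_controls)


# Toy data: 9 cases among 1,000 patients (about 0.9% prevalence).
labels = [1] * 9 + [0] * 991
idx = random_undersample(labels, ratio=1.0, seed=42)
balanced = [labels[i] for i in idx]
print(len(balanced), sum(balanced))  # 18 9
```

Undersampling of this kind would normally be applied to the training split only, so that evaluation still reflects the true event rate.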

Results: A total of 20,591 patients were diagnosed with MI, compared with 2.25 million who were not. A deep neural network with random undersampling provided superior classification compared with the other methods. However, its benefit over a logistic regression model using only 'known' risk factors was only moderate, with the deep neural network showing an F1 score of 0.092 and an AUC of 0.835. Calibration for all models was poor despite adequate discrimination, due to overfitting from the low frequency of the event of interest.
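To make the two reported metrics concrete, here is a small sketch of how an F1 score and an AUC can be computed from scratch. The confusion-matrix counts and scores below are invented for illustration and are not the study's data; they are chosen only to show how a rare outcome can yield a low F1 score even when recall and ranking quality are reasonable. The function names are assumptions of this sketch.

```python
def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def auc_score(scores, labels):
    """AUC as the probability that a randomly chosen case receives a higher
    score than a randomly chosen control (ties count as one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Illustrative counts (not from the study): 80 of 100 cases are detected,
# but 1,500 controls are also flagged, so precision is about 5%.
print(round(f1_score(tp=80, fp=1500, fn=20), 3))  # 0.095

# Illustrative scores: cases tend to rank above controls.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
labels = [1, 0, 1, 0, 0, 0]
print(auc_score(scores, labels))  # 0.875
```

Because AUC depends only on the ranking of cases relative to controls, it is insensitive to prevalence, whereas F1 falls quickly once false positives outnumber true positives.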

Conclusions: Our study suggests that deep neural networks (DNNs) may not offer substantial benefit when trained on harmonized data, compared with traditional methods using established risk factors for MI.

Keywords: Electronic health records; Machine learning; Myocardial infarction.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
a. Precision-recall curve for optimal model. b. ROC curve for optimal model. Panel legend: a. Confusion matrix for the optimally performing DNN. b. ROC curve for the DNN model.
Fig. 2
Calibration curve for optimal model. a. Calibration plot for the DNN, showing a wide discrepancy between the observed distribution of outcomes and the predicted probabilities from the model. b. Distribution of predicted probabilities for cases vs. controls, showing good discrimination.
Fig. 3
Calibration curve for comparison model. a. Calibration plot for the logistic model using only known risk factors, showing a similar discrepancy between the observed distribution of outcomes and the predicted probabilities. b. Distribution of predicted probabilities for cases vs. controls, again with good discrimination.
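The calibration plots described in Figs. 2 and 3 compare predicted probabilities with observed outcome rates. A minimal sketch of how such a comparison can be tabulated is given below, assuming equal-width probability bins; the function name `calibration_bins`, the bin count, and the toy data are assumptions of this sketch and do not come from the study.

```python
def calibration_bins(probs, labels, n_bins=5):
    """Group predictions into equal-width probability bins and return
    (mean predicted probability, observed event rate, count) for each
    non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        k = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[k].append((p, y))
    rows = []
    for members in bins:
        if members:
            mean_p = sum(p for p, _ in members) / len(members)
            observed = sum(y for _, y in members) / len(members)
            rows.append((mean_p, observed, len(members)))
    return rows


# Toy data (not from the study): the only case sits in the high-score group,
# so ranking is good, yet the predicted risk there far exceeds the observed
# event rate.
probs = [0.1, 0.1, 0.1, 0.1, 0.9, 0.9, 0.9, 0.9]
labels = [0, 0, 0, 0, 1, 0, 0, 0]
for mean_p, observed, count in calibration_bins(probs, labels):
    print(f"predicted={mean_p:.2f} observed={observed:.2f} n={count}")
```

A well-calibrated model would show predicted and observed values that roughly agree in every bin; a gap like the one in the high-score bin here is the pattern the captions describe as a discrepancy despite good discrimination.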
