J Am Med Inform Assoc. 2021 Jul 14;28(7):1507-1517.

doi: 10.1093/jamia/ocab036.

Electronic phenotyping of health outcomes of interest using a linked claims-electronic health record database: Findings from a machine learning pilot project

Affiliations

¹ Government Health and Human Services, IBM Watson Health, Bethesda, Maryland, USA.
² Food and Drug Administration, Silver Spring, Maryland, USA.
³ Harvard Medical School and Harvard Pilgrim Health Care Institute, Harvard Medical School, Boston, Massachusetts, USA.
⁴ Office of Biostatistics, Center for Drug Evaluation and Research, Food and Drug Administration, Silver Spring, Maryland, USA.
⁵ Division of Epidemiology II, Office of Pharmacovigilance and Epidemiology, Office of Surveillance and Epidemiology, Center for Drug Evaluation and Research, Food and Drug Administration, Silver Spring, Maryland, USA.
⁶ Center for Drug Evaluation and Research, Food and Drug Administration, Silver Spring, Maryland, USA.
⁷ Department of Population Medicine, Harvard Medical School and Harvard Pilgrim Health Care Institute, Boston, Massachusetts, USA.
⁸ Office of Surveillance and Epidemiology, Center for Drug Evaluation and Research, U.S. Food and Drug Administration, Silver Spring, Maryland, USA.

PMID: 33712852
PMCID: PMC8279790
DOI: 10.1093/jamia/ocab036

Teresa B Gibson et al. J Am Med Inform Assoc. 2021.

Abstract

Objective: Claims-based algorithms are used in the Food and Drug Administration Sentinel Active Risk Identification and Analysis System to identify occurrences of health outcomes of interest (HOIs) for medical product safety assessment. This project aimed to apply machine learning classification techniques to demonstrate the feasibility of developing a claims-based algorithm to predict an HOI in structured electronic health record (EHR) data.

Materials and methods: We used the 2015-2019 IBM MarketScan Explorys Claims-EMR Data Set, linking administrative claims and EHR data at the patient level. We focused on a single HOI, rhabdomyolysis, defined by EHR laboratory test results. Using claims-based predictors, we applied machine learning techniques to predict the HOI: logistic regression, LASSO (least absolute shrinkage and selection operator), random forests, support vector machines, artificial neural nets, and an ensemble method (Super Learner).
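The abstract names the Super Learner ensemble but does not give its configuration. As a hedged illustration of the idea only, the sketch below implements just the meta-learning step: given out-of-fold predicted probabilities from two base learners, it picks the convex weight that minimizes squared error (Brier score). The function names, the two-learner restriction, the grid search, and the toy data are assumptions for illustration; full Super Learner implementations combine many learners and typically fit the weights with a constrained optimizer such as non-negative least squares.

```python
from typing import List, Tuple


def brier(y: List[int], p: List[float]) -> float:
    """Mean squared error between 0/1 labels and predicted probabilities."""
    return sum((yi - pi) ** 2 for yi, pi in zip(y, p)) / len(y)


def convex_weight(
    y: List[int], p_a: List[float], p_b: List[float], steps: int = 100
) -> Tuple[float, float]:
    """Hypothetical helper: grid-search w in [0, 1] so that
    w * p_a + (1 - w) * p_b has the lowest Brier score.

    p_a and p_b are assumed to be out-of-fold (cross-validated)
    predictions, as the Super Learner requires.
    """
    best_w, best_loss = 0.0, float("inf")
    for k in range(steps + 1):
        w = k / steps
        blended = [w * a + (1 - w) * b for a, b in zip(p_a, p_b)]
        loss = brier(y, blended)
        if loss < best_loss:
            best_w, best_loss = w, loss
    return best_w, best_loss


# Toy out-of-fold predictions (invented, not study data).
y = [1, 0, 1, 0]
p_lasso = [0.9, 0.1, 0.8, 0.2]   # informative learner
p_forest = [0.5, 0.5, 0.5, 0.5]  # uninformative learner
w, loss = convex_weight(y, p_lasso, p_forest)
print(w, round(loss, 4))  # 1.0 0.025
```

Because the weight is chosen on out-of-fold predictions, the ensemble can lean on whichever learner generalizes best rather than whichever fits the training data most closely.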

Results: The study cohort included 32 956 patients and 39 499 encounters. Model performance (positive predictive value [PPV], sensitivity, specificity, area under the receiver-operating characteristic curve) varied considerably across techniques. The area under the receiver-operating characteristic curve exceeded 0.80 in most model variations.
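The Results report PPV, sensitivity, specificity, and AUC. A minimal pure-Python sketch of how these metrics are commonly computed from 0/1 outcome labels and model scores follows. The function names, the `score >= threshold` rule, and the toy data are assumptions; the pairwise AUC loop is quadratic, so a rank-based implementation would be preferable for a cohort of tens of thousands of encounters.

```python
from typing import Dict, List


def confusion_metrics(y: List[int], scores: List[float], threshold: float) -> Dict[str, float]:
    """PPV, sensitivity and specificity when score >= threshold is called positive."""
    tp = fp = tn = fn = 0
    for yi, si in zip(y, scores):
        pred = si >= threshold
        if pred and yi == 1:
            tp += 1
        elif pred and yi == 0:
            fp += 1
        elif not pred and yi == 0:
            tn += 1
        else:
            fn += 1
    nan = float("nan")
    return {
        "ppv": tp / (tp + fp) if tp + fp else nan,
        "sensitivity": tp / (tp + fn) if tp + fn else nan,
        "specificity": tn / (tn + fp) if tn + fp else nan,
    }


def auc(y: List[int], scores: List[float]) -> float:
    """ROC AUC: probability a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for yi, s in zip(y, scores) if yi == 1]
    neg = [s for yi, s in zip(y, scores) if yi == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy labels and scores (invented, not study data).
y = [1, 1, 0, 0, 0]
scores = [0.9, 0.4, 0.6, 0.2, 0.1]
m = confusion_metrics(y, scores, threshold=0.5)
print(m, auc(y, scores))  # PPV 0.5, sensitivity 0.5, specificity 2/3, AUC 5/6
```

PPV, sensitivity, and specificity depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is why a model can have a high AUC and still a modest PPV at a given cutoff.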

Discussion: For the main Food and Drug Administration use case of assessing risk of rhabdomyolysis after drug use, a model with a high PPV is typically preferred. The Super Learner ensemble model without adjustment for class imbalance achieved a PPV of 75.6%, substantially better than a previously used human expert-developed model (PPV = 44.0%).
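One way to see why PPV responds strongly to modeling choices such as class-imbalance adjustment is Bayes' rule: for a given sensitivity (Se) and specificity (Sp), PPV also depends on the proportion π of encounters that truly have the HOI. The worked numbers below use illustrative values, not results from this study; they show that when an outcome is uncommon, a small gain in specificity can raise PPV substantially.

```latex
\begin{aligned}
\mathrm{PPV} &= \frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FP}}
  = \frac{\mathrm{Se}\,\pi}{\mathrm{Se}\,\pi + (1-\mathrm{Sp})(1-\pi)} \\
\mathrm{Se}=0.80,\ \mathrm{Sp}=0.95,\ \pi=0.05 &\Rightarrow
  \mathrm{PPV} = \frac{0.040}{0.040 + 0.0475} \approx 0.46 \\
\mathrm{Se}=0.80,\ \mathrm{Sp}=0.99,\ \pi=0.05 &\Rightarrow
  \mathrm{PPV} = \frac{0.040}{0.040 + 0.0095} \approx 0.81
\end{aligned}
```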

Conclusions: It is feasible to use machine learning methods to predict an EHR-derived HOI with claims-based predictors. Modeling strategies can be adapted for intended uses, including surveillance, identification of cases for chart review, and outcomes research.
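The Conclusions note that modeling strategies can be adapted to the intended use. One common lever, sketched here under stated assumptions, is the classification threshold: a use case that favors PPV, like the FDA use case described in the Discussion, might choose the lowest cutoff that still meets a PPV target, because lowering the cutoff can only keep or raise sensitivity. The function name, the target values, and the toy data are hypothetical and do not describe the study's procedure.

```python
from typing import List, Optional


def lowest_threshold_for_ppv(
    y: List[int], scores: List[float], target_ppv: float
) -> Optional[float]:
    """Hypothetical helper: lowest score cutoff whose PPV meets target_ppv, or None.

    Sensitivity never increases as the cutoff rises, so the lowest cutoff that
    meets the PPV target keeps the most true cases among qualifying cutoffs.
    """
    for t in sorted(set(scores)):
        flagged = [yi for yi, si in zip(y, scores) if si >= t]
        if flagged and sum(flagged) / len(flagged) >= target_ppv:
            return t
    return None


# Toy labels and scores (invented, not study data).
y = [1, 1, 0, 0, 0]
scores = [0.9, 0.4, 0.6, 0.2, 0.1]
print(lowest_threshold_for_ppv(y, scores, 0.6))   # 0.4
print(lowest_threshold_for_ppv(y, scores, 0.75))  # 0.9
```

PPV is not monotone in the cutoff (here it is about 0.67 at 0.4 but 0.5 at 0.6), so the function checks every candidate cutoff instead of stopping at the first drop. A surveillance use that prioritizes sensitivity would instead fix a sensitivity floor and accept a lower PPV.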

Keywords: administrative claims; electronic health records; electronic phenotyping; healthcare; rhabdomyolysis; supervised machine learning.

Figures

**Figure 1.**
Episode construction. CK: creatine kinase; EHR: electronic health record; ULN: upper limit of normal.

**Figure 2.**
Health outcome of interest scenario 1: summary results across model specifications. LASSO: least absolute shrinkage and selection operator; PPV: positive predictive value.

**Figure 3.**
Health outcome of interest scenario 2: summary results across model specifications. LASSO: least absolute shrinkage and selection operator; PPV: positive predictive value.

Grants and funding

HHSF223201400030I/FD/FDA HHS/United States
