Electronic phenotyping of health outcomes of interest using a linked claims-electronic health record database: Findings from a machine learning pilot project
- PMID: 33712852
- PMCID: PMC8279790
- DOI: 10.1093/jamia/ocab036
Abstract
Objective: Claims-based algorithms are used in the Food and Drug Administration Sentinel Active Risk Identification and Analysis System to identify occurrences of health outcomes of interest (HOIs) for medical product safety assessment. This project aimed to apply machine learning classification techniques to demonstrate the feasibility of developing a claims-based algorithm to predict an HOI in structured electronic health record (EHR) data.
Materials and methods: We used the 2015-2019 IBM MarketScan Explorys Claims-EMR Data Set, linking administrative claims and EHR data at the patient level. We focused on a single HOI, rhabdomyolysis, defined by EHR laboratory test results. Using claims-based predictors, we applied machine learning techniques to predict the HOI: logistic regression, LASSO (least absolute shrinkage and selection operator), random forests, support vector machines, artificial neural nets, and an ensemble method (Super Learner).
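The modeling setup above can be sketched as follows. This is only an illustration, not the authors' pipeline: the abstract does not state the software, tuning, or predictor set. The sketch assumes scikit-learn, uses synthetic data in place of the claims-based predictors, and uses `StackingClassifier` as a rough analogue of Super Learner (both combine cross-validated predictions from base learners through a meta-learner). All variable names and hyperparameters are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for claims-based predictors and an EHR-derived label.
# This is NOT the study data; roughly 10% of rows are positive.
X, y = make_classification(
    n_samples=600, n_features=20, n_informative=6,
    weights=[0.9, 0.1], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# One base learner per technique named in the abstract.
# Hyperparameters are arbitrary illustrative choices.
base_learners = [
    # scikit-learn's default logistic regression applies mild L2 regularization.
    ("logit", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
    # An L1 penalty gives a LASSO-style logistic model.
    ("lasso", make_pipeline(
        StandardScaler(),
        LogisticRegression(penalty="l1", solver="liblinear", C=0.5),
    )),
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ("svm", make_pipeline(StandardScaler(), SVC())),
    ("ann", make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(16,), max_iter=500, random_state=0),
    )),
]

# The meta-learner is fit on 5-fold cross-validated base-learner outputs.
ensemble = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(),
    cv=5,
)
ensemble.fit(X_train, y_train)

# Predicted probability of the outcome for each held-out encounter.
risk = ensemble.predict_proba(X_test)[:, 1]
```

A faithful Super Learner would typically constrain the meta-learner to a convex combination of the base learners; the unconstrained logistic meta-learner here is a simplification.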
Results: The study cohort included 32 956 patients and 39 499 encounters. Model performance (positive predictive value [PPV], sensitivity, specificity, area under the receiver-operating characteristic curve) varied considerably across techniques. The area under the receiver-operating characteristic curve exceeded 0.80 in most model variations.
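For readers less familiar with the four performance measures, the following minimal sketch computes them from labels and predicted scores. The data are toy values, not study results, and the 0.5 threshold and the function names are assumptions made for illustration.

```python
def classification_metrics(y_true, y_pred):
    """Return PPV, sensitivity, and specificity from binary labels and predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return {
        "ppv": tp / (tp + fp),          # of predicted cases, fraction that are true cases
        "sensitivity": tp / (tp + fn),  # of true cases, fraction that are detected
        "specificity": tn / (tn + fp),  # of non-cases, fraction correctly ruled out
    }


def auc(y_true, scores):
    """Area under the ROC curve via the rank (Mann-Whitney) formulation.

    Equals the probability that a randomly chosen case receives a higher
    score than a randomly chosen non-case, counting ties as one half.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy example: 3 cases and 7 non-cases with hypothetical model scores.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.3, 0.8, 0.4, 0.2, 0.2, 0.1, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 decision threshold

metrics = classification_metrics(y_true, y_pred)
metrics["auc"] = auc(y_true, scores)
```

PPV, sensitivity, and specificity depend on the chosen threshold, whereas the AUC summarizes ranking quality across all thresholds.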
Discussion: For the main Food and Drug Administration use case of assessing risk of rhabdomyolysis after drug use, a model with a high PPV is typically preferred. The Super Learner ensemble model without adjustment for class imbalance achieved a PPV of 75.6%, substantially better than a previously used human expert-developed model (PPV = 44.0%).
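One reason PPV deserves separate attention for a rare outcome is that, at fixed sensitivity and specificity, it falls as the outcome becomes less prevalent. The sketch below applies Bayes' rule to show this; the input values are illustrative assumptions, not figures from the study, and `ppv_from_rates` is a hypothetical helper name.

```python
def ppv_from_rates(sensitivity, specificity, prevalence):
    """Positive predictive value implied by Bayes' rule."""
    true_pos = sensitivity * prevalence                # P(test+, case)
    false_pos = (1 - specificity) * (1 - prevalence)   # P(test+, non-case)
    return true_pos / (true_pos + false_pos)


# Same classifier operating characteristics, two assumed prevalences.
ppv_rare = ppv_from_rates(sensitivity=0.80, specificity=0.95, prevalence=0.05)
ppv_balanced = ppv_from_rates(sensitivity=0.80, specificity=0.95, prevalence=0.50)
```

Under these assumed inputs, the PPV in the rare-outcome setting is well below the PPV in the balanced setting, even though sensitivity and specificity are unchanged.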
Conclusions: It is feasible to use machine learning methods to predict an EHR-derived HOI with claims-based predictors. Modeling strategies can be adapted for intended uses, including surveillance, identification of cases for chart review, and outcomes research.
Keywords: administrative claims; electronic health records; electronic phenotyping; healthcare; rhabdomyolysis; supervised machine learning.
© The Author(s) 2021. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For permissions, please email: journals.permissions@oup.com.