Improving Methods of Identifying Anaphylaxis for Medical Product Safety Surveillance Using Natural Language Processing and Machine Learning

David S Carrell, Susan Gruber, James S Floyd, Maralyssa A Bann, Kara L Cushing-Haugen, Ron L Johnson, Vina Graham, David J Cronkite, Brian L Hazlehurst, Andrew H Felcher, Cosmin A Bejan, Adee Kennedy, Mayura U Shinde, Sara Karami, Yong Ma, Danijela Stojanovic, Yueqin Zhao, Robert Ball, Jennifer C Nelson

PMID: 36331289
PMCID: PMC9896464
DOI: 10.1093/aje/kwac182

David S Carrell et al. Am J Epidemiol. 2023 Feb 1;192(2):283-295. doi: 10.1093/aje/kwac182.

Abstract

We sought to determine whether machine learning and natural language processing (NLP) applied to electronic medical records could improve the performance of automated health-care claims-based algorithms for identifying anaphylaxis events. We used data on 516 patients with outpatient, emergency department, or inpatient anaphylaxis diagnosis codes during 2015-2019 in 2 integrated health-care institutions in the Northwest United States. One site's manually reviewed gold-standard outcomes data were used for model development and the other's for external validation, with performance assessed by cross-validated area under the receiver operating characteristic curve (AUC), positive predictive value (PPV), and sensitivity. In the development site, 154 (64%) of 239 potential events met adjudication criteria for anaphylaxis, compared with 180 (65%) of 277 in the validation site. Logistic regression models using only structured claims data achieved a cross-validated AUC of 0.58 (95% CI: 0.54, 0.63). Machine learning improved the cross-validated AUC to 0.62 (0.58, 0.66); incorporating NLP-derived covariates further increased cross-validated AUCs to 0.70 (0.66, 0.75) in development data and 0.67 (0.63, 0.71) in external validation data. A classification threshold with a cross-validated PPV of 79% and cross-validated sensitivity of 66% in development data had a cross-validated PPV of 78% and cross-validated sensitivity of 56% in external data. Machine learning and NLP-derived data improved identification of validated anaphylaxis events.
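The three metrics reported in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names, the toy scores, and the 0.5 threshold are assumptions made for illustration, and the sketch computes plain (not cross-validated) estimates on made-up data.

```python
from typing import Sequence, Tuple


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Rank-based AUC: the probability that a randomly chosen confirmed
    event scores higher than a randomly chosen non-event (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def ppv_and_sensitivity(
    scores: Sequence[float], labels: Sequence[int], threshold: float
) -> Tuple[float, float]:
    """Classify score >= threshold as an event, then return (PPV, sensitivity)."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    return ppv, sensitivity


# Hypothetical predicted risks and adjudicated outcomes (1 = confirmed event).
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
toy_labels = [1, 1, 0, 1, 1, 0, 0, 0]

print(auc(toy_scores, toy_labels))
print(ppv_and_sensitivity(toy_scores, toy_labels, threshold=0.5))
```

Raising the threshold flags fewer potential events, which generally trades sensitivity for PPV; the abstract reports one such operating point for each site.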

Keywords: anaphylaxis; electronic health records; health outcome identification; machine learning, supervised; postmarketing product surveillance; predictive modeling.

Figures

**Figure 1**
Weighted cross-validated area under the receiver operating characteristic curve for Kaiser Permanente Washington algorithms identifying actual anaphylaxis events in Kaiser Permanente Washington data (2015–2019) using the best machine-learning approach applied to structured and all natural language processing (NLP) data, traditional logistic regression approach applied to structured and all NLP data, machine-learning approach applied to structured data only, and traditional logistic regression approach applied to structured data only.

**Figure 2**
Cross-validated area under the receiver operating characteristic curve for the most generalizable high-performing machine-learning model developed at Kaiser Permanente Washington (KPWA, 2015–2019) using structured and natural language processing–derived data (Bayesian additive regression trees (BART) model 2 with Retain-All variable selection) evaluated in KPWA data and, separately, in Kaiser Permanente Northwest (KPNW) data (2015–2019).

**Figure 3**
Model performance for classifying actual anaphylaxis events in Kaiser Permanente Washington (KPWA) data and Kaiser Permanente Northwest (KPNW) data, 2015–2019. The model, developed in KPWA data, was a Bayesian additive regression trees (BART) model 2 retaining all covariates. Plotted values are cross-validated (CV) positive predictive value and sensitivity for increments of the predicted risk between the 10th and 95th percentiles.
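The kind of threshold sweep described in the Figure 3 caption can be sketched as follows. This is an illustrative assumption, not the authors' procedure: the nearest-rank percentile rule, the helper names, and the toy data are hypothetical, and no cross-validation is performed.

```python
from typing import List, Sequence, Tuple


def nearest_rank_percentile(values: Sequence[float], q: int) -> float:
    """Nearest-rank percentile for an integer q in 1..100."""
    ordered = sorted(values)
    # Integer ceiling of q * n / 100 avoids floating-point rounding surprises.
    rank = max(1, (q * len(ordered) + 99) // 100)
    return ordered[rank - 1]


def sweep_thresholds(
    scores: Sequence[float], labels: Sequence[int], percentiles: Sequence[int]
) -> List[Tuple[int, float, float, float]]:
    """For each percentile of predicted risk, use that score as the cutoff
    and return (percentile, threshold, PPV, sensitivity)."""
    total_events = sum(labels)
    rows = []
    for q in percentiles:
        threshold = nearest_rank_percentile(scores, q)
        flagged = [y for s, y in zip(scores, labels) if s >= threshold]
        true_pos = sum(flagged)
        ppv = true_pos / len(flagged) if flagged else float("nan")
        sens = true_pos / total_events if total_events else float("nan")
        rows.append((q, threshold, ppv, sens))
    return rows


# Hypothetical predicted risks and adjudicated outcomes (1 = confirmed event).
toy_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
toy_labels = [0, 0, 0, 1, 0, 1, 1, 0, 1, 1]

for row in sweep_thresholds(toy_scores, toy_labels, [10, 50, 90]):
    print(row)
```

Plotting PPV and sensitivity against the percentile gives the kind of trade-off curve the caption describes.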
