Am J Epidemiol. 2023 Feb 1;192(2):283-295.
doi: 10.1093/aje/kwac182.

Improving Methods of Identifying Anaphylaxis for Medical Product Safety Surveillance Using Natural Language Processing and Machine Learning

David S Carrell et al. Am J Epidemiol. 2023.

Abstract

We sought to determine whether machine learning and natural language processing (NLP) applied to electronic medical records could improve the performance of automated health-care claims-based algorithms to identify anaphylaxis events. We used data on 516 patients with outpatient, emergency department, or inpatient anaphylaxis diagnosis codes during 2015–2019 in 2 integrated health-care institutions in the Northwest United States. We used one site's manually reviewed gold-standard outcomes data for model development and the other's for external validation based on cross-validated area under the receiver operating characteristic curve (AUC), positive predictive value (PPV), and sensitivity. In the development site, 154 (64%) of 239 potential events met adjudication criteria for anaphylaxis, compared with 180 (65%) of 277 in the validation site. Logistic regression models using only structured claims data achieved a cross-validated AUC of 0.58 (95% CI: 0.54, 0.63). Machine learning improved cross-validated AUC to 0.62 (0.58, 0.66); incorporating NLP-derived covariates further increased cross-validated AUCs to 0.70 (0.66, 0.75) in development and 0.67 (0.63, 0.71) in external validation data. A classification threshold with cross-validated PPV of 79% and cross-validated sensitivity of 66% in development data had cross-validated PPV of 78% and cross-validated sensitivity of 56% in external data. Machine learning and NLP-derived data improved identification of validated anaphylaxis events.

Keywords: anaphylaxis; electronic health records; health outcome identification; machine learning, supervised; postmarketing product surveillance; predictive modeling.
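
The three performance measures named in the abstract (AUC, PPV, and sensitivity) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the toy labels and scores, and the 0.5 threshold are all hypothetical, and the sketch computes plain (not cross-validated) estimates, whereas the abstract reports cross-validated ones.

```python
def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive case has the higher score; tied pairs count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def ppv_and_sensitivity(labels, scores, threshold):
    """Classify score >= threshold as a predicted event, then return
    (PPV, sensitivity) against the gold-standard labels."""
    pairs = list(zip(labels, scores))
    tp = sum(1 for y, s in pairs if s >= threshold and y == 1)
    fp = sum(1 for y, s in pairs if s >= threshold and y == 0)
    fn = sum(1 for y, s in pairs if s < threshold and y == 1)
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    return ppv, sensitivity


# Hypothetical toy data: 1 = adjudicated anaphylaxis, 0 = not confirmed.
toy_labels = [1, 1, 1, 0, 1, 0, 0, 1]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]

toy_auc = auc(toy_labels, toy_scores)
toy_ppv, toy_sens = ppv_and_sensitivity(toy_labels, toy_scores, threshold=0.5)
print(f"AUC={toy_auc:.3f} PPV={toy_ppv:.2f} sensitivity={toy_sens:.2f}")
```

Raising the threshold generally trades sensitivity for PPV, which is the trade-off behind the single operating point quoted in the abstract.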

Figures

Figure 1
Weighted cross-validated area under the receiver operating characteristic curve for Kaiser Permanente Washington algorithms identifying actual anaphylaxis events in Kaiser Permanente Washington data (2015–2019) using the best machine-learning approach applied to structured and all natural language processing (NLP) data, traditional logistic regression approach applied to structured and all NLP data, machine-learning approach applied to structured data only, and traditional logistic regression approach applied to structured data only.
Figure 2
Cross-validated area under the receiver operating characteristic curve for the most generalizable high-performing machine-learning model developed at Kaiser Permanente Washington (KPWA, 2015–2019) using structured and natural language processing–derived data (Bayesian additive regression trees (BART) model 2 with Retain-All variable selection) evaluated in KPWA data and, separately, in Kaiser Permanente Northwest (KPNW) data (2015–2019).
Figure 3
Model performance for classifying actual anaphylaxis events in Kaiser Permanente Washington (KPWA) data and Kaiser Permanente Northwest (KPNW) data, 2015–2019. The model, developed in KPWA data, was a Bayesian additive regression trees (BART) model 2 retaining all covariates. Plotted values are cross-validated (CV) positive predictive value and sensitivity for increments of the predicted risk between the 10th and 95th percentiles.
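
The kind of curve described in the Figure 3 caption can be sketched by sweeping a classification threshold across percentiles of the predicted risk and recording PPV and sensitivity at each step. The sketch below is illustrative only and is not the authors' procedure: the function names, the 5-point percentile step, the nearest-rank percentile rule, and the toy data are assumptions, and no cross-validation is performed.

```python
def nearest_rank_percentile(values, pct):
    """Nearest-rank percentile: the smallest observed value with at least
    pct percent of observations at or below it (integer arithmetic)."""
    ordered = sorted(values)
    rank = max(1, (pct * len(ordered) + 99) // 100)  # ceiling of pct*n/100
    return ordered[rank - 1]


def sweep_thresholds(labels, scores, percentiles=range(10, 100, 5)):
    """Return (percentile, threshold, PPV, sensitivity) for each threshold.
    Labels must be 0/1; a score >= threshold is a predicted event."""
    total_pos = sum(labels)
    rows = []
    for pct in percentiles:
        threshold = nearest_rank_percentile(scores, pct)
        flagged = [y for y, s in zip(labels, scores) if s >= threshold]
        tp = sum(flagged)
        # flagged is never empty: the threshold is itself an observed score.
        ppv = tp / len(flagged)
        sensitivity = tp / total_pos if total_pos else float("nan")
        rows.append((pct, threshold, ppv, sensitivity))
    return rows


# Hypothetical toy data: 1 = adjudicated anaphylaxis, 0 = not confirmed.
toy_labels = [1, 1, 1, 0, 1, 0, 0, 1, 0, 1]
toy_scores = [0.95, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]

rows = sweep_thresholds(toy_labels, toy_scores)
for pct, threshold, ppv, sensitivity in rows:
    print(f"p{pct:02d} thr={threshold:.2f} PPV={ppv:.2f} sens={sensitivity:.2f}")
```

At the lowest percentile nearly every case is flagged, so sensitivity is close to 1 and PPV is close to the event prevalence; at the highest percentile only the top-scoring cases are flagged, so sensitivity falls.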
