2022 Aug 4;17(8):e0270595. doi: 10.1371/journal.pone.0270595. eCollection 2022.

Multi-label classification of symptom terms from free-text bilingual adverse drug reaction reports using natural language processing


Sitthichok Chaichulee et al. PLoS One. 2022.

Abstract

Allergic reactions to medication range from mild to severe or even life-threatening. Proper documentation of patient allergy information is critical for safe prescribing, avoiding drug interactions, and reducing healthcare costs. Allergy information is regularly obtained during the medical interview but is often poorly documented in electronic health records (EHRs). While many EHRs allow for structured adverse drug reaction (ADR) reporting, free-text entry is still common. The resulting information is neither interoperable nor easily reusable for other applications, such as clinical decision support systems and prescription alerts. Current approaches require pharmacists to review and code ADRs documented by healthcare professionals. Recently, the effectiveness of machine learning algorithms in natural language processing (NLP) has been widely demonstrated. Our study aims to develop and evaluate different NLP algorithms that can encode unstructured ADRs stored in EHRs into institutional symptom terms. Our dataset consists of 79,712 pharmacist-reviewed drug allergy records. We evaluated three NLP techniques: Naive Bayes-Support Vector Machine (NB-SVM), Universal Language Model Fine-tuning (ULMFiT), and Bidirectional Encoder Representations from Transformers (BERT). We tested different general-domain pre-trained BERT models, including mBERT, XLM-RoBERTa, and WanchanBERTa, as well as our domain-specific AllergyRoBERTa, which was pre-trained from scratch on our corpus. Overall, the BERT models had the highest performance. NB-SVM outperformed ULMFiT and BERT for several symptom terms that are not frequently coded. The ensemble model achieved an exact match ratio of 95.33%, an F1 score of 98.88%, and a mean average precision of 97.07% for the 36 most frequently coded symptom terms. The model was then further developed into a symptom term suggestion system and achieved a Krippendorff's alpha agreement coefficient of 0.7081 in prospective testing with pharmacists. Some degree of automation could both accelerate the availability of allergy information and reduce the effort required for human coding.
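The two headline metrics reported in the abstract can be illustrated with a small sketch. This is a minimal illustration on toy label-indicator matrices, not the authors' evaluation code; the function names and toy data below are assumptions for demonstration only.

```python
import numpy as np

def exact_match_ratio(y_true, y_pred):
    """Fraction of reports whose full set of symptom terms is predicted exactly."""
    return float(np.mean(np.all(y_true == y_pred, axis=1)))

def micro_f1(y_true, y_pred):
    """Micro-averaged F1: pool true/false positives and negatives across all labels."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return float(2 * precision * recall / (precision + recall))

# Toy data: rows are ADR reports, columns are symptom terms (1 = term coded).
y_true = np.array([[1, 0, 1], [0, 1, 0], [1, 1, 0]])
y_pred = np.array([[1, 0, 1], [0, 1, 1], [1, 1, 0]])
print(exact_match_ratio(y_true, y_pred))  # 2 of 3 reports matched exactly
print(micro_f1(y_true, y_pred))
```

Note that exact match is the strictest multi-label metric: one wrong term on a report counts the whole report as a miss, while micro-F1 still credits the correctly predicted terms.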


Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. Flowchart outlining the steps involved in our study.
Drug allergy records in the Songklanagarind Hospital’s EHR from October 2001 to July 2020 were extracted for developing algorithms. Only the records that had been reviewed by pharmacists were considered. We then trained NB-SVM, ULMFiT, and BERT-based models to map the unstructured allergy description to the institutional symptom terms. The ensemble model was then used for evaluation with pharmacists via a web application in a simulated EHR environment.
Fig 2. GUI for documenting a new drug allergy record.
GUI for documenting a drug allergy record in the Songklanagarind Hospital’s EHR. The user interface was originally in Thai and was translated into English for the reader’s convenience.
Fig 3. Diagrams outlining the steps involved in our methods.
(A) Data preparation included word segmentation and tokenization, with each algorithm using a different method. (B) NB-SVM involved training multiple pipelines for Naive Bayes feature extraction and SVM classification. (C) ULMFiT involved fine-tuning the pre-trained language model on our target-domain allergy corpus and then fine-tuning a classifier for our multi-label classification task. (D) BERT involved fine-tuning a classifier for our multi-label classification task. This study evaluated three pre-trained general-domain BERT models and one target-domain BERT model pre-trained on our allergy corpus.
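The NB-SVM component in panel (B) follows the general recipe of weighting term counts by naive-Bayes log-count ratios and then training a linear SVM on the scaled features (Wang and Manning, 2012). The sketch below is a minimal single-label illustration with a plain subgradient-descent SVM and toy count vectors; the paper's actual pipelines, features, and hyperparameters are not shown here, and a one-vs-rest wrapper would be needed for the multi-label setting.

```python
import numpy as np

def nb_log_count_ratio(X, y, alpha=1.0):
    """Naive-Bayes log-count ratio r for binary labels y in {0, 1}."""
    p = alpha + X[y == 1].sum(axis=0)  # smoothed positive-class term counts
    q = alpha + X[y == 0].sum(axis=0)  # smoothed negative-class term counts
    return np.log((p / p.sum()) / (q / q.sum()))

def train_linear_svm(X, y_pm, C=1.0, lr=0.1, epochs=200):
    """Plain subgradient descent on L2-regularised hinge loss; y_pm in {-1, +1}."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        viol = y_pm * (X @ w + b) < 1                      # margin violations
        w -= lr * (w - C * (y_pm[viol, None] * X[viol]).sum(axis=0))
        b += lr * C * y_pm[viol].sum()
    return w, b

# Toy term-count matrix: rows = allergy reports, columns = vocabulary terms.
X = np.array([[2, 0, 1], [3, 0, 0], [0, 2, 1], [0, 3, 0]], dtype=float)
y = np.array([1, 1, 0, 0])                                 # one symptom term (one-vs-rest)

r = nb_log_count_ratio(X, y)                               # feature scaling
w, b = train_linear_svm(X * r, np.where(y == 1, 1, -1))    # SVM on scaled features
pred = (X * r @ w + b > 0).astype(int)
print(pred)  # recovers [1, 1, 0, 0] on this separable toy set
```

The NB scaling boosts terms that discriminate between the positive and negative class before the SVM ever sees them, which is why NB-SVM can remain competitive on rarely coded symptom terms where deep models have little training signal.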

