JCO Clin Cancer Inform. 2021 May:5:541-549.
doi: 10.1200/CCI.20.00109.

Automated Identification of Patients With Immune-Related Adverse Events From Clinical Notes Using Word Embedding and Machine Learning

Samir Gupta et al. JCO Clin Cancer Inform. 2021 May.

Abstract

Purpose: Although immune checkpoint inhibitors (ICIs) have substantially improved survival in patients with advanced malignancies, they are associated with a unique spectrum of side effects termed immune-related adverse events (irAEs). To ensure treatment safety, research efforts are needed to comprehensively detect and understand irAEs. Retrospective analysis of data from electronic health records can provide knowledge to characterize these toxicities. However, such information is not captured in a structured format within the electronic health record and requires manual chart review.

Materials and methods: In this work, we propose a natural language processing pipeline that can automatically annotate clinical notes and determine whether there is evidence that a patient developed an irAE. Seven hundred eighty-one cases were manually reviewed by clinicians and annotated for irAEs at the patient level. A dictionary of irAE keywords was used to perform text reduction on the clinical notes belonging to each patient; only sentences with relevant expressions were kept. Word embeddings were then used to generate vector representations over the reduced text, which served as input for the machine learning classifiers. The output of the models was presence or absence of any irAEs. Additional models were built to classify skin-related toxicities, endocrine toxicities, and colitis.
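
The text-reduction and embedding steps described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the keyword set, the negation cues, the sentence splitter, and the choice to average token vectors are all hypothetical stand-ins, since the abstract does not specify them. The resulting vector would then be passed to a machine learning classifier, which is omitted here.

```python
import re

# Hypothetical keyword dictionary; the paper's actual dictionary is not listed in the abstract.
IRAE_KEYWORDS = {"colitis", "rash", "hypothyroidism", "pneumonitis"}

# Hypothetical negation cues standing in for the paper's unspecified heuristics.
NEGATION_CUES = ("no ", "denies ", "without ", "negative for ")


def split_sentences(note):
    """Naive splitter: break after ., ! or ? when followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", note) if s.strip()]


def reduce_text(note):
    """Keep sentences that mention a keyword and contain no negation cue."""
    kept = []
    for sentence in split_sentences(note):
        lowered = sentence.lower()
        has_keyword = any(k in lowered for k in IRAE_KEYWORDS)
        negated = any(
            lowered.startswith(c) or f" {c}" in lowered for c in NEGATION_CUES
        )
        if has_keyword and not negated:
            kept.append(sentence)
    return kept


def mean_embedding(sentences, vectors, dim):
    """Average the vectors of all known tokens (an assumed pooling choice)."""
    total = [0.0] * dim
    count = 0
    for sentence in sentences:
        for token in re.findall(r"[a-z]+", sentence.lower()):
            if token in vectors:
                total = [t + v for t, v in zip(total, vectors[token])]
                count += 1
    return [t / count for t in total] if count else total


# Toy usage with a made-up note and made-up 2-dimensional vectors.
note = (
    "Patient seen for follow-up. Developed grade 2 colitis after cycle 3. "
    "No rash or pruritus. Denies diarrhea."
)
toy_vectors = {"colitis": [1.0, 0.0], "grade": [0.0, 1.0]}
reduced = reduce_text(note)
features = mean_embedding(reduced, toy_vectors, dim=2)
```

In this toy note only the colitis sentence survives the reduction: the rash sentence is dropped by the negation cue and the remaining sentences contain no keyword.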

Results: The model for any irAE achieved an average F1-score = 0.75 and area under the receiver operating characteristic curve = 0.85. This outperformed a basic keyword filtering approach. Although the classifier of any irAEs achieved good accuracy, individual irAE classification still has room for improvement.
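
For readers unfamiliar with the two reported metrics, the sketch below shows how an F1-score and an area under the receiver operating characteristic curve are computed in general. The counts, labels, and scores are invented for illustration and are not data from the study; the function names are likewise hypothetical.

```python
def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall, from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a pair.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Invented example values, not results from the paper.
example_f1 = f1_score(tp=8, fp=2, fn=4)
example_auc = roc_auc(labels=[1, 0, 1, 0], scores=[0.9, 0.8, 0.4, 0.3])
```

The pairwise definition of AUC used here is quadratic in the number of patients, which is fine for a cohort of a few hundred cases but would be replaced by a rank-based computation on larger data.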

Conclusion: We demonstrate that patient-level annotations combined with a machine learning approach using keyword filtering and word embeddings can achieve promising accuracy in classifying irAEs in clinical notes. This model may facilitate annotation and analysis of large irAE data sets.

Conflict of interest statement

Neil J. Shah
Consulting or Advisory Role: Merck Sharp & Dohme Corp

Michael B. Atkins
Stock and Other Ownership Interests: Werewolf Pharma, Pyxis
Consulting or Advisory Role: Genentech, Novartis, Bristol Myers Squibb, Merck, Exelixis, Eisai, Agenus, Arrowhead Pharmaceuticals, Werewolf Pharma, Surface Oncology, Iovance Biotherapeutics, Immunocore, Pyxis, Pneuma Respiratory, Leads Biolabs, Fathom Biotechnology, Aveo, Cota Healthcare, Neoleukin Therapeutics, Adagene, Idera, Ellipses Pharma, AstraZeneca, PACT Pharma, Third Rock Ventures, Seattle Genetics, Pfizer, ScholarRock
Research Funding: Bristol Myers Squibb

Subha Madhavan
Leadership: Perthera
Stock and Other Ownership Interests: Perthera
Consulting or Advisory Role: Perthera
Research Funding: Teewinot Life Sciences, Leidos Biosciences

No other potential conflicts of interest were reported.

Figures

FIG 1. System workflow. EHR, electronic health record; irAE, immune-related adverse event; ML, machine learning; tf-idf, term frequency-inverse document frequency.

FIG 2. Note filtering example. We selected all clinical notes between the date of the first dose of the ICI and the last date of follow-up or the start date of another ICI. Sentences highlighted in red and blue contain irAE keywords. Sentences highlighted in blue, which indicate a negative finding, were removed using heuristics. ICI, immune checkpoint inhibitor; irAE, immune-related adverse event; IV, intravenous.

FIG 3. Scatter text analysis. copd, chronic obstructive pulmonary disease; irAE, immune-related adverse event; tfts, thyroid function tests.

