Artificial Intelligence for Unstructured Healthcare Data: Application to Coding of Patient Reporting of Adverse Drug Reactions

Louis Létinier et al. Clin Pharmacol Ther. 2021 Aug;110(2):392-400.
doi: 10.1002/cpt.2266. Epub 2021 May 8.

Abstract

Adverse drug reaction (ADR) reporting is a major component of drug safety monitoring; its input will only be optimized, however, if systems can cope with the tremendous flow of information it generates, which is based primarily on unstructured text fields. The aim of this study was to develop an automated system for coding ADRs from patient reports. Our system was based on a knowledge base about drugs, enriched by supervised machine learning (ML) models trained on patient reporting data. To train our models, we selected all cases of ADRs reported by patients to a French Pharmacovigilance Centre through a national web portal between March 2017 and March 2019 (n = 2,058 reports). We tested both conventional ML models and deep-learning models. We performed an external validation using a dataset consisting of a random sample of ADRs reported to the Marseille Pharmacovigilance Centre over the same period (n = 187). Here, we show that, in terms of area under the curve (AUC) and F-measure, the best model for identifying ADRs was gradient boosting trees (LGBM), with an AUC of 0.93 (0.92-0.94) and an F-measure of 0.72 (0.68-0.75). On external validation, this model achieved an AUC of 0.91 and an F-measure of 0.58. We evaluated an artificial intelligence pipeline that proved able to learn to correctly identify ADRs from unstructured data. This result has allowed us to start a new study using more data to further improve performance and to offer a tool that is useful in practice for efficiently managing drug safety information.
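For illustration, the sketch below trains a LightGBM classifier on TF-IDF features of short free-text snippets and reports AUC and F-measure on a 10% held-out split, mirroring the 90/10 evaluation described above. The miniature corpus, the TF-IDF featurisation, and the hyperparameters are assumptions made for the sake of a runnable example; they are not the authors' actual pipeline or data.

    import numpy as np
    from lightgbm import LGBMClassifier
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics import f1_score, roc_auc_score
    from sklearn.model_selection import train_test_split

    # Hypothetical miniature corpus: free-text snippets with a binary label
    # saying whether the snippet describes an ADR (1) or not (0).
    texts = [
        "severe headache after starting the drug",
        "no adverse effect reported by the patient",
        "rash and itching two days after the first dose",
        "patient asks about the dosing schedule",
        "nausea and vomiting following treatment",
        "request for a prescription renewal",
    ] * 20  # repeat to get enough rows for a train/test split
    labels = [1, 0, 1, 0, 1, 0] * 20

    # TF-IDF features (an assumption; the study formats its data via the
    # knowledge-base matching step before feeding the ML models).
    X = TfidfVectorizer().fit_transform(texts)
    y = np.array(labels)

    # 90% train / 10% test, mirroring the split described in the abstract.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.1, random_state=0, stratify=y
    )

    model = LGBMClassifier(n_estimators=200, learning_rate=0.05)
    model.fit(X_train, y_train)

    proba = model.predict_proba(X_test)[:, 1]
    print("AUC:", roc_auc_score(y_test, proba))
    print("F1 :", f1_score(y_test, (proba >= 0.5).astype(int)))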


Conflict of interest statement

L.L., J.J., A.B., and C.G. were employed by Synapse Medicine at the time this research was conducted or hold stock/stock options therein. All other authors declared no competing interests.

Figures

Figure 1
Artificial intelligence pipeline to identify and code adverse drug reactions (ADRs) from free text using MedDRA terminology. (a) Patient sends an ADR report form with clinical information. (b) Text cleaning and extraction of relevant information. (c) Matching MedDRA terms in the case reports using the knowledge graph. (d) Data formatting for our machine learning (ML) models. Conventional ML: logit, random forest (RF), support‐vector machine (SVM), and light gradient boosting machine (LightGBM or LGBM). Neural networks and deep learning models: FastText, long short‐term memory recurrent neural network (LSTM), and convolutional neural network (CNN). (e) Training ML models on 90% of our dataset (train set), then computing evaluation metrics on the remaining 10% (test set). (f) Selection of the best ML model with regard to area under the curve (AUC) and F‐measure.
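As a simplified illustration of steps (b) and (c), the sketch below normalises a free-text report and matches it against a small dictionary of MedDRA preferred terms. The dictionary and its codes are a toy stand-in for the drug knowledge base used in the study, not the actual resource.

    import re
    import unicodedata

    # Toy stand-in for the MedDRA vocabulary in the knowledge base: maps a
    # normalised surface form to an illustrative preferred-term code.
    MEDDRA_TERMS = {
        "headache": "10019211",
        "nausea": "10028813",
        "rash": "10037844",
    }

    def normalise(text: str) -> str:
        """Lowercase and strip accents: a crude version of the text-cleaning step (b)."""
        text = unicodedata.normalize("NFKD", text)
        text = "".join(ch for ch in text if not unicodedata.combining(ch))
        return text.lower()

    def match_meddra(report: str) -> list[tuple[str, str]]:
        """Return (term, code) pairs found in a free-text report, as in step (c)."""
        cleaned = normalise(report)
        hits = []
        for term, code in MEDDRA_TERMS.items():
            if re.search(rf"\b{re.escape(term)}\b", cleaned):
                hits.append((term, code))
        return hits

    print(match_meddra("Severe headache and nausea after the second dose"))
    # [('headache', '10019211'), ('nausea', '10028813')]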
Figure 2
Performance of different machine learning models for the identification of adverse drug reactions from patient reports. (a) Performance of machine learning (ML) models in terms of receiver operating characteristic (ROC) curve and area under the curve (AUC) on the internal validation set. (b) Performance of ML models in terms of F‐measure (F1) on the internal validation set. CNN, convolutional neural network; SVM, support‐vector machine.
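A minimal sketch of the kind of ROC comparison shown in panel (a), assuming a recent scikit-learn and using synthetic features in place of the vectorised patient reports; the two models shown are illustrative members of the conventional ML family evaluated in the study.

    import matplotlib.pyplot as plt
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import RocCurveDisplay
    from sklearn.model_selection import train_test_split

    # Synthetic data as a placeholder for the vectorised patient reports.
    X, y = make_classification(n_samples=2000, n_features=50, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.1, random_state=0)

    models = {
        "logit": LogisticRegression(max_iter=1000),
        "random forest": RandomForestClassifier(n_estimators=200, random_state=0),
    }

    ax = plt.gca()
    for name, model in models.items():
        model.fit(X_tr, y_tr)
        RocCurveDisplay.from_estimator(model, X_te, y_te, name=name, ax=ax)
    ax.plot([0, 1], [0, 1], linestyle="--", color="grey")  # chance line
    ax.set_title("ROC curves on a held-out validation set (illustrative)")
    plt.show()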
Figure 3
Performance of different machine learning models for the determination of adverse drug reaction seriousness from patient reports. (a) Performance of machine learning (ML) models in terms of receiver operating characteristic (ROC) curve and area under the curve (AUC) on the internal validation set. (b) Performance of ML models in terms of F‐measure (F1) on the internal validation set. SVM, support‐vector machine.

