Artificial Intelligence for Unstructured Healthcare Data: Application to Coding of Patient Reporting of Adverse Drug Reactions

Louis Létinier et al. Clin Pharmacol Ther. 2021 Aug;110(2):392-400.
doi: 10.1002/cpt.2266. Epub 2021 May 8.

Abstract

Adverse drug reaction (ADR) reporting is a major component of drug safety monitoring; its input will only be optimized, however, if systems can cope with the tremendous flow of information it generates, which is based primarily on unstructured text fields. The aim of this study was to develop an automated system for coding ADRs from patient reports. Our system was based on a knowledge base about drugs, enriched by supervised machine learning (ML) models trained on patient reporting data. To train our models, we selected all cases of ADRs reported by patients to a French Pharmacovigilance Centre through a national web portal between March 2017 and March 2019 (n = 2,058 reports). We tested both conventional ML models and deep-learning models. We performed an external validation using a dataset consisting of a random sample of ADRs reported to the Marseille Pharmacovigilance Centre over the same period (n = 187). Here, we show that, in terms of area under the curve (AUC) and F-measure, the best model for identifying ADRs was gradient boosting trees (LGBM), with an AUC of 0.93 (0.92-0.94) and an F-measure of 0.72 (0.68-0.75). On external validation, this model achieved an AUC of 0.91 and an F-measure of 0.58. We evaluated an artificial intelligence pipeline that proved able to learn to correctly identify ADRs from unstructured data. This result has allowed us to start a new study using more data to further improve performance and to offer a tool that is useful in practice for efficiently managing drug safety information.
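For illustration, the sketch below trains a LightGBM classifier on TF-IDF features of short free-text snippets and reports AUC and F-measure on a 10% held-out split, mirroring the 90/10 evaluation described above. The miniature corpus, the TF-IDF featurisation, and the hyperparameters are assumptions made for the sake of a runnable example; they are not the authors' actual pipeline or data.

    import numpy as np
    from lightgbm import LGBMClassifier
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics import f1_score, roc_auc_score
    from sklearn.model_selection import train_test_split

    # Hypothetical miniature corpus: free-text snippets with a binary label
    # saying whether the snippet describes an ADR (1) or not (0).
    texts = [
        "severe headache after starting the drug",
        "no adverse effect reported by the patient",
        "rash and itching two days after the first dose",
        "patient asks about the dosing schedule",
        "nausea and vomiting following treatment",
        "request for a prescription renewal",
    ] * 20  # repeat to get enough rows for a train/test split
    labels = [1, 0, 1, 0, 1, 0] * 20

    # TF-IDF features (an assumption; the study formats its data via the
    # knowledge-base matching step before feeding the ML models).
    X = TfidfVectorizer().fit_transform(texts)
    y = np.array(labels)

    # 90% train / 10% test, mirroring the split described in the abstract.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.1, random_state=0, stratify=y
    )

    model = LGBMClassifier(n_estimators=200, learning_rate=0.05)
    model.fit(X_train, y_train)

    proba = model.predict_proba(X_test)[:, 1]
    print("AUC:", roc_auc_score(y_test, proba))
    print("F1 :", f1_score(y_test, (proba >= 0.5).astype(int)))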


Conflict of interest statement

L.L., J.J., A.B., and C.G. were employed by Synapse Medicine at the time this research was conducted or hold stock/stock options therein. All other authors declared no competing interests.

Figures

Figure 1
Artificial intelligence pipeline to identify and code adverse drug reactions (ADRs) from free text using MedDRA terminology. (a) Patient sends an ADR report form with clinical information. (b) Text cleaning and extraction of relevant information. (c) Matching MedDRA terms in the case reports using the knowledge graph. (d) Data formatting for our machine learning (ML) models. Conventional ML: logit, random forest (RF), support‐vector machine (SVM), and light gradient boosting machine (LightGBM or LGBM). Neural networks and deep learning models: FastText, long short‐term memory recurrent neural network (LSTM), and convolutional neural network (CNN). (e) Training ML models on 90% of our dataset (train set), then computing evaluation metrics on the remaining 10% (test set). (f) Selection of the best ML model with regard to area under the curve (AUC) and F‐measure.
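As a simplified illustration of steps (b) and (c), the sketch below normalises a free-text report and matches it against a small dictionary of MedDRA preferred terms. The dictionary and its codes are a toy stand-in for the drug knowledge base used in the study, not the actual resource.

    import re
    import unicodedata

    # Toy stand-in for the MedDRA vocabulary in the knowledge base: maps a
    # normalised surface form to an illustrative preferred-term code.
    MEDDRA_TERMS = {
        "headache": "10019211",
        "nausea": "10028813",
        "rash": "10037844",
    }

    def normalise(text: str) -> str:
        """Lowercase and strip accents: a crude version of the text-cleaning step (b)."""
        text = unicodedata.normalize("NFKD", text)
        text = "".join(ch for ch in text if not unicodedata.combining(ch))
        return text.lower()

    def match_meddra(report: str) -> list[tuple[str, str]]:
        """Return (term, code) pairs found in a free-text report, as in step (c)."""
        cleaned = normalise(report)
        hits = []
        for term, code in MEDDRA_TERMS.items():
            if re.search(rf"\b{re.escape(term)}\b", cleaned):
                hits.append((term, code))
        return hits

    print(match_meddra("Severe headache and nausea after the second dose"))
    # [('headache', '10019211'), ('nausea', '10028813')]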
Figure 2
Performance of different machine learning models for the identification of adverse drug reactions from patient reports. (a) Performance of machine learning (ML) models in terms of receiver operating characteristic (ROC) curve and area under the curve (AUC) on the internal validation set. (b) Performance of ML models in terms of F‐measure (F1) on the internal validation set. CNN, convolutional neural network; SVM, support‐vector machine.
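A minimal sketch of the kind of ROC comparison shown in panel (a), assuming a recent scikit-learn and using synthetic features in place of the vectorised patient reports; the two models shown are illustrative members of the conventional ML family evaluated in the study.

    import matplotlib.pyplot as plt
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import RocCurveDisplay
    from sklearn.model_selection import train_test_split

    # Synthetic data as a placeholder for the vectorised patient reports.
    X, y = make_classification(n_samples=2000, n_features=50, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.1, random_state=0)

    models = {
        "logit": LogisticRegression(max_iter=1000),
        "random forest": RandomForestClassifier(n_estimators=200, random_state=0),
    }

    ax = plt.gca()
    for name, model in models.items():
        model.fit(X_tr, y_tr)
        RocCurveDisplay.from_estimator(model, X_te, y_te, name=name, ax=ax)
    ax.plot([0, 1], [0, 1], linestyle="--", color="grey")  # chance line
    ax.set_title("ROC curves on a held-out validation set (illustrative)")
    plt.show()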
Figure 3
Performance of different machine learning models for the determination of adverse drug reaction seriousness from patient reports. (a) Performance of machine learning (ML) models in terms of receiver operating characteristic (ROC) curve and area under the curve (AUC) on the internal validation set. (b) Performance of ML models in terms of F‐measure (F1) on the internal validation set. SVM, support‐vector machine.

