Pharmacovigilance from social media: mining adverse drug reaction mentions using sequence labeling with word embedding cluster features

Azadeh Nikfarjam¹, Abeed Sarker², Karen O'Connor², Rachel Ginn², Graciela Gonzalez¹

Affiliations

¹ Department of Biomedical Informatics, Arizona State University, Scottsdale, AZ, USA. anikfarj@asu.edu, graciela.gonzalez@asu.edu
² Department of Biomedical Informatics, Arizona State University, Scottsdale, AZ, USA.

PMID: 25755127
PMCID: PMC4457113
DOI: 10.1093/jamia/ocu041

Azadeh Nikfarjam et al. J Am Med Inform Assoc. 2015 May;22(3):671-81. doi: 10.1093/jamia/ocu041. Epub 2015 Mar 9.

Abstract

Objective: Social media is becoming increasingly popular as a platform for sharing personal health-related information. This information can be utilized for public health monitoring tasks, particularly for pharmacovigilance, via the use of natural language processing (NLP) techniques. However, the language in social media is highly informal, and user-expressed medical concepts are often nontechnical, descriptive, and challenging to extract. There has been limited progress in addressing these challenges, and thus far, advanced machine learning-based NLP techniques have been underutilized. Our objective is to design a machine learning-based approach to extract mentions of adverse drug reactions (ADRs) from highly informal text in social media.

Methods: We introduce ADRMine, a machine learning-based concept extraction system that uses conditional random fields (CRFs). ADRMine utilizes a variety of features, including a novel feature for modeling words' semantic similarities. The similarities are modeled by clustering words based on unsupervised, pretrained word representation vectors (embeddings) generated from unlabeled user posts in social media using a deep learning technique.
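
The cluster feature idea can be illustrated with a minimal sketch. It assumes k-means clustering, a one-token context window, and hand-made two-dimensional toy vectors; none of these choices is stated in the abstract, and the function names `kmeans` and `token_features` are hypothetical. The resulting feature dictionaries are the kind of per-token input a CRF toolkit would consume; CRF training itself is not shown.

```python
import math
import random

# Toy 2-D "embeddings", invented purely for illustration. A real system would
# load vectors learned from a large corpus of unlabeled user posts.
TOY_EMBEDDINGS = {
    "headache": (0.90, 0.10),
    "migraine": (0.85, 0.15),
    "nausea": (0.80, 0.20),
    "happy": (0.10, 0.90),
    "great": (0.15, 0.85),
    "fine": (0.20, 0.80),
}


def kmeans(vectors, k, iterations=20, seed=0):
    """Plain k-means over a dict word -> vector; returns dict word -> cluster id."""
    rng = random.Random(seed)
    words = sorted(vectors)
    centroids = [vectors[w] for w in rng.sample(words, k)]
    assignment = {}
    for _ in range(iterations):
        # Assign each word to its nearest centroid (Euclidean distance).
        for w in words:
            assignment[w] = min(
                range(k), key=lambda c: math.dist(vectors[w], centroids[c])
            )
        # Recompute each centroid as the mean of its current members.
        for c in range(k):
            members = [vectors[w] for w in words if assignment[w] == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return assignment


def token_features(tokens, i, clusters, window=1):
    """Feature dict for token i: surface form plus cluster ids in a context window."""
    feats = {"word": tokens[i].lower()}
    for offset in range(-window, window + 1):
        j = i + offset
        if 0 <= j < len(tokens):
            cid = clusters.get(tokens[j].lower())
            # Words without an embedding get a shared placeholder value.
            feats[f"cluster[{offset}]"] = "UNK" if cid is None else f"c{cid}"
    return feats


clusters = kmeans(TOY_EMBEDDINGS, k=2)
sentence = ["gave", "me", "a", "migraine"]
features = [token_features(sentence, i, clusters) for i in range(len(sentence))]
```

Because "migraine" and "headache" fall into the same cluster here, a model that has seen "headache" labeled as an ADR in training can generalize to "migraine" through the shared cluster feature, even if "migraine" never appears in the annotated data.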

Results: ADRMine outperforms several strong baseline systems in the ADR extraction task by achieving an F-measure of 0.82. Feature analysis demonstrates that the proposed word cluster features significantly improve extraction performance.
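
For readers unfamiliar with the metric, the sketch below shows how a balanced F-measure (the harmonic mean of precision and recall) can be computed over extracted spans. It assumes exact span matching and uses invented spans; the abstract does not state the matching criterion or the underlying precision and recall, so this is an illustration of the formula only, and `span_prf` is a hypothetical helper name.

```python
def span_prf(gold_spans, predicted_spans):
    """Return (precision, recall, F1) for exact-match span extraction."""
    gold = set(gold_spans)
    predicted = set(predicted_spans)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Invented (start, end) token offsets, for illustration only.
gold = [(0, 2), (5, 6), (9, 11)]
predicted = [(0, 2), (5, 6), (7, 8)]
precision, recall, f1 = span_prf(gold, predicted)
```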

Conclusion: It is possible to extract complex medical concepts, with relatively high performance, from informal, user-generated content. Our approach is scalable and well suited to social media mining because it relies on large volumes of unlabeled data, thus reducing the need for large annotated training data sets.

Keywords: ADR; adverse drug reaction; deep learning word embeddings; machine learning; natural language processing; pharmacovigilance; social media mining.

Figures

**Figure 1**
Examples of user-posted drug reviews in Twitter (a) and DailyStrength (b).

**Figure 2**
Calculated features representing three CRF classification instances.

**Figure 3**
The impact of embedding clusters on precision, recall (a), and F-measure (b), when training the CRF on variable training set sizes and testing on the same test set.

**Figure 4**
Examples of successfully extracted concepts using ADRMine.

**Figure 5**
Examples of concepts that could only be extracted after adding the embedding cluster features to ADRMine. These concepts are starred and other extracted concepts are highlighted.

**Figure 6**
Analysis of false positives and false negatives produced by the ADR extraction approach.

Grants and funding

R01 LM011176/LM/NLM NIH HHS/United States
