Pac Symp Biocomput. 2020:25:319-330.

Towards identifying drug side effects from social media using active learning and crowd sourcing

Sophie Burkhardt¹,², Julia Siekiera, Josua Glodde, Miguel A Andrade-Navarro, Stefan Kramer

Affiliations

¹ Department of Computer Science, Johannes Gutenberg University, Mainz, 55128, Germany.
² Department of Biology, Johannes Gutenberg University, Institute of Molecular Biology, Mainz, 55128, Germany, burkhardt@informatik.uni-mainz.de.

PMID: 31797607

Free article

Sophie Burkhardt et al. Pac Symp Biocomput. 2020.

Abstract

Motivation: Social media is a largely untapped source of information on side effects of drugs. Twitter in particular is widely used to report on everyday events and personal ailments. However, labeling this noisy data is a difficult problem because labeled training data is sparse and automatic labeling is error-prone. Crowd sourcing can help in such a scenario to obtain more reliable labels, but it is comparatively expensive because workers have to be paid. To remedy this, semi-supervised active learning may reduce the amount of labeled data needed and focus the manual labeling process on important information.

Results: We extracted data from Twitter using the public API. We then used Amazon Mechanical Turk in combination with a state-of-the-art semi-supervised active learning method to label tweets with their associated drugs and side effects in two stages. Our results show that this method is an effective way of discovering side effects in tweets, improving the F-measure from 53% to 67% compared with a one-stage workflow. Additionally, we show that the active learning scheme reduces the labeling cost in comparison to a non-active baseline.
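
To illustrate the general idea of focusing paid labeling on informative examples, the following is a minimal sketch of one round of pool-based active learning with uncertainty sampling. It is not the semi-supervised method used in the paper, which the abstract does not specify in detail. The function names, the `ask_crowd` callback (a stand-in for a crowd-sourcing request), and the probability values are all hypothetical.

```python
from typing import Callable, Dict, List, Sequence


def uncertainty(prob_positive: float) -> float:
    """Score a binary prediction: 1.0 at p = 0.5, 0.0 at p = 0 or p = 1."""
    return 1.0 - 2.0 * abs(prob_positive - 0.5)


def select_batch(probs: Sequence[float], unlabeled: Sequence[int], k: int) -> List[int]:
    """Return up to k unlabeled indices, most uncertain first."""
    ranked = sorted(unlabeled, key=lambda i: uncertainty(probs[i]), reverse=True)
    return ranked[:k]


def active_labeling_round(
    probs: Sequence[float],
    labels: Dict[int, int],
    ask_crowd: Callable[[int], int],
    k: int,
) -> List[int]:
    """Query the (hypothetical) crowd oracle for the k most uncertain items.

    probs[i] is the current model's predicted probability that tweet i
    mentions a side effect; labels maps already-labeled indices to 0/1
    and is updated in place.
    """
    unlabeled = [i for i in range(len(probs)) if i not in labels]
    batch = select_batch(probs, unlabeled, k)
    for i in batch:
        labels[i] = ask_crowd(i)  # assumption: one paid label per item
    return batch


# Illustrative model outputs for five tweets (made-up numbers).
example_probs = [0.95, 0.52, 0.10, 0.45, 0.70]
example_labels: Dict[int, int] = {}
first_batch = active_labeling_round(example_probs, example_labels, lambda i: 1, k=2)
```

In a full workflow, the model would be retrained on the enlarged label set after each round and the probabilities recomputed before the next batch is selected. A non-active baseline would instead pick the batch at random.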

Availability: Code and data will be published on https://github.com/kramerlab.
