Promoting Reproducible Research for Characterizing Nonmedical Use of Medications Through Data Annotation: Description of a Twitter Corpus and Guidelines
- PMID: 32130117
- PMCID: PMC7066507
- DOI: 10.2196/15861
Abstract
Background: Social media data are increasingly being used for population-level health research because they provide near real-time access to large volumes of consumer-generated data. Recently, a number of studies have explored the possibility of using social media data, such as Twitter posts, to monitor prescription medication abuse. However, there is a paucity of annotated data and of guidelines for data characterization that describe how information related to abuse-prone medications is presented on Twitter.
Objective: This study describes the creation of an annotated corpus suitable for training supervised algorithms to automatically classify medication abuse-related chatter. We also describe the annotation strategies used to improve interannotator agreement (IAA), the detailed annotation guidelines, and machine learning experiments that illustrate the utility of the annotated corpus.
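The abstract reports IAA as Cohen kappa, which corrects the raw fraction of items on which two annotators agree (p_o) for the agreement expected by chance from each annotator's label frequencies (p_e): kappa = (p_o - p_e) / (1 - p_e). A minimal sketch of that computation follows; the function name `cohen_kappa` and the shortened class labels ("abuse", "consumption", "mention", "unrelated") are illustrative assumptions, not the authors' code.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label sequences")
    n = len(labels_a)
    # Observed agreement: fraction of items where both annotators chose the same class.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the two annotators' marginal class frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1.0:
        # Both annotators used one identical label for every item: agreement is trivially perfect.
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)


# Toy example with hypothetical annotations of five tweets.
tweets_a = ["abuse", "mention", "abuse", "unrelated", "consumption"]
tweets_b = ["abuse", "mention", "consumption", "unrelated", "consumption"]
print(round(cohen_kappa(tweets_a, tweets_b), 3))  # prints 0.737
```

In the toy example the annotators agree on 4 of 5 tweets (p_o = 0.8), chance agreement is 6/25 = 0.24, and kappa is 0.56 / 0.76, roughly 0.74: noticeably lower than the raw 80% agreement because some agreement is expected by chance alone.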
Methods: We employed an iterative annotation strategy, holding interannotator discussions and updating the annotation guidelines at each iteration to improve IAA for the manual annotation task. Using the grounded theory approach, we first characterized tweets into fine-grained categories and then grouped them into 4 broad classes: abuse or misuse, personal consumption, mention, and unrelated. After the completion of manual annotations, we experimented with several machine learning algorithms to illustrate the utility of the corpus and to generate baseline performance metrics for automatic classification on these data.
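The two-level scheme (fine-grained categories collapsed into 4 broad classes) can be sketched as a simple lookup. This is a minimal illustration under stated assumptions: the fine-grained label strings below are hypothetical shorthand for the misuse-related expressions listed in the Results (coingestion, nonmedical use, nonstandard route of intake, consumption above the prescribed dose); the full fine-grained category set lives in the published guidelines and is not reproduced here. The helpers `to_broad` and `broad_distribution` are likewise hypothetical.

```python
from collections import Counter

# The four broad classes named in the abstract.
BROAD_CLASSES = ("abuse or misuse", "personal consumption", "mention", "unrelated")

# HYPOTHETICAL fine-grained labels; only a few misuse-related examples are shown.
FINE_TO_BROAD = {
    "coingestion": "abuse or misuse",
    "nonmedical use": "abuse or misuse",
    "nonstandard route of intake": "abuse or misuse",
    "above prescribed dose": "abuse or misuse",
}


def to_broad(label):
    """Collapse a fine-grained label to its broad class (broad labels pass through)."""
    if label in BROAD_CLASSES:
        return label
    try:
        return FINE_TO_BROAD[label]
    except KeyError:
        raise ValueError(f"unknown label: {label!r}") from None


def broad_distribution(fine_labels):
    """Count how many annotated tweets fall into each broad class."""
    return Counter(to_broad(label) for label in fine_labels)


print(broad_distribution(["coingestion", "mention", "nonmedical use", "unrelated"]))
```

Keeping the fine-grained labels and deriving the broad classes from them means that the class boundaries can be revised between annotation iterations without re-annotating the tweets.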
Results: Our final annotated set consisted of 16,443 tweets mentioning at least 20 abuse-prone medications, including opioids, benzodiazepines, atypical antipsychotics, central nervous system stimulants, and gamma-aminobutyric acid analogs. Our final overall IAA was 0.86 (Cohen kappa), which represents high agreement. The manual annotation process revealed the variety of ways in which prescription medication misuse or abuse is discussed on Twitter, including expressions indicating coingestion, nonmedical use, nonstandard route of intake, and consumption above the prescribed doses. Among machine learning classifiers, support vector machines obtained the highest automatic classification accuracy, 73.00% (95% CI 71.4-74.5), on the test set (n=3271).
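A 95% CI on an accuracy figure treats each test tweet as a correct or incorrect trial, so the interval width shrinks with the test set size. The sketch below uses the normal-approximation (Wald) interval for a binomial proportion; this is an assumed method, since the abstract does not say how the authors computed their interval, and `wald_accuracy_ci` is a hypothetical helper name.

```python
import math


def wald_accuracy_ci(accuracy, n, z=1.96):
    """Normal-approximation (Wald) confidence interval for a classification accuracy.

    accuracy: fraction of test items classified correctly, in [0, 1]
    n: number of test items
    z: standard-normal quantile (1.96 for a 95% interval)
    """
    if not 0.0 <= accuracy <= 1.0 or n <= 0:
        raise ValueError("accuracy must be in [0, 1] and n must be positive")
    # Standard error of a binomial proportion.
    se = math.sqrt(accuracy * (1.0 - accuracy) / n)
    half_width = z * se
    # Clip to the valid [0, 1] range.
    return max(0.0, accuracy - half_width), min(1.0, accuracy + half_width)


# Plugging in the SVM accuracy and test set size reported in the abstract.
low, high = wald_accuracy_ci(0.73, 3271)
print(f"95% CI: {low:.1%} to {high:.1%}")  # prints 95% CI: 71.5% to 74.5%
```

This gives roughly 71.5% to 74.5%, close to the reported 71.4-74.5. The small difference at the lower bound is consistent with the authors having used a different interval method (for example, an exact or resampling-based interval), which the abstract does not specify.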
Conclusions: Our manual analysis and annotation of a large number of tweets revealed the types of information posted on Twitter about a set of abuse-prone prescription medications and how these types are distributed. In the interest of reproducible and community-driven research, we have made our detailed annotation guidelines and the training data for the classification experiments publicly available, and the test data will be used in future shared tasks.
Keywords: infodemiology; infoveillance; machine learning; natural language processing; prescription drug misuse; social media; substance abuse detection.
©Karen O'Connor, Abeed Sarker, Jeanmarie Perrone, Graciela Gonzalez Hernandez. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 26.02.2020.
Conflict of interest statement
Conflicts of Interest: None declared.
Similar articles
- Developing an Automatic System for Classifying Chatter About Health Services on Twitter: Case Study for Medicaid. J Med Internet Res. 2021 May 3;23(5):e26616. doi: 10.2196/26616. PMID: 33938807. Free PMC article.
- Social Media Mining for Toxicovigilance: Automatic Monitoring of Prescription Medication Abuse from Twitter. Drug Saf. 2016 Mar;39(3):231-40. doi: 10.1007/s40264-015-0379-4. PMID: 26748505. Free PMC article.
- Text classification models for the automatic detection of nonmedical prescription medication use from social media. BMC Med Inform Decis Mak. 2021 Jan 26;21(1):27. doi: 10.1186/s12911-021-01394-0. PMID: 33499852. Free PMC article.
- Mining social media for prescription medication abuse monitoring: a review and proposal for a data-centric framework. J Am Med Inform Assoc. 2020 Feb 1;27(2):315-329. doi: 10.1093/jamia/ocz162. PMID: 31584645. Free PMC article. Review.
- Methods to Establish Race or Ethnicity of Twitter Users: Scoping Review. J Med Internet Res. 2022 Apr 29;24(4):e35788. doi: 10.2196/35788. PMID: 35486433. Free PMC article.
Cited by
- Bootstrapping semi-supervised annotation method for potential suicidal messages. Internet Interv. 2022 Feb 28;28:100519. doi: 10.1016/j.invent.2022.100519. eCollection 2022 Apr. PMID: 35281704. Free PMC article. Review.
- Classifying Characteristics of Opioid Use Disorder From Hospital Discharge Summaries Using Natural Language Processing. Front Public Health. 2022 May 9;10:850619. doi: 10.3389/fpubh.2022.850619. eCollection 2022. PMID: 35615042. Free PMC article.
- Automatic gender detection in Twitter profiles for health-related cohort studies. JAMIA Open. 2021 Jun 23;4(2):ooab042. doi: 10.1093/jamiaopen/ooab042. eCollection 2021 Apr. PMID: 34169232. Free PMC article.
- Promoting Health Literacy With Human-in-the-Loop Video Understandability Classification of YouTube Videos: Development and Evaluation Study. J Med Internet Res. 2025 Apr 8;27:e56080. doi: 10.2196/56080. PMID: 40198918. Free PMC article.
- Transferability Based on Drug Structure Similarity in the Automatic Classification of Noncompliant Drug Use on Social Media: Natural Language Processing Approach. J Med Internet Res. 2023 May 3;25:e44870. doi: 10.2196/44870. PMID: 37133915. Free PMC article.