The Use of Natural Language Processing Methods in Reddit to Investigate Opioid Use: Scoping Review
- PMID: 39269743
- PMCID: PMC11437337
- DOI: 10.2196/51156
Abstract
Background: The growing availability of big data spontaneously generated by social media platforms allows us to leverage natural language processing (NLP) methods as valuable tools to understand the opioid crisis.
Objective: We aimed to understand how NLP has been applied to Reddit (Reddit Inc) data to study opioid use.
Methods: We systematically searched for peer-reviewed studies and conference abstracts in PubMed, Scopus, PsycINFO, ACL Anthology, IEEE Xplore, and Association for Computing Machinery data repositories up to July 19, 2022. Studies were included if they investigated opioid use, used NLP techniques to analyze textual corpora, and used Reddit as the social media data source. We were specifically interested in mapping the studies' overarching goals and findings, the methodologies and software used, and their main limitations.
Results: In total, 30 studies were included and classified into 4 nonmutually exclusive overarching goal categories: methodological (n=6, 20% of studies), infodemiology (n=22, 73%), infoveillance (n=7, 23%), and pharmacovigilance (n=3, 10%). NLP methods were used to identify content relevant to opioid use among vast quantities of textual data, to establish potential relationships between opioid use patterns or profiles and contextual factors or comorbidities, and to anticipate individuals' transitions between different opioid-related subreddits, which likely reflect progression through stages of opioid use. The most common methods were embedding techniques (12/30, 40%), prediction or classification approaches (12/30, 40%), topic modeling (9/30, 30%), and sentiment analysis (6/30, 20%). The most frequently used programming languages were Python (20/30, 67%) and R (2/30, 7%). Among the studies that reported limitations (20/30, 67%), the most frequently cited was uncertainty about whether redditors participating in these forums were representative of people who use opioids (8/20, 40%). Most papers (28/30, 93%) were published between 2019 and 2022, and their authors came from a range of disciplines.
Conclusions: This scoping review identified a wide variety of NLP techniques and applications used to support surveillance and social media interventions addressing the opioid crisis. Despite the clear potential of these methods to identify and analyze opioid-relevant content on Reddit, there are limits to the degree of interpretive meaning that they can provide. Moreover, we identified the need for standardized ethical guidelines to govern the use of Reddit data to safeguard the anonymity and privacy of people using these forums.
Keywords: NLP; Reddit; machine learning; natural language processing; opioid.
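To make the "prediction or classification approach" mentioned in the Results more concrete, the following is a minimal sketch of one way such a classifier could flag opioid-relevant posts. It is not taken from any of the reviewed studies: the training posts are invented toy examples, the function names (`tokenize`, `train`, `predict`) are hypothetical, and a hand-written multinomial naive Bayes with add-one smoothing stands in for whatever models the individual studies actually used.

```python
import math
import re
from collections import Counter

# Toy, invented posts for illustration only (NOT data from any reviewed study).
# Label 1 = opioid-relevant, label 0 = not relevant.
TRAIN = [
    ("tapering off oxycodone and the withdrawal is rough", 1),
    ("switched from heroin to suboxone last month", 1),
    ("fentanyl test strips saved my friend", 1),
    ("day three of withdrawal from oxycodone", 1),
    ("best pizza topping is obviously pineapple", 0),
    ("my cat knocked the plant off the shelf again", 0),
    ("looking for a good laptop for school", 0),
    ("the game last night was rough to watch", 0),
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def train(examples):
    """Count words per class and documents per class."""
    word_counts = {0: Counter(), 1: Counter()}
    doc_counts = Counter()
    for text, label in examples:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts[0]) | set(word_counts[1])
    return word_counts, doc_counts, vocab


def predict(text, model):
    """Return the label with the higher naive Bayes log-probability."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in (0, 1):
        # Log prior of the class.
        score = math.log(doc_counts[label] / total_docs)
        # Denominator for add-one (Laplace) smoothing.
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokenize(text):
            if tok in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][tok] + 1) / denom)
        scores[label] = score
    return max(scores, key=scores.get)


model = train(TRAIN)
```

Calling `predict("oxycodone withdrawal help", model)` on this toy model returns the opioid-relevant label, because "oxycodone" and "withdrawal" appear only in the posts labeled 1. A real pipeline would need a far larger annotated corpus and a held-out evaluation set, and it would remain subject to the representativeness and privacy concerns noted in the abstract.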
©Alexandra Almeida, Thomas Patton, Mike Conway, Amarnath Gupta, Steffanie A Strathdee, Annick Bórquez. Originally published in JMIR Infodemiology (https://infodemiology.jmir.org), 13.09.2024.
Conflict of interest statement
Conflicts of Interest: None declared.