Transferability Based on Drug Structure Similarity in the Automatic Classification of Noncompliant Drug Use on Social Media: Natural Language Processing Approach
- PMID: 37133915
- PMCID: PMC10193216
- DOI: 10.2196/44870
Abstract
Background: Medication noncompliance is a critical issue because of the growing number of drugs sold on the web. Web-based drug distribution is difficult to control, which leads to problems such as drug noncompliance and abuse. Existing medication compliance surveys are incomplete because they cannot cover patients who do not go to the hospital or who do not give their doctors accurate information. A social media-based approach is therefore being explored to collect information about drug use. Social media data, which include users' reports of their own drug use, can be used to detect drug abuse and to assess patients' medication compliance.
Objective: This study aimed to assess how the structural similarity of drugs affects the efficiency of machine learning models for text classification of drug noncompliance.
Methods: This study analyzed 22,022 tweets about 20 different drugs. Each tweet was labeled as noncompliant use or mention, noncompliant sales, general use, or general mention. The study compared 2 methods for training machine learning models for text classification: single-subcorpus transfer learning, in which a model is trained on tweets about a single drug and then tested on tweets about other drugs, and multi-subcorpus incremental learning, in which models are trained on tweets about drugs added in order of their structural similarity. The performance of a model trained on a single subcorpus (a data set of tweets about a specific category of drugs) was compared with that of a model trained on multiple subcorpora (data sets of tweets about multiple categories of drugs).
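The two training schemes described in Methods can be sketched as data-selection plans. The following Python sketch is illustrative only: it assumes that the incremental schedule orders the remaining drugs by Tanimoto similarity to a held-out target drug, and the function names (`single_subcorpus_pairs`, `similarity_ordered_schedule`, `random_schedule`, `incremental_steps`), drug names, and similarity values are hypothetical rather than taken from the study. Model training itself is omitted.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical pairwise similarity table keyed by (drug, drug).
Similarity = Dict[Tuple[str, str], float]


def lookup(sim: Similarity, a: str, b: str) -> float:
    """Return the similarity of a and b; the measure is symmetric, so either key order works."""
    return sim[(a, b)] if (a, b) in sim else sim[(b, a)]


def single_subcorpus_pairs(drugs: List[str]) -> List[Tuple[str, str]]:
    """Single-subcorpus transfer: train on one drug's tweets, test on each other drug's tweets."""
    return [(train, test) for train in drugs for test in drugs if test != train]


def similarity_ordered_schedule(target: str, drugs: List[str], sim: Similarity) -> List[str]:
    """Assumed ordering: non-target subcorpora from most to least similar to the target drug."""
    candidates = [d for d in drugs if d != target]
    return sorted(candidates, key=lambda d: lookup(sim, target, d), reverse=True)


def random_schedule(target: str, drugs: List[str], seed: int = 0) -> List[str]:
    """Baseline: the same non-target subcorpora, added in random order."""
    candidates = [d for d in drugs if d != target]
    random.Random(seed).shuffle(candidates)
    return candidates


def incremental_steps(schedule: List[str]) -> List[List[str]]:
    """Step k trains on the first k subcorpora of the schedule."""
    return [schedule[:k] for k in range(1, len(schedule) + 1)]


if __name__ == "__main__":
    # Placeholder drugs and similarity values, not data from the study.
    drugs = ["drug_a", "drug_b", "drug_c", "drug_d"]
    sim = {
        ("drug_a", "drug_b"): 0.62, ("drug_a", "drug_c"): 0.15,
        ("drug_a", "drug_d"): 0.40, ("drug_b", "drug_c"): 0.20,
        ("drug_b", "drug_d"): 0.33, ("drug_c", "drug_d"): 0.05,
    }
    print(len(single_subcorpus_pairs(drugs)))  # 12
    ordered = similarity_ordered_schedule("drug_a", drugs, sim)
    print(ordered)  # ['drug_b', 'drug_d', 'drug_c']
    print(incremental_steps(ordered))
    print(random_schedule("drug_a", drugs, seed=1))
```

Comparing the two schedules step by step (same target, same number of subcorpora) isolates the effect of the ordering from the effect of simply adding more data.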
Results: The performance of the model trained on a single subcorpus varied depending on which drug was used for training. The Tanimoto similarity (a measure of the structural similarity between compounds) was weakly correlated with the classification results. When the number of subcorpora was small, the model trained by transfer learning on a corpus of structurally similar drugs performed better than the model trained by adding subcorpora in random order.
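For binary structural fingerprints, the Tanimoto coefficient is the number of "on" bits two compounds share divided by the number of bits set in either fingerprint. The sketch below assumes fingerprints are given as sets of on-bit indices (the abstract does not state which fingerprint type was used) and uses the Pearson coefficient as one possible way to relate pairwise similarities to classification scores; the abstract does not name the correlation measure, and the function names here are hypothetical.

```python
from math import sqrt
from typing import FrozenSet, Sequence


def tanimoto(fp_a: FrozenSet[int], fp_b: FrozenSet[int]) -> float:
    """Tanimoto coefficient of two binary fingerprints, each given as the set of
    indices of its on bits: |A & B| / (|A| + |B| - |A & B|)."""
    common = len(fp_a & fp_b)
    union = len(fp_a) + len(fp_b) - common
    # Two empty fingerprints share no bits; returning 0.0 is an illustrative convention.
    return common / union if union else 0.0


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


if __name__ == "__main__":
    # Hypothetical fingerprints: 2 shared bits out of 5 distinct bits.
    print(tanimoto(frozenset({1, 2, 3, 4}), frozenset({3, 4, 5})))  # 0.4
```

Under this setup, each (training drug, test drug) pair from the single-subcorpus experiments would contribute one similarity value and one classification score to `pearson`; a coefficient near 0 corresponds to the weak correlation reported above.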
Conclusions: The results suggest that structural similarity improves the classification performance for messages about unknown drugs when the training corpus covers only a few drugs. Conversely, there is little need to consider the influence of Tanimoto structural similarity if a sufficient variety of drugs is ensured.
Keywords: data mining; machine learning; medication noncompliance; natural language processing; pharmacovigilance; text classification; transfer learning.
©Tomohiro Nishiyama, Shuntaro Yada, Shoko Wakamiya, Satoko Hori, Eiji Aramaki. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 03.05.2023.
Conflict of interest statement
Conflicts of Interest: None declared.
Similar articles
- Social media mining for birth defects research: A rule-based, bootstrapping approach to collecting data for rare health-related events on Twitter. J Biomed Inform. 2018 Nov;87:68-78. doi: 10.1016/j.jbi.2018.10.001. Epub 2018 Oct 4. PMID: 30292855.
- Detecting Potentially Harmful and Protective Suicide-Related Content on Twitter: Machine Learning Approach. J Med Internet Res. 2022 Aug 17;24(8):e34705. doi: 10.2196/34705. PMID: 35976193.
- Results and Methodological Implications of the Digital Epidemiology of Prescription Drug References Among Twitter Users: Latent Dirichlet Allocation (LDA) Analyses. J Med Internet Res. 2023 Jul 28;25:e48405. doi: 10.2196/48405. PMID: 37505795.
- Mining social media for prescription medication abuse monitoring: a review and proposal for a data-centric framework. J Am Med Inform Assoc. 2020 Feb 1;27(2):315-329. doi: 10.1093/jamia/ocz162. PMID: 31584645. Review.
- How to apply zero-shot learning to text data in substance use research: An overview and tutorial with media data. Addiction. 2024 May;119(5):951-959. doi: 10.1111/add.16427. Epub 2024 Jan 11. PMID: 38212974. Review.
Cited by
- Characterizing Public Sentiments and Drug Interactions in the COVID-19 Pandemic Using Social Media: Natural Language Processing and Network Analysis. J Med Internet Res. 2025 Mar 5;27:e63755. doi: 10.2196/63755. PMID: 40053730.