Optimization on machine learning based approaches for sentiment analysis on HPV vaccines related tweets
- PMID: 28253919
- PMCID: PMC5335787
- DOI: 10.1186/s13326-017-0120-6
Abstract
Background: Analysing public opinions on HPV vaccines on social media using machine learning-based approaches will help us understand the reasons behind the low vaccine coverage and develop corresponding strategies to improve vaccine uptake.
Objective: To propose a machine learning system that is able to extract comprehensive public sentiment on HPV vaccines on Twitter with satisfactory performance.
Method: We collected and manually annotated 6,000 HPV vaccine-related tweets as a gold standard. A support vector machine (SVM) model was chosen, and a hierarchical classification method was proposed and evaluated. Additional feature set evaluation and model parameter optimization were performed to maximize the machine learning model performance.
Results: A hierarchical classification scheme that contains 10 categories was built to assess public opinions toward HPV vaccines comprehensively. A gold corpus of 6,000 annotated tweets with a Kappa annotation agreement of 0.851 was created and made publicly available. Compared with the baseline model, the hierarchical classification model with optimized feature sets and model parameters increased the micro-averaging and macro-averaging F scores from 0.6732 and 0.3967 to 0.7442 and 0.5883, respectively.
Conclusions: Our work provides a systematic way to improve machine learning model performance on the highly unbalanced corpus of HPV vaccine-related tweets. Our system can be further applied to a large tweet corpus to extract large-scale public opinion towards HPV vaccines.
Keywords: Gold standard; Hierarchical classification; Sentiment analysis; Social media; Support vector machines; Twitter.
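The gap between the micro-averaging and macro-averaging F scores reported in the Results is typical of unbalanced corpora: micro-averaging pools counts over all tweets, while macro-averaging weights every category equally. The sketch below illustrates the two averages on a toy single-label example. The function names, the label names, and the data are hypothetical and are not taken from the paper's corpus or code.

```python
from collections import Counter


def f1(tp, fp, fn):
    """F1 from raw counts; defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(gold, pred):
    """Return (micro_f1, macro_f1) for single-label predictions."""
    labels = sorted(set(gold) | set(pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label gains a false positive
            fn[g] += 1  # gold label gains a false negative
    # Micro: pool the counts across categories, then compute one F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro: compute F1 per category, then take the unweighted mean.
    macro = sum(f1(tp[l], fp[l], fn[l]) for l in labels) / len(labels)
    return micro, macro


# Made-up, unbalanced toy data: 8 majority-class and 2 minority-class tweets.
gold = ["neutral"] * 8 + ["negative"] * 2
pred = ["neutral"] * 8 + ["neutral", "negative"]

micro, macro = micro_macro_f1(gold, pred)
print(f"micro F1 = {micro:.4f}, macro F1 = {macro:.4f}")
```

In this toy case one error on the minority category lowers the macro average noticeably more than the micro average, which is why both figures are usually reported for unbalanced data.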
