J Biomed Semantics. 2017 Mar 3;8(1):9.
doi: 10.1186/s13326-017-0120-6.

Optimization on machine learning based approaches for sentiment analysis on HPV vaccines related tweets


Jingcheng Du et al. J Biomed Semantics.

Abstract

Background: Analysing public opinion on HPV vaccines on social media with machine learning-based approaches can help us understand the reasons behind low vaccine coverage and devise strategies to improve vaccine uptake.

Objective: To propose a machine learning system that can extract comprehensive public sentiment on HPV vaccines from Twitter with satisfactory performance.

Method: We collected and manually annotated 6,000 HPV vaccine-related tweets as a gold standard. A support vector machine (SVM) model was chosen, and a hierarchical classification method was proposed and evaluated. Additional feature sets were evaluated and model parameters were optimized to maximize the performance of the machine learning model.
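
One way to picture a hierarchical classification method is as a top-down walk over a label tree, with one classifier at each internal node. The Python sketch below is illustrative only: the label tree, the keyword lists, and the names `HIERARCHY`, `keyword_classifier`, and `predict_path` are assumptions that do not come from the paper, and the keyword rules merely stand in for trained SVM models.

```python
from typing import Callable, Dict, List

# A node classifier maps a tweet to one of the child labels of its node.
Classifier = Callable[[str], str]

# Hypothetical label tree, for illustration only (not the scheme in Fig. 1).
HIERARCHY: Dict[str, List[str]] = {
    "Root": ["Related", "Unrelated"],
    "Related": ["Positive", "Neutral", "Negative"],
    "Negative": ["NegSafety", "NegOthers"],
}


def keyword_classifier(rules: Dict[str, List[str]], default: str) -> Classifier:
    """Toy stand-in for a trained model: return the first label whose
    keyword occurs in the text, or the default label otherwise."""
    def classify(text: str) -> str:
        lowered = text.lower()
        for label, keywords in rules.items():
            if any(keyword in lowered for keyword in keywords):
                return label
        return default
    return classify


# One classifier per internal node; each would be an SVM in a real system.
NODE_CLASSIFIERS: Dict[str, Classifier] = {
    "Root": keyword_classifier({"Related": ["hpv"]}, default="Unrelated"),
    "Related": keyword_classifier(
        {
            "Negative": ["unsafe", "side effect", "refuse"],
            "Positive": ["protect", "prevent"],
        },
        default="Neutral",
    ),
    "Negative": keyword_classifier(
        {"NegSafety": ["unsafe", "side effect"]}, default="NegOthers"
    ),
}


def predict_path(text: str) -> List[str]:
    """Walk from the root to a leaf, applying one classifier per internal node."""
    path: List[str] = []
    node = "Root"
    while node in HIERARCHY:
        child = NODE_CLASSIFIERS[node](text)
        if child not in HIERARCHY[node]:
            raise ValueError(f"{child!r} is not a child of {node!r}")
        path.append(child)
        node = child
    return path


if __name__ == "__main__":
    print(predict_path("HPV vaccine side effect reports worry me"))
    print(predict_path("Lovely weather today"))
```

Because each node only separates its own children, a rare fine-grained label competes with its siblings rather than with every label in the corpus, which is one plausible reason to prefer such a design on unbalanced data.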

Results: A hierarchical classification scheme containing 10 categories was built to assess public opinion toward HPV vaccines comprehensively. A gold-standard corpus of 6,000 annotated tweets, with a Kappa annotation agreement of 0.851, was created and made publicly available. Compared with the baseline model, the hierarchical classification model with optimized feature sets and model parameters increased the micro-averaged and macro-averaged F scores from 0.6732 and 0.3967 to 0.7442 and 0.5883, respectively.
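
The gap between the micro-averaged and macro-averaged F scores is typical of unbalanced data. As a minimal sketch, assuming single-label predictions and the common definitions (micro: F1 over counts pooled across classes; macro: unweighted mean of per-class F1), the two can be computed as follows. The function names and the toy labels are hypothetical, and the paper may define its averages differently.

```python
from collections import Counter
from typing import List, Tuple


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 from raw counts; taken to be 0.0 when there are no true positives."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def micro_macro_f1(gold: List[str], predicted: List[str]) -> Tuple[float, float]:
    """Return (micro-averaged F1, macro-averaged F1) for single-label data."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1
        else:
            fn[g] += 1  # the gold class was missed
            fp[p] += 1  # the predicted class was wrongly assigned
    labels = sorted(set(gold) | set(predicted))
    # Micro: pool the counts over all classes, then compute one F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro: compute F1 per class, then average with equal class weight.
    macro = sum(f1(tp[label], fp[label], fn[label]) for label in labels) / len(labels)
    return micro, macro


if __name__ == "__main__":
    # Toy data: the classifier ignores the minority class entirely.
    gold = ["Neutral"] * 8 + ["Negative"] * 2
    predicted = ["Neutral"] * 10
    micro, macro = micro_macro_f1(gold, predicted)
    print(f"micro={micro:.2f} macro={macro:.2f}")
```

In the toy data the micro average stays at 0.80 because the majority class dominates the pooled counts, while the macro average drops to about 0.44 because the missed minority class contributes an F1 of zero with equal weight.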

Conclusions: Our work provides a systematic way to improve machine learning model performance on a highly unbalanced corpus of HPV vaccine-related tweets. Our system can be applied to a larger tweet corpus to extract large-scale public opinion on HPV vaccines.

Keywords: Gold standard; Hierarchical classification; Sentiment analysis; Social media; Support vector machines; Twitter.


Figures

Fig. 1
Sentiment classification scheme for HPV vaccines related tweets: The categories in colored rectangles (other than black) are all possible sentiment labels that can be assigned to the tweets
Fig. 2
Overview of the machine learning based system and optimization approach: (a) modularized machine learning system framework; (b) machine learning optimization steps
Fig. 3
Sentiment distribution in the 6,000-tweet gold standard. (Neg: Negative)

