J Med Internet Res. 2013 Aug 29;15(8):e174.

doi: 10.2196/jmir.2534.

Using Twitter to examine smoking behavior and perceptions of emerging tobacco products

Mark Myslín¹, Shu-Hong Zhu, Wendy Chapman, Mike Conway

PMID: 23989137
PMCID: PMC3758063
DOI: 10.2196/jmir.2534

Mark Myslín et al. J Med Internet Res. 2013.

Affiliation

¹ Department of Linguistics, University of California, San Diego, La Jolla, CA 92093, USA.

Abstract

Background: Social media platforms such as Twitter are rapidly becoming key resources for public health surveillance applications, yet little is known about Twitter users' levels of informedness and sentiment toward tobacco, especially with regard to the emerging tobacco control challenges posed by hookah and electronic cigarettes.

Objective: To develop a content and sentiment analysis of tobacco-related Twitter posts and build machine learning classifiers to detect tobacco-relevant posts and sentiment towards tobacco, with a particular focus on new and emerging products like hookah and electronic cigarettes.

Methods: We collected 7362 tobacco-related Twitter posts at 15-day intervals from December 2011 to July 2012. Each tweet was manually classified using a triaxial scheme, capturing genre, theme, and sentiment. Using the collected data, machine-learning classifiers were trained to detect tobacco-related vs irrelevant tweets as well as positive vs negative sentiment, using Naïve Bayes, k-nearest neighbors, and Support Vector Machine (SVM) algorithms. Finally, phi contingency coefficients were computed between each of the categories to discover emergent patterns.
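The sentiment classification step described in the Methods can be illustrated with a minimal sketch of a unigram Naïve Bayes classifier. This is not the authors' implementation: the whitespace tokenizer, the add-one smoothing, the function names, and the four toy tweets are all assumptions made for illustration.

```python
import math
from collections import Counter


def unigrams(text):
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()


def train_naive_bayes(labeled_tweets):
    """Collect per-class tweet counts, per-class unigram counts, and the vocabulary."""
    class_counts = Counter()
    word_counts = {}
    vocab = set()
    for text, label in labeled_tweets:
        class_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for tok in unigrams(text):
            counts[tok] += 1
            vocab.add(tok)
    return class_counts, word_counts, vocab


def predict(model, text):
    """Return the class with the highest log posterior, using add-one smoothing."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in unigrams(text):
            if tok in vocab:  # ignore words never seen in training
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy tweets, invented for illustration only (not study data).
TRAIN = [
    ("love hookah with friends tonight", "positive"),
    ("hookah bar was so relaxing", "positive"),
    ("trying to quit smoking cigarettes", "negative"),
    ("cigarettes smell awful and cause cancer", "negative"),
]
model = train_naive_bayes(TRAIN)
print(predict(model, "hookah tonight"))
print(predict(model, "quit cigarettes"))
```

The same unigram counts could feed a k-nearest neighbors or SVM classifier; Naïve Bayes is used here only because it fits in a few lines without external libraries.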

Results: The most prevalent genres were first- and second-hand experience and opinion, and the most frequent themes were hookah, cessation, and pleasure. Sentiment toward tobacco was overall more positive (1939/4215, 46% of tweets) than negative (1349/4215, 32%) or neutral among tweets mentioning it, even excluding the 9% of tweets categorized as marketing. Three separate metrics converged to support an emergent distinction between, on the one hand, hookah and electronic cigarettes corresponding to positive sentiment, and on the other hand, traditional tobacco products and more general references corresponding to negative sentiment. These metrics included correlations between categories in the annotation scheme (phi(hookah-positive)=0.39; phi(e-cigs-positive)=0.19); correlations between search keywords and sentiment (χ²₄=414.50, P<.001, Cramér's V=0.36); and the most discriminating unigram features for positive and negative sentiment ranked by log odds ratio in the machine learning component of the study. In the automated classification tasks, SVMs using a relatively small number of unigram features (500) achieved the best performance in discriminating tobacco-related from unrelated tweets (F score=0.85).
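The two association measures reported above, the phi coefficient and Cramér's V, can be computed from contingency tables as in the following sketch. The function names and the example counts are hypothetical; the underlying tables from the study are not given in the abstract, so the sketch does not attempt to reproduce the reported values.

```python
import math


def phi_coefficient(n11, n10, n01, n00):
    """Phi for a 2x2 table of two binary labels.

    n11: both labels present, n10: only the first, n01: only the second,
    n00: neither. Returns 0.0 if any marginal total is zero.
    """
    row1, row0 = n11 + n10, n01 + n00
    col1, col0 = n11 + n01, n10 + n00
    denom = math.sqrt(row1 * row0 * col1 * col0)
    return (n11 * n00 - n10 * n01) / denom if denom else 0.0


def chi_square(table):
    """Pearson chi-square statistic for an r x c table of counts.

    Assumes every row and column total is nonzero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


def cramers_v(table):
    """Cramér's V: sqrt(chi2 / (n * (min(rows, cols) - 1)))."""
    n = sum(sum(row) for row in table)
    k = min(len(table), len(table[0])) - 1
    return math.sqrt(chi_square(table) / (n * k))


# Hypothetical counts: rows = theme present/absent, columns = positive/not positive.
example = [[30, 10], [10, 50]]
print(round(phi_coefficient(30, 10, 10, 50), 3))
print(round(cramers_v(example), 3))
```

For a 2x2 table, Cramér's V equals the absolute value of phi, which gives a quick consistency check between the two functions.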

Conclusions: Novel insights available through Twitter for tobacco surveillance are attested through the high prevalence of positive sentiment. This positive sentiment is correlated in complex ways with social image, personal experience, and recently popular products such as hookah and electronic cigarettes. Several apparent perceptual disconnects between these products and their health effects suggest opportunities for tobacco control education. Finally, machine classification of tobacco-related posts shows a promising edge over strictly keyword-based approaches, yielding an improved signal-to-noise ratio in Twitter data and paving the way for automated tobacco surveillance applications.

Keywords: natural language processing; smoking; social media; twitter messaging.

Conflict of interest statement

Conflicts of Interest: None declared.

Figures

**Figure 1**
Correlations between all pairwise combinations of categories; values range from 0-1; correlations greater than 0.3 are underlined.

**Figure 2**
Example tweets manually classified using annotation scheme (relevant categories are shaded).

**Figure 3**
Machine learning algorithm description.

**Figure 4**
N-gram text representation.

**Figure 5**
Machine learning experiment workflow.

**Figure 6**
Tweet sentiment by search keyword.

**Figure 7**
Classification accuracy as a function of number of unigram features for 3 algorithms in the tobacco-relevance task.
