J Am Med Inform Assoc. 2020 Feb 1;27(2):225-235.
doi: 10.1093/jamia/ocz191.

Mining Twitter to assess the determinants of health behavior toward human papillomavirus vaccination in the United States

Hansi Zhang et al. J Am Med Inform Assoc.

Abstract

Objectives: The study sought to test the feasibility of using Twitter data to assess determinants of consumers' health behavior toward human papillomavirus (HPV) vaccination informed by the Integrated Behavior Model (IBM).

Materials and methods: We used 3 Twitter datasets spanning 2014 to 2018. We preprocessed and geocoded the tweets and then built a rule-based model that classified each tweet as either promotional information or consumers' discussions. We applied topic modeling to discover major themes and subsequently explored the associations between the topics learned from consumers' discussions and the responses to HPV-related questions in the Health Information National Trends Survey (HINTS).
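The abstract names topic modeling, and the Figure 1 caption identifies it as latent Dirichlet allocation (LDA), but the inference algorithm, toolkit, and hyperparameters are not stated here. As a rough illustration only, the pure-Python sketch below fits LDA to tokenized tweets with collapsed Gibbs sampling. The function name `lda_gibbs` and the values of `alpha`, `beta`, `iterations`, and `top_n` are hypothetical choices, not the study's settings.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, top_n=5, seed=0):
    """Fit LDA by collapsed Gibbs sampling (illustrative sketch).

    docs: list of token lists (eg, preprocessed tweets).
    Returns (top words per topic, per-document topic counts).
    The hyperparameter defaults are assumptions, not values from the study.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)

    # Count tables: document-topic, topic-word, and per-topic totals.
    n_dk = [[0] * num_topics for _ in docs]
    n_kw = [[0] * V for _ in range(num_topics)]
    n_k = [0] * num_topics

    # Give every token a random initial topic and record it in the counts.
    z = []
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][word_id[w]] += 1
            n_k[k] += 1
        z.append(z_d)

    topics = range(num_topics)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                # Remove the token's current assignment from the counts.
                n_dk[d][k] -= 1
                n_kw[k][v] -= 1
                n_k[k] -= 1
                # Full conditional p(z = t | other assignments), up to a constant.
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][v] + beta) / (n_k[t] + V * beta)
                    for t in topics
                ]
                k = rng.choices(topics, weights=weights)[0]
                # Record the newly sampled assignment.
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][v] += 1
                n_k[k] += 1

    # Rank each topic's vocabulary by how many tokens are assigned to it.
    top_words = [
        [vocab[v] for v in sorted(range(V), key=lambda v: n_kw[k][v], reverse=True)[:top_n]]
        for k in topics
    ]
    return top_words, n_dk
```

In practice an established library would replace this loop; the sketch only makes the counting logic behind LDA visible.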

Results: We collected 2 846 495 tweets and analyzed 335 681 geocoded tweets. Through topic modeling, we identified 122 high-quality topics. The most discussed consumer topic is "cervical cancer screening," whereas the most popular promotional topic is raising awareness that "HPV causes cancer." A total of 87 of the 122 topics are correlated between promotional information and consumers' discussions. Guided by IBM, we examined the alignment between our Twitter findings and the results obtained from HINTS. Thirty-five topics can be mapped to HINTS questions by keywords, 112 topics can be mapped to IBM constructs, and 45 topics have statistically significant correlations with HINTS responses in terms of geographic distributions.

Conclusions: Mining Twitter to assess consumers' health behaviors can not only obtain results comparable to surveys, but also yield additional insights via a theory-driven approach. Limitations exist; nevertheless, these encouraging results impel us to develop innovative ways of leveraging social media in the changing health communication landscape.

Keywords: Twitter; human papillomavirus vaccine; integrated behavior model; social media; topic modeling.


Figures

Figure 1.
The overall data analysis workflow. The analysis consists of 4 steps: (1) data preprocessing; (2) rule-based classification of the tweets into either promotional information or consumers’ discussions; (3) applying topic modeling to discover major discussion themes and exploring associations between topics in consumers’ Twitter discussions and responses to the 8 human papillomavirus (HPV)–related Health Information National Trends Survey (HINTS) questions; and (4) based on these analyses, answering 3 research questions (RQs). IBM: Integrated Behavior Model; LDA: latent Dirichlet allocation.
Figure 2.
A rule-based categorization of the tweets into promotional HPV-related information and consumers’ discussions. *A tweet that does not include a Uniform Resource Locator (URL) is considered a consumer discussion. This holds even for a retweet (ie, a tweet starting with “rt”): we assume that the retweeting user agrees with the original tweet, which, having no URL, is itself a consumer discussion. When a tweet contains URLs, the rules are more complex. First, a tweet that quotes another tweet or web resource (ie, “is_quote_status” = True) and is not a retweet is considered a consumer discussion. In the special case in which the tweet is a retweet of a quoting tweet, we consider it promotional information because we cannot determine which of the comments the retweeting user agrees with. In essence, a retweet is classified based on the original tweet. Second, a tweet that is not a quoting tweet is considered promotional information. HPV: human papillomavirus.
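Read together, the Figure 2 rules reduce to a short decision procedure: a tweet is a consumer discussion if it has no URL, or if it is not a retweet and has `is_quote_status` set; every other tweet is promotional. The sketch below encodes that reading. It assumes lower-cased tweet text, detects URLs with a simple regular expression, and treats text starting with "rt " as a retweet; the helper names and the regex are hypothetical, and the study's actual preprocessing may differ.

```python
import re

# Assumed URL pattern; real tweets also carry URLs in entity metadata.
URL_PATTERN = re.compile(r"https?://\S+")


def has_url(text: str) -> bool:
    return URL_PATTERN.search(text) is not None


def is_retweet(text: str) -> bool:
    # Figure 2 treats tweets starting with "rt" as retweets; requiring the
    # trailing space (an assumption) avoids matching words such as "rtfm".
    return text.lower().startswith("rt ")


def classify_tweet(text: str, is_quote_status: bool) -> str:
    """Return "consumer" or "promotional" following the Figure 2 rules."""
    if not has_url(text):
        # No URL: consumer discussion, including retweets of URL-free tweets.
        return "consumer"
    if is_quote_status and not is_retweet(text):
        # A user's own quote of another tweet or web resource.
        return "consumer"
    # Non-quoting tweets with URLs, and retweets of quoting tweets.
    return "promotional"
```

Collapsing the figure's branches into two checks makes it easy to confirm that every combination of URL, quote, and retweet status receives exactly one label.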
Figure 3.
The 3 most popular topics in (A) promotional information and (B) consumers’ discussions related to human papillomavirus (HPV) and HPV vaccination.
Figure 4.
The monthly tweet volumes of promotional human papillomavirus (HPV)–related information and consumers’ discussions.
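Figure 4 plots monthly tweet volumes per category. Assuming each classified tweet keeps its `created_at` string in the Twitter API v1.1 layout (for example, "Wed Oct 10 20:19:24 +0000 2018"), the monthly counts could be tallied as in this sketch. The function name and the (timestamp, category) input shape are illustrative assumptions.

```python
from collections import Counter
from datetime import datetime

# Timestamp layout of the Twitter API v1.1 "created_at" field.
# Note: %a and %b match English day/month names only in the C/English locale.
TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def monthly_volumes(tweets):
    """Count tweets per (YYYY-MM, category).

    tweets: iterable of (created_at, category) pairs, where category is
    "promotional" or "consumer" (eg, the output of a rule-based classifier).
    Months are taken from the timestamp's own offset (UTC for Twitter).
    """
    counts = Counter()
    for created_at, category in tweets:
        month = datetime.strptime(created_at, TWITTER_TIME_FORMAT).strftime("%Y-%m")
        counts[(month, category)] += 1
    return counts
```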
Figure 5.
Mapping consumer discussion topics to constructs in the Integrated Behavior Model (IBM), including topics (1) directly mapped to IBM constructs and (2) first mapped to question groups (QGs) and then mapped to IBM constructs (eg, knowledge—QG1, attitude—QG4, perceived norm—QG5). HPV: human papillomavirus.
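Figure 5 describes two paths from consumer topics to IBM constructs: a direct mapping, and a mapping through HINTS question groups. The abstract says 35 topics were mapped to HINTS questions by keywords but does not list the keywords or say how direct mappings were decided, so the sketch below uses hypothetical keyword sets and takes direct mappings as caller input. Only the three question-group-to-construct pairs come from the caption; `map_topic` and `QG_KEYWORDS` are illustrative names.

```python
# Question-group-to-construct pairs given as examples in the Figure 5 caption.
QG_TO_IBM = {"QG1": "knowledge", "QG4": "attitude", "QG5": "perceived norm"}

# HYPOTHETICAL keyword sets per question group, for illustration only.
QG_KEYWORDS = {
    "QG1": {"heard", "know", "aware"},
    "QG4": {"safe", "worried", "trust"},
    "QG5": {"doctor", "parents", "friends"},
}


def map_topic(top_words, direct_constructs=()):
    """Map one topic to question groups and IBM constructs.

    top_words: the topic's highest-weight words.
    direct_constructs: constructs assigned to the topic directly (path 1
    in Figure 5); supplied by the caller, eg from manual review.
    """
    words = set(top_words)
    # Path 2, step 1: topic -> question group by keyword overlap.
    groups = sorted(qg for qg, keywords in QG_KEYWORDS.items() if words & keywords)
    # Path 2, step 2: question group -> IBM construct; then add path 1.
    constructs = {QG_TO_IBM[qg] for qg in groups} | set(direct_constructs)
    return groups, sorted(constructs)
```

A topic can land in several question groups at once, which is why both results are returned as sorted lists rather than single labels.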
Figure 6.
Geographic heatmaps for the state-level distributions of (1) the responses to Health Information National Trends Survey (HINTS) question group 2 (QG2), (2) the number of tweets in topic 17, which was mapped to QG2 by keywords, with a correlation ρ = 0.35 (P < .05), and (3) the number of tweets in topic 127, which was NOT mapped to QG2 by keywords but had the strongest correlation with QG2 (ρ = 0.55, P < .01). The color intensity is proportional to the volume of tweets assigned to that topic or to the number of HINTS responses of interest.
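Figure 6 compares state-level distributions with a correlation coefficient ρ. The caption does not name the coefficient; ρ conventionally denotes Spearman's rank correlation, so the sketch below assumes it. It aligns two per-state dictionaries (tweet counts for one topic and counts of HINTS responses of interest), keeps only states present in both, and computes ρ as the Pearson correlation of tie-averaged ranks. It does not compute the P value, and all function names are illustrative.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sx * sy)


def spearman_rho(x, y):
    return pearson(average_ranks(x), average_ranks(y))


def state_level_rho(tweet_counts, hints_counts):
    """Spearman rho over the states present in both dictionaries."""
    states = sorted(tweet_counts.keys() & hints_counts.keys())
    return spearman_rho(
        [tweet_counts[s] for s in states],
        [hints_counts[s] for s in states],
    )
```

Rank-based correlation is insensitive to the very different scales of tweet volumes and survey counts, which is one reason it suits a comparison like the one in Figure 6.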
