AIDS Behav. 2018 Jul;22(7):2322-2333.
doi: 10.1007/s10461-018-2046-0.

An Online Risk Index for the Cross-Sectional Prediction of New HIV, Chlamydia, and Gonorrhea Diagnoses Across U.S. Counties and Across Years

Man-Pui Sally Chan et al. AIDS Behav. 2018 Jul.

Abstract

The present study evaluated the potential use of Twitter data for providing risk indices of STIs. We developed online risk indices (ORIs) based on tweets to predict new HIV, gonorrhea, and chlamydia diagnoses, across U.S. counties and across 5 years. We analyzed over one hundred million tweets from 2009 to 2013 using open-vocabulary techniques and estimated the ORIs for a particular year by entering tweets from the same year into multiple semantic models (one for each year). The ORIs were moderately to strongly associated with the actual rates (.35 < rs < .68 for 93% of models), both nationwide and when applied to single states (California, Florida, and New York). Later models were slightly better than older ones at predicting gonorrhea and chlamydia, but not at predicting HIV. The proposed technique using free social media data provides signals of community health at a high temporal and spatial resolution.
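To make the reported associations concrete, the following minimal sketch computes a correlation between an index and observed rates, together with a 95% confidence interval of the kind shown as error bars in a forest plot. It assumes a Pearson correlation and a Fisher z-transformation interval; the abstract does not state which estimator or interval method the authors used, and the function names and sample values below are hypothetical illustrations, not study data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def fisher_ci(r, n, z_crit=1.959964):
    """Approximate 95% CI for a correlation via the Fisher z-transformation.

    Assumes |r| < 1 and n > 3; z_crit is the two-sided 95% normal quantile.
    """
    if n <= 3 or abs(r) >= 1:
        raise ValueError("need n > 3 and |r| < 1")
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


# Hypothetical values for five counties (illustration only, not study data).
ori = [1.0, 2.0, 3.0, 4.0, 5.0]
actual_rate = [2.0, 1.0, 4.0, 3.0, 5.0]

r = pearson_r(ori, actual_rate)
low, high = fisher_ci(r, len(ori))
print(f"r = {r:.2f}, 95% CI = ({low:.2f}, {high:.2f})")
```

With only a handful of counties the interval is very wide; the interval narrows as the number of counties grows, which is why county-level analyses across a whole state or the nation can support comparisons between models.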

Keywords: Big data; Chlamydia; Gonorrhea; HIV; Social media.

Conflict of interest statement

Conflict of Interest: The authors declare that they have no conflict of interest.

Figures

Figure 1.
Forest plots of correlations between the ORIs and actual rates at the county level in 2009–2013, along with their 95% confidence intervals (error bars). The y-axes indicate which year's model was used to calculate the ORIs, and the x-axes indicate the correlation levels (top panels for HIV, middle panels for chlamydia, and bottom panels for gonorrhea). Red dots refer to particularly small correlation coefficients whose confidence intervals do not overlap with the others; blue dots refer to larger correlation coefficients whose confidence intervals do not overlap with the others.
Figure 2.
Procedures of geolocation, message grouping, and topic extraction.
Figure 3.
Top two word clouds (with top 20 words) of the 2009 semantic model used to compute the Online Risk Index of STIs in 2013. Note: terms starting with ‘name[CAPITAL LETTER]’ or ‘username[CAPITAL LETTER]’ are used as substitutes for proper names (usually of celebrities) or usernames to anonymize the data. The capital letter denotes the first letter of the first name. The size of each word indicates its relative weight within a word cloud.
Figure 4a.
Maps of the actual rates (left column) and ORIs (right column) of HIV (top: California, middle: Florida, bottom: New York). Counties shown in white have missing data.
Figure 4b.
Maps of the actual rates (left column) and ORIs (right column) of chlamydia (top: California, middle: Florida, bottom: New York). Counties shown in white have missing data.
Figure 4c.
Maps of the actual rates (left column) and ORIs (right column) of gonorrhea (top: California, middle: Florida, bottom: New York). Counties shown in white have missing data.
