PLoS One. 2013 May 29;8(5):e64417.
doi: 10.1371/journal.pone.0064417. Print 2013.

The geography of happiness: connecting twitter sentiment and expression, demographics, and objective characteristics of place

Lewis Mitchell et al. PLoS One.

Abstract

We conduct a detailed investigation of correlations between real-time expressions of individuals made across the United States and a wide range of emotional, geographic, demographic, and health characteristics. We do so by combining (1) a massive, geotagged data set comprising over 80 million words generated in 2011 on the social network service Twitter and (2) annually surveyed characteristics of all 50 states and close to 400 urban populations. Among many results, we generate taxonomies of states and cities based on their similarities in word use; estimate the happiness levels of states and cities; correlate highly resolved demographic characteristics with happiness levels; and connect word choice and message length with urban characteristics such as education levels and obesity rates. Our results show how social media may be used to estimate real-time levels of, and changes in, population-scale measures such as obesity rates.

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1. Average word happiness for geotagged tweets in all US states collected during calendar year 2011.
The five happiest states, in order, are Hawaii, Maine, Nevada, Utah, and Vermont. The five saddest states, in order, are Louisiana, Mississippi, Maryland, Delaware, and Georgia. Word shift plots describing how differences in word usage contribute to variation in happiness between states are presented in Appendix B in Appendix S1 (online).
Figure 2. Scatter plot matrix of correlations between different well-being measures.
Points are colored by p-value; statistically insignificant correlations (p-values above the significance threshold) are shown in red. Spearman’s r and the p-value are reported in the inset.
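Spearman's r, as reported in these insets, is Pearson's correlation applied to the ranks of the two measures, with tied values sharing their average rank. The stdlib-only sketch below illustrates that definition on hypothetical state-level numbers; it is not the authors' code, and it returns r only (`scipy.stats.spearmanr` returns both the coefficient and its p-value).

```python
from statistics import mean


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Advance j to the end of the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman_r(x, y):
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical state-level values of two well-being measures.
twitter_happiness = [6.02, 5.95, 6.10, 5.88, 6.05]
survey_index = [66.1, 65.0, 67.3, 64.2, 66.0]
print(round(spearman_r(twitter_happiness, survey_index), 3))  # prints 0.9
```

Because only ranks enter the calculation, the measure captures any monotone relationship between two well-being indicators, not just a linear one.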
Figure 3. Clustergram showing cross-correlations between word frequency distributions for all states in 2011.
Red signifies states with similar (highly correlated) word frequency distributions, while blue signifies states with relatively dissimilar word frequency distributions.
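The caption does not state which correlation measure underlies the clustergram. A minimal sketch, assuming Pearson correlation between relative word frequencies over the union of all regions' vocabularies, is below; the hierarchical clustering that orders the rows and columns is omitted, and `region_counts` with its toy counts is hypothetical. The same construction would apply to the city-level clustergram in Figure 14.

```python
from collections import Counter
from math import sqrt


def frequency_vector(counts, vocab):
    """Relative word frequencies of one region over a shared vocabulary."""
    total = sum(counts.get(w, 0) for w in vocab)
    return [counts.get(w, 0) / total for w in vocab]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var = sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)
    return cov / sqrt(var)


def cross_correlation_matrix(region_counts):
    """Region names and the matrix of pairwise correlations of their word distributions."""
    names = sorted(region_counts)
    vocab = sorted(set().union(*(region_counts[r] for r in names)))
    vectors = {r: frequency_vector(region_counts[r], vocab) for r in names}
    return names, [[pearson(vectors[a], vectors[b]) for b in names] for a in names]


# Toy word counts for three hypothetical regions.
region_counts = {
    "A": Counter({"beach": 5, "happy": 3, "rain": 1}),
    "B": Counter({"beach": 4, "happy": 3, "rain": 2}),
    "C": Counter({"beach": 1, "happy": 2, "rain": 6}),
}
names, matrix = cross_correlation_matrix(region_counts)
for name, row in zip(names, matrix):
    print(name, [round(r, 2) for r in row])
```

In a clustergram, this matrix would then be reordered so that regions with mutually high correlations sit next to each other, producing the red blocks along the diagonal.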
Figure 4. Map of tweets collected from New York City during the calendar year 2011.
Each point represents an individual tweet and is colored by the average word happiness h_avg of nearby tweets: red is happier, blue is sadder. For a point to be colored, we require at least 200 LabMT words within a 500-meter radius of the location; points that do not satisfy this criterion are colored black. Maps for all other cities can be found in Appendix C in Appendix S1 (online).
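The coloring rule can be sketched as a brute-force neighborhood average. This is a hypothetical illustration, not the authors' code: it assumes each tweet is stored as (latitude, longitude, list of LabMT word scores), measures distance with the haversine formula on a spherical Earth of radius 6,371 km, and pools all LabMT words within the radius, including the tweet's own.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters (spherical approximation)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    p1, p2 = radians(lat1), radians(lat2)
    dlat, dlon = p2 - p1, radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(p1) * cos(p2) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


def local_happiness(tweets, radius_m=500, min_words=200):
    """Average LabMT score of all words in tweets within radius_m of each tweet.

    tweets: list of (lat, lon, scores) tuples, where scores holds the LabMT
    happiness of each scored word in that tweet (hypothetical input format).
    Returns None for a tweet with fewer than min_words pooled words nearby;
    such points would be drawn black.
    """
    result = []
    for lat, lon, _ in tweets:
        # Brute-force pass over all tweets: O(n^2) overall.
        pooled = [
            s
            for lat2, lon2, scores in tweets
            if haversine_m(lat, lon, lat2, lon2) <= radius_m
            for s in scores
        ]
        if pooled and len(pooled) >= min_words:
            result.append(sum(pooled) / len(pooled))
        else:
            result.append(None)
    return result
```

With the defaults it mirrors the 200-word, 500-meter rule for city maps; the national map in Figure 5 corresponds to `radius_m=10_000, min_words=500`. The pairwise loop is quadratic in the number of tweets, so at the scale of the full data set a spatial index (for example scikit-learn's `BallTree` with `metric="haversine"`) would be the practical choice.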
Figure 5. Map showing happiness of all tweets collected from the lower 48 US states during 2011.
Points are colored as in Figure 4, except that we now require at least 500 LabMT words within a 10-kilometer radius of a tweet's location for it to be colored.
Figure 6. Distribution of average happiness values for all 373 cities in the census data set.
A vertical dashed line denotes the average for all cities. Note the greater weight towards the right of the distribution, with more cities having happiness scores higher than the average.
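The caption's point is that the distribution is skewed to the left: a tail of relatively unhappy cities pulls the mean below the median, so more than half of the cities score above the average. A toy illustration with made-up scores (not the paper's data):

```python
from statistics import mean, median

# Made-up city scores: most cluster high, and two low outliers drag the mean down.
scores = [6.05, 6.02, 6.00, 5.98, 5.97, 5.95, 5.70, 5.60]

avg = mean(scores)
above = sum(s > avg for s in scores)  # cities scoring above the average
print(f"mean={avg:.3f}, median={median(scores):.3f}, above mean: {above}/{len(scores)}")
# prints: mean=5.909, median=5.975, above mean: 6/8
```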
Figure 7. Happiness as a function of number of tweets per capita.
Areas with a higher density of tweets per capita tend to be less happy.
Figure 8. The 15 highest average word happiness scores for cities in the contiguous USA.
Scores were calculated using Eq. (1) and the LabMT word list. The full list of cities can be found in Appendix C in Appendix S1 (online).
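Equation (1) is not reproduced in this section. Assuming it is the frequency-weighted average used with the LabMT word list in the hedonometer literature, h_avg(T) = Σ_i h_avg(w_i) f_i / Σ_i f_i, where f_i is the number of times LabMT word w_i appears in text T, a minimal sketch follows. The lexicon here is a made-up four-word stand-in rather than actual LabMT scores, and the optional `exclude_range` (dropping near-neutral words) is an assumption, not something this section states.

```python
from collections import Counter


def average_happiness(tokens, labmt, exclude_range=None):
    """Frequency-weighted mean LabMT score of a text (assumed form of Eq. (1)).

    tokens: iterable of lowercase words from the text.
    labmt: dict mapping word -> happiness score (1-9 scale).
    exclude_range: optional (low, high); words scoring strictly between the
        bounds are dropped as near-neutral (an assumption, not from the caption).
    Returns None when no scored words remain.
    """
    freqs = Counter(t for t in tokens if t in labmt)  # f_i for each scored word
    if exclude_range is not None:
        low, high = exclude_range
        freqs = Counter({w: f for w, f in freqs.items() if not low < labmt[w] < high})
    total = sum(freqs.values())
    if total == 0:
        return None
    return sum(labmt[w] * f for w, f in freqs.items()) / total


# Made-up scores standing in for the real LabMT list.
toy_labmt = {"love": 8.0, "beach": 7.0, "the": 5.0, "traffic": 3.0}
tokens = "love the beach the traffic".split()
print(average_happiness(tokens, toy_labmt))  # (8 + 5 + 7 + 5 + 3) / 5
print(average_happiness(tokens, toy_labmt, exclude_range=(4, 6)))  # (8 + 7 + 3) / 3
```

Words missing from the lexicon are simply ignored, which is why a text with no scored words returns None rather than a score.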
Figure 9. The 15 lowest average word happiness scores for cities in the contiguous USA.
Scores were calculated using Eq. (1) and the LabMT word list. The full list of cities can be found in Appendix C in Appendix S1 (online).
Figure 10. Word shift graphs for the happiest city and saddest city.
These show how the average word happiness h_avg of all US cities considered compares with that of Napa, California (left) and Beaumont, Texas (right), the cities with the highest and lowest h_avg, respectively. Words are ranked in decreasing order of their percentage contribution to the overall difference in average happiness. The symbols +/− indicate whether a word is relatively happy or sad compared with h_avg for the entire US (the reference text), while the arrows ↑/↓ indicate whether the word was used more or less in each city's text than in the reference text. The left inset panel shows how the ranked LabMT words combine in sum. The four circles at bottom right show the total contribution of each of the four kinds of words (+↑, +↓, −↑, −↓). Relative text size is indicated by the areas of the gray squares.
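Word shift graphs of this kind are usually built from per-word contributions of the form 100 × (h_avg(w_i) − h_ref)(p_i,comp − p_i,ref) / (h_comp − h_ref), where p_i are relative frequencies of LabMT word w_i in the reference text (the entire US) and in the comparison text (one city). Because the relative frequencies in each text sum to 1, these contributions sum to 100%. The sketch assumes that form and uses toy scores and counts; `word_shift` is a hypothetical helper, and both count dictionaries must contain only words present in the lexicon.

```python
def normalize(counts):
    """Relative frequencies p_i from raw LabMT word counts."""
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def word_shift(ref_counts, comp_counts, labmt):
    """Per-word percentage contributions to h_comp - h_ref (assumed word shift form).

    Both count dicts must contain only words present in labmt, and the two
    texts must differ in average happiness (otherwise the division fails).
    Returns (h_ref, h_comp, rows); rows are (word, percent, kind) tuples sorted
    by decreasing |percent|, with kind '+' or '-' (happier or sadder than
    h_ref) followed by an up or down arrow (used more or less in comp).
    """
    p_ref, p_comp = normalize(ref_counts), normalize(comp_counts)
    h_ref = sum(labmt[w] * p for w, p in p_ref.items())
    h_comp = sum(labmt[w] * p for w, p in p_comp.items())
    rows = []
    for w in set(p_ref) | set(p_comp):
        dp = p_comp.get(w, 0.0) - p_ref.get(w, 0.0)
        if dp == 0:
            continue  # unchanged frequency contributes nothing
        pct = 100 * (labmt[w] - h_ref) * dp / (h_comp - h_ref)
        kind = ("+" if labmt[w] > h_ref else "-") + ("↑" if dp > 0 else "↓")
        rows.append((w, pct, kind))
    rows.sort(key=lambda row: abs(row[1]), reverse=True)
    return h_ref, h_comp, rows


# Toy lexicon and counts: reference = whole country, comparison = one city.
toy_labmt = {"wine": 7.0, "sun": 8.0, "hate": 2.0, "work": 4.0}
ref_counts = {"wine": 10, "sun": 10, "hate": 10, "work": 10}
comp_counts = {"wine": 30, "sun": 10, "hate": 5, "work": 5}
h_ref, h_comp, rows = word_shift(ref_counts, comp_counts, toy_labmt)
for word, pct, kind in rows:
    print(f"{word:>5} {kind} {pct:6.1f}%")
```

In this toy case the city is happier partly because it uses a happy word more (+↑) and partly because it uses sad words less (−↓); summing rows by kind gives the four totals shown as circles in the figure.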
Figure 11. Spearman correlations for 432 demographic attributes with happiness.
The 8 groupings along the horizontal axis are clusters of covarying attributes identified by agglomerative hierarchical clustering, performed independently of happiness. Crosses lie on the median of each cluster, and the dashed lines represent the 1% significance level. The two clusters whose medians correlate significantly with happiness are colored blue. A complete list of the correlations of all attributes with happiness can be found in Appendix D in Appendix S1 (online).
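For a Spearman correlation under the null hypothesis of no association, a common large-sample approximation puts the two-sided critical value at |r| = z_(1−α/2) / sqrt(n − 1). The caption does not give n or say how the dashed lines were computed, so the sketch below assumes n = 373 (the number of cities in the census data set) and α = 0.01; the cluster names and correlation values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, median


def critical_spearman_r(n, alpha=0.01):
    """Approximate two-sided critical |r| for Spearman's rho with n observations.

    Uses the large-sample normal approximation rho ~ N(0, 1 / (n - 1)) under
    the null hypothesis of no association.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z / sqrt(n - 1)


def significant_clusters(cluster_correlations, n, alpha=0.01):
    """Clusters whose median correlation with happiness exceeds the critical |r|."""
    r_crit = critical_spearman_r(n, alpha)
    return [name for name, rs in cluster_correlations.items() if abs(median(rs)) > r_crit]


# Hypothetical per-attribute correlations with happiness, grouped by cluster.
toy_clusters = {
    "cluster 1": [0.25, 0.31, 0.18],
    "cluster 2": [0.02, -0.05, 0.08],
    "cluster 3": [-0.20, -0.15, -0.22],
}
print(round(critical_spearman_r(373), 3))
print(significant_clusters(toy_clusters, n=373))
```

Under these assumptions the critical value is roughly 0.134, so a cluster counts as significant when the median of its attributes' correlations exceeds that in absolute value.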
Figure 12. Correlation between education and use of the word ‘café’.
The scatter plot shows the correlation between the rate of occurrence of the word ‘café’ and the percentage of the population with a bachelor’s degree or higher in US cities during calendar year 2011. The red line shows a linear fit, while the reported r and p-values are for the Spearman correlation.
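The rate of occurrence plotted here is a word count normalized by the total number of words from each city. One practical wrinkle, relevant to a word like 'café', is that the accented letter can be encoded either as a single code point or as 'e' plus a combining accent; the sketch below normalizes both to Unicode NFC before counting. It is an illustration under that assumption (the caption does not say whether 'cafe' without the accent was merged in), and `word_rate` is a hypothetical helper.

```python
import unicodedata


def word_rate(tokens, target="café"):
    """Occurrences of target per token, comparing Unicode NFC-normalized lowercase forms."""
    target = unicodedata.normalize("NFC", target.lower())
    norm = [unicodedata.normalize("NFC", t.lower()) for t in tokens]
    return norm.count(target) / len(norm) if norm else 0.0


# "Café" (precomposed é) and "cafe\u0301" (e + combining accent) both count.
print(word_rate(["Café", "cafe\u0301", "coffee", "tea"]))  # prints 0.5
```

Per-city rates computed this way would then be rank-correlated with the education percentages.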
Figure 13. Correlation between happiness and obesity.
The scatter plot shows the correlation between average word happiness h_avg and obesity level, as taken from the 2011 Gallup and Healthways survey. The red line is the straight line of best fit to the data, while the r value is the Spearman correlation coefficient for the data.
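The caption reports two different summaries: a straight line of best fit, which depends on the actual values, and Spearman's r, which depends only on their ranks. Assuming the line is an ordinary least-squares fit (the caption does not name the method), a minimal sketch is:

```python
def least_squares_line(x, y):
    """Ordinary least-squares slope and intercept for y ≈ slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


print(least_squares_line([1, 2, 3, 4], [3, 5, 7, 9]))  # prints (2.0, 1.0)
```

Because Spearman's r ignores everything except rank order, it is unchanged by any monotone rescaling of either axis, whereas the slope of this line is not.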
Figure 14. Cross-correlations between word frequency distributions for 40 cities.
The clustergram shows cross-correlations between word frequency distributions for the 40 cities with the highest word counts in 2011. Red signifies cities with similar word frequency distributions, while blue signifies cities with dissimilar word frequency distributions.

References

    1. Bettencourt LMA, Lobo J, Helbing D, Kuhnert C, West GB (2007) Growth, innovation, scaling, and the pace of life in cities. Proceedings of the National Academy of Sciences 104: 7301–7306.
    2. Jacobs J (1961) The Death and Life of Great American Cities. New York: Vintage Books, 458 p.
    3. Sachs JD, Layard R, Helliwell JF (2012) World Happiness Report. Technical report, Columbia University/Canadian Institute for Advanced Research/London School of Economics.
    4. Gallup-Healthways (2012) State of well-being 2011: City, state and congressional district wellbeing reports. Technical report, Gallup Inc. Available: http://www.well-beingindex.com/files/2011CompositeReport.pdf.
    5. Gallup-Healthways Well-Being Index. Available: http://www.well-beingindex.com/. Accessed February 2013.
