Proc Natl Acad Sci U S A. 2010 Oct 12;107(41):17486-90.
doi: 10.1073/pnas.1005962107. Epub 2010 Sep 27.

Predicting consumer behavior with Web search

Sharad Goel et al. Proc Natl Acad Sci U S A. 2010.

Abstract

Recent work has demonstrated that Web search volume can "predict the present," meaning that it can be used to accurately track outcomes such as unemployment levels, auto and home sales, and disease prevalence in near real time. Here we show that what consumers are searching for online can also predict their collective future behavior days or even weeks in advance. Specifically we use search query volume to forecast the opening weekend box-office revenue for feature films, first-month sales of video games, and the rank of songs on the Billboard Hot 100 chart, finding in all cases that search counts are highly predictive of future outcomes. We also find that search counts generally boost the performance of baseline models fit on other publicly available data, where the boost varies from modest to dramatic, depending on the application in question. Finally, we reexamine previous work on tracking flu trends and show that, perhaps surprisingly, the utility of search data relative to a simple autoregressive model is modest. We conclude that in the absence of other data sources, or where small improvements in predictive performance are material, search queries provide a useful guide to the near future.
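
As a rough illustration of the kind of search-based forecast the abstract describes, the sketch below fits a one-variable linear model of log outcome on log search count and reports the correlation between predicted and actual values. It is not the authors' code: the log-log form is an assumption, the helper names `fit_line` and `pearson` are hypothetical, and the numbers are invented for illustration rather than taken from the paper.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


# Invented illustrative values, NOT data from the paper:
# pre-release search counts and opening-weekend revenue for six titles.
search_counts = [1200, 5400, 800, 22000, 3100, 9500]
revenue = [2.1e6, 9.8e6, 1.5e6, 4.4e7, 5.0e6, 2.0e7]

# Assumed model form: log10(revenue) = a + b * log10(search count).
log_search = [math.log10(s) for s in search_counts]
log_revenue = [math.log10(r) for r in revenue]

intercept, slope = fit_line(log_search, log_revenue)
predicted = [intercept + slope * x for x in log_search]

# In-sample correlation between predicted and actual log revenue.
r = pearson(predicted, log_revenue)
print(f"slope={slope:.3f}, correlation={r:.3f}")
```

The correlation here is computed on the same points used for fitting, so it overstates predictive accuracy. A comparison with a baseline model, as in the abstract, would add further predictors and evaluate on held-out data.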

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Fig. 1.
Search volume for the movie Transformers 2 (A) and the video game Tom Clancy’s H.A.W.X. (B) prior to and after their release, and search and Billboard rank for the song “Right Round” by Flo Rida (C).
Fig. 2.
Search-based predictions for box-office movie revenue (A), first-month video game sales (B), and the Billboard rank of songs (C), where predictions are made immediately prior to the event of interest; correlation between predicted and actual outcomes when predictions are based on query data t weeks prior to the event (D–F).
Fig. 3.
Predictions from the baseline (A–C) and the combined baseline-plus-search models (D–F) for movies, video games, and music.
Fig. 4.
Actual and estimated flu levels in the United States, where flu level is the percentage of physician visits that involve patients with influenza-like illnesses. Search-based estimates are from Google Flu Trends.
Fig. 5.
The correlation between predicted and actual outcomes for movies, video game sequels and nonsequels, music, and flu.
