Lifestyle Disease Surveillance Using Population Search Behavior: Feasibility Study
- PMID: 32012050
- PMCID: PMC7011125
- DOI: 10.2196/13347
Abstract
Background: As the process of producing official health statistics for lifestyle diseases is slow, researchers have explored using Web search data as a proxy for lifestyle disease surveillance. Existing studies, however, are prone to at least one of the following issues: ad-hoc keyword selection, overfitting, insufficient predictive evaluation, lack of generalization, and failure to compare against trivial baselines.
Objective: The aims of this study were to (1) employ a corrective approach improving previous methods; (2) study the key limitations in using Google Trends for lifestyle disease surveillance; and (3) test the generalizability of our methodology to other countries beyond the United States.
Methods: Prevalence rates were collected for each target variable (diabetes, obesity, and exercise). After a rigorous keyword selection process, data from Google Trends were collected and denormalized to form spatio-temporal indices. L1-regularized regression models were trained to predict prevalence rates from the denormalized Google Trends indices. Models were tested on a held-out set and compared against baselines from the literature as well as a trivial "last year equals this year" baseline. A similar analysis was done using a multivariate spatio-temporal model in which the previous year's prevalence was included as a covariate. This model was modified to create a time-lagged regression analysis framework. Finally, a hierarchical time-lagged multivariate spatio-temporal model was created to account for subnational trends in the data. The model trained on US data was then applied to Canada in a transfer learning framework.
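The two simplest ingredients of the Methods, an L1-regularized regression and the "last year equals this year" baseline, can be illustrated with a minimal sketch. This is not the authors' implementation: the coordinate-descent solver, the function names, and all numbers below are assumptions chosen for illustration, and the toy features merely stand in for denormalized Google Trends indices.

```python
from statistics import mean


def mae(actual, predicted):
    """Mean absolute error between two equal-length sequences."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the L1 proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def predict(intercept, weights, rows):
    """Linear prediction for each feature row."""
    return [intercept + sum(w * x for w, x in zip(weights, row)) for row in rows]


def fit_lasso(rows, targets, alpha, n_iter=500):
    """Coordinate descent for (1/2n)*||y - b - Xw||^2 + alpha*||w||_1."""
    n, d = len(rows), len(rows[0])
    weights = [0.0] * d
    intercept = mean(targets)
    for _ in range(n_iter):
        for j in range(d):
            fitted = predict(intercept, weights, rows)
            # Covariance of feature j with the residual that ignores feature j.
            rho = sum(
                row[j] * (y - f + weights[j] * row[j])
                for row, y, f in zip(rows, targets, fitted)
            ) / n
            scale = sum(row[j] ** 2 for row in rows) / n
            weights[j] = soft_threshold(rho, alpha) / scale if scale else 0.0
        # The intercept is not penalized: set it to the mean residual.
        no_intercept = predict(0.0, weights, rows)
        intercept = mean(y - f for y, f in zip(targets, no_intercept))
    return intercept, weights


# Made-up data (hypothetical): two search indices per region-year.
train_rows = [[1.0, 0.0], [2.0, 1.0], [3.0, 0.0], [4.0, 1.0]]
train_prev = [2.5, 3.0, 3.5, 4.0]  # constructed as 2 + 0.5 * first index

# alpha=0 reduces to ordinary least squares; a large alpha zeroes all weights.
ols_intercept, ols_weights = fit_lasso(train_rows, train_prev, alpha=0.0)
sparse_intercept, sparse_weights = fit_lasso(train_rows, train_prev, alpha=100.0)

# Trivial baseline: predict each year's prevalence as the previous year's.
prevalence_by_year = [9.0, 9.4, 9.9]  # made-up prevalence rates (%)
baseline_pred = prevalence_by_year[:-1]
baseline_mae = mae(prevalence_by_year[1:], baseline_pred)
```

The penalty strength `alpha` controls how many search-term weights are driven exactly to zero, which is why an L1 penalty suits settings with many candidate keywords. Any search-based model is only useful if its held-out MAE is lower than `baseline_mae` computed the same way.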
Results: In the US context, our proposed models outperform both prior work and the trivial baselines. In terms of mean absolute error (MAE), the best of our proposed models yields a 24% improvement (from 0.72 to 0.55; P<.001) for diabetes, an 18% improvement (from 1.20 to 0.99; P=.001) for obesity, and a 34% improvement (from 2.89 to 1.95; P<.001) for exercise. Our proposed across-country transfer learning framework also shows promising results, with average Spearman and Pearson correlations of 0.70 for diabetes and of 0.90 and 0.91, respectively, for obesity.
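As a worked example of how a relative MAE improvement is computed, the sketch below applies the usual formula, (baseline MAE - model MAE) / baseline MAE, to the MAE pairs quoted above. The function name is hypothetical, and reading each pair as (baseline, model) is an assumption. Because the quoted MAEs are rounded, the recomputed percentages need not match the reported ones exactly.

```python
def relative_improvement(baseline_mae, model_mae):
    """Percentage reduction in MAE relative to the baseline."""
    return (baseline_mae - model_mae) / baseline_mae * 100


# MAE pairs as quoted in the Results, read as (baseline, best proposed model).
reported_mae = {
    "diabetes": (0.72, 0.55),
    "obesity": (1.20, 0.99),
    "exercise": (2.89, 1.95),
}

improvements = {
    target: relative_improvement(baseline, model)
    for target, (baseline, model) in reported_mae.items()
}

for target, pct in improvements.items():
    print(f"{target}: {pct:.1f}% lower MAE than the baseline")
```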
Conclusions: Although our proposed models beat the baselines, we find the modeling of lifestyle diseases to be a challenging problem, one that requires an abundance of data as well as creative modeling strategies. Overall, this study shows low-to-moderate validity of Google Trends for lifestyle disease surveillance, even when applying novel corrective approaches, including a proposed denormalization scheme. We envision qualitative analyses to be a more practical use of Google Trends in this context. For quantitative analyses, Google Trends has the highest utility in transfer learning, where low-resource countries could benefit from high-resource countries by using proxy models.
Keywords: Google Trends; Web search; digital epidemiology; infodemiology; infoveillance; lifestyle disease surveillance; noncommunicable diseases; nowcasting; public health.
©Shahan Ali Memon, Saquib Razak, Ingmar Weber. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 27.01.2020.
Conflict of interest statement
Conflicts of Interest: None declared.