Using Application Programming Interfaces to Access Google Data for Health Research: Protocol for a Methodological Framework
- PMID: 32442159
- PMCID: PMC7381000
- DOI: 10.2196/16543
Abstract
Background: Individuals increasingly turn to search engines like Google to obtain health information and access resources. Analysis of Google search queries, part of the methodological toolkit of infodemiology and infoveillance researchers, offers a novel approach to understanding population health concerns and needs in real time or near-real time. While searches have predominantly been examined with the Google Trends website tool, newer application programming interfaces (APIs) are now available to academics and provide a richer picture of search activity. These APIs allow users to write code in languages like Python to retrieve sample data directly from Google servers.
Objective: The purpose of this paper is to describe a novel protocol to determine the top queries, volume of queries, and the top sites reached by a population searching on the web for a specific health term. The protocol retrieves Google search data obtained from three Google APIs: Google Trends, Google Health Trends (also referred to as Flu Trends), and Google Custom Search.
Methods: Our protocol consisted of four steps: (1) developing a master list of top search queries for an initial search term using Google Trends, (2) gathering information on relative search volume using Google Health Trends, (3) determining the most popular sites using Google Custom Search, and (4) calculating estimated total search volume. We tested the protocol following key procedures at each step and verified its usefulness by examining search traffic on birth control in 2017 in the United States. Two separate programmers working independently achieved similar results with insignificant variation due to sample variability.
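Step 3 of the protocol can be illustrated with a minimal Python sketch, assuming the Custom Search JSON API is queried over HTTPS and returns results under an `items` list with a `link` field. The function names, the placeholder credentials, and the sample response are hypothetical and are not taken from the authors' code; no network request is made here.

```python
from urllib.parse import urlencode, urlparse

# Public endpoint of the Google Custom Search JSON API.
CUSTOM_SEARCH_ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def build_search_url(query, api_key, engine_id, num=10):
    """Build the request URL for one search query (hypothetical helper)."""
    params = {"key": api_key, "cx": engine_id, "q": query, "num": num}
    return f"{CUSTOM_SEARCH_ENDPOINT}?{urlencode(params)}"


def result_domains(response):
    """Return unique result domains in rank order from a parsed JSON response."""
    domains = []
    for item in response.get("items", []):
        host = urlparse(item["link"]).netloc.lower()
        if host.startswith("www."):
            host = host[4:]
        if host and host not in domains:
            domains.append(host)
    return domains


# Made-up response for illustration only; real responses carry more fields.
SAMPLE_RESPONSE = {
    "items": [
        {"link": "https://www.plannedparenthood.org/"},
        {"link": "https://www.example.org/a"},
        {"link": "https://www.plannedparenthood.org/b"},
    ]
}

url = build_search_url("birth control pill", "YOUR_API_KEY", "YOUR_ENGINE_ID")
domains = result_domains(SAMPLE_RESPONSE)
print(domains)  # ['plannedparenthood.org', 'example.org']
```

Keeping the URL construction separate from the response parsing lets the ranking logic be checked against saved responses before any API quota is spent.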
Results: We successfully tested the methodology on the initial search term "birth control." We identified the top search queries for birth control, of which "birth control pill" was the most popular, and obtained the relative and estimated total search volume for the top queries: relative search volume was 0.54 for the pill, corresponding to an estimated 9.3-10.7 million searches. We then used the estimated proportion of search activity for each top query to generate a list of the most popular websites; for the pill, the Planned Parenthood website was the top site.
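The relationship between relative and estimated total search volume can be sketched as a simple proportional scaling. This is an assumption about the arithmetic, not the authors' documented procedure: the reference range below is a hypothetical placeholder, chosen only so that a relative search volume of 0.54 reproduces the reported range.

```python
def estimate_search_volume(relative_volume, reference_range):
    """Scale a (low, high) reference volume by a relative search volume.

    Assumes the estimate is proportional to the reference query's volume.
    """
    if relative_volume < 0:
        raise ValueError("relative_volume must be non-negative")
    low, high = reference_range
    return relative_volume * low, relative_volume * high


# Hypothetical reference range in millions of searches (not from the paper).
HYPOTHETICAL_REFERENCE_RANGE = (17.2, 19.8)

low, high = estimate_search_volume(0.54, HYPOTHETICAL_REFERENCE_RANGE)
print(f"{low:.1f}-{high:.1f} million")  # 9.3-10.7 million
```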
Conclusions: The proposed methodological framework demonstrates how to retrieve Google query data from multiple Google APIs and provides the documentation required to systematically identify search queries and websites, as well as to estimate the relative and total search volume of queries in real time or near-real time for specific locations and time periods. Although the protocol needs further testing, it allows researchers to replicate the steps and shows promise in advancing our understanding of population-level health concerns.
International Registered Report Identifier (IRRID): RR1-10.2196/16543.
Keywords: APIs; Google; Google Trends; abortion; birth control; infodemic; infodemiology; infoveillance; reproductive health; search data.
©Anne Zepecki, Sylvia Guendelman, John DeNero, Ndola Prata. Originally published in JMIR Research Protocols (http://www.researchprotocols.org), 06.07.2020.
Conflict of interest statement
Conflicts of Interest: None declared.