AMIA Annu Symp Proc. 2003;2003:225-9.

What is the prevalence of health-related searches on the World Wide Web? Qualitative and quantitative analysis of search engine queries on the internet

G Eysenbach et al. AMIA Annu Symp Proc. 2003.

Abstract

While health information is often said to be the most sought-after information on the web, empirical data on the actual frequency of health-related searches on the web are lacking. In the present study we aimed to determine the prevalence of health-related searches on the web by analyzing the search terms people entered into popular search engines. We also made some preliminary attempts at qualitatively describing and classifying these searches. Because it was occasionally difficult to determine what constitutes a "health-related" search, we propose and validate a simple method for automatically classifying a search string as "health-related". The method is based on the number of web pages containing both the search string and the word "health", expressed as a proportion of the total number of pages containing the search string alone. Using human codings as the gold standard, we plotted an ROC curve and determined empirically that if this "co-occurrence rate" is larger than 35%, the search string can be said to be health-related (sensitivity 85.2%, specificity 80.4%). Our human codings of search queries showed that about 4.5% of all searches are "health-related". We estimate that globally a minimum of 6.75 million health-related searches are conducted on the web every day, roughly the same number of searches that were conducted on the NLM MEDLARS system during the whole of 1996.
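The classification rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: it assumes the two page counts (pages containing the query and the word "health", and pages containing the query alone) have already been obtained from a search engine, and the function names (`co_occurrence_rate`, `is_health_related`, `sensitivity_specificity`) and example counts are hypothetical. Sensitivity and specificity are computed against human codings treated as the gold standard.

```python
# Minimal sketch of the co-occurrence ("health-relatedness") rule described
# in the abstract. Page counts are ASSUMED to come from a search engine;
# all function names and example numbers here are hypothetical.

# Cut-off reported in the abstract: a rate above 35% counts as health-related.
THRESHOLD = 0.35


def co_occurrence_rate(hits_with_health: int, hits_alone: int) -> float:
    """Pages containing the query AND "health" / pages containing the query."""
    if hits_alone <= 0:
        raise ValueError("hits_alone must be a positive page count")
    return hits_with_health / hits_alone


def is_health_related(hits_with_health: int, hits_alone: int,
                      threshold: float = THRESHOLD) -> bool:
    """Classify a query as health-related if its rate exceeds the threshold."""
    return co_occurrence_rate(hits_with_health, hits_alone) > threshold


def sensitivity_specificity(predicted: list[bool],
                            gold: list[bool]) -> tuple[float, float]:
    """Compare automatic labels with human codings used as the gold standard."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    tp = sum(1 for p, g in zip(predicted, gold) if p and g)
    fn = sum(1 for p, g in zip(predicted, gold) if not p and g)
    tn = sum(1 for p, g in zip(predicted, gold) if not p and not g)
    fp = sum(1 for p, g in zip(predicted, gold) if p and not g)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative gold label")
    return tp / (tp + fn), tn / (tn + fp)


# Illustrative (made-up) counts: 4,000 of 10,000 pages also mention "health".
print(is_health_related(4_000, 10_000))  # True  (rate 0.40 > 0.35)
print(is_health_related(1_000, 10_000))  # False (rate 0.10)
```

The strict "greater than" comparison follows the abstract's wording ("larger than 35%"); whether a rate of exactly 35% was counted as health-related in the study is not stated.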

Figures

Figure 1
Distribution of “co-occurrence rates” of search terms from MetaCrawler. The higher the rate, the higher the proportion of pages on which the search query and the word “health” occur together, and presumably the more closely the search query is related to health. The co-occurrence rate can thus be seen as a “health-relatedness index”.
Figure 2
Receiver operating characteristic (ROC) curve
Figure 3
Precision-recall curve
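The ROC and precision-recall curves in Figures 2 and 3 presumably come from varying the co-occurrence threshold and comparing the resulting classifications with the human codings. A minimal sketch of such a threshold sweep follows, assuming the scores are co-occurrence rates and the gold labels are booleans; the function names and data are hypothetical. Choosing the threshold by Youden's J (sensitivity + specificity - 1) is an assumption made for illustration, since the abstract says only that the 35% cut-off was determined empirically.

```python
# Minimal sketch: trace ROC and precision-recall points by sweeping the
# co-occurrence threshold and comparing with human (gold-standard) codings.
# Names and data are hypothetical; Youden's J is an ASSUMED selection rule.


def _counts(scores, gold, t):
    """True- and false-positive counts for the rule 'score > t'."""
    tp = sum(1 for s, g in zip(scores, gold) if s > t and g)
    fp = sum(1 for s, g in zip(scores, gold) if s > t and not g)
    return tp, fp


def roc_points(scores, gold, thresholds):
    """Return (threshold, false-positive rate, true-positive rate) tuples."""
    pos = sum(1 for g in gold if g)
    neg = len(gold) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need both positive and negative gold labels")
    points = []
    for t in thresholds:
        tp, fp = _counts(scores, gold, t)
        points.append((t, fp / neg, tp / pos))
    return points


def precision_recall_points(scores, gold, thresholds):
    """Return (threshold, recall, precision) tuples."""
    pos = sum(1 for g in gold if g)
    if pos == 0:
        raise ValueError("need at least one positive gold label")
    points = []
    for t in thresholds:
        tp, fp = _counts(scores, gold, t)
        # Convention: precision is taken as 1.0 when nothing is predicted positive.
        precision = tp / (tp + fp) if tp + fp else 1.0
        points.append((t, tp / pos, precision))
    return points


def youden_threshold(scores, gold, thresholds):
    """Threshold maximising sensitivity + specificity - 1 (i.e. TPR - FPR)."""
    return max(roc_points(scores, gold, thresholds), key=lambda p: p[2] - p[1])[0]


# Made-up co-occurrence rates and human codings for six queries.
scores = [0.9, 0.6, 0.4, 0.3, 0.2, 0.1]
gold = [True, True, True, False, True, False]
print(youden_threshold(scores, gold, [0.0, 0.35, 0.5, 1.0]))  # 0.35
```

Plotting the (false-positive rate, true-positive rate) pairs gives an ROC curve, and the (recall, precision) pairs give a precision-recall curve; in practice the candidate thresholds would be the distinct observed rates rather than a hand-picked list.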
