J Am Med Inform Assoc. 2007 Mar-Apr;14(2):212-20.
doi: 10.1197/jamia.M2191. Epub 2007 Jan 9.

A day in the life of PubMed: analysis of a typical day's query log

Jorge R Herskovic et al. J Am Med Inform Assoc. 2007 Mar-Apr.

Abstract

Objective: To characterize PubMed usage over a typical day and compare it to previous studies of user behavior on Web search engines.

Design: We performed a lexical and semantic analysis of 2,689,166 queries issued on PubMed over 24 consecutive hours on a typical day.

Measurements: We measured the number of queries, the number of distinct users, queries per user, terms per query, common terms, Boolean operator use, common phrases, result set size, and MeSH categories. We also used semantic measurements to group queries into sessions, and studied the addition and removal of terms from consecutive queries to gauge search strategies.
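
Several of these measurements can be computed directly from log records of the form described in the Figure 1 caption (user hash, timestamp, query). The following is a minimal Python sketch, not the authors' code: the tab separator, the whitespace tokenization (which counts Boolean operators as terms), the uppercase-only operator matching, and the sample lines are all assumptions made for illustration.

```python
from collections import Counter

# Assumption: operators are recognized only when written in uppercase.
BOOLEAN_OPERATORS = {"AND", "OR", "NOT"}


def parse_line(line):
    """Split one log line into (user_hash, seconds_since_midnight, query).

    Assumes tab-separated columns; the real delimiter is not stated in the source.
    """
    user, timestamp, query = line.rstrip("\n").split("\t", 2)
    return user, int(timestamp), query


def summarize(lines):
    """Compute a few per-day usage statistics from raw log lines."""
    queries_per_user = Counter()
    term_counts = []
    queries_with_boolean = 0
    for line in lines:
        user, _, query = parse_line(line)
        terms = query.split()  # naive whitespace tokenization
        queries_per_user[user] += 1
        term_counts.append(len(terms))
        if any(term in BOOLEAN_OPERATORS for term in terms):
            queries_with_boolean += 1
    n_queries = len(term_counts)
    n_users = len(queries_per_user)
    return {
        "queries": n_queries,
        "distinct_users": n_users,
        "mean_queries_per_user": n_queries / n_users if n_users else 0.0,
        "mean_terms_per_query": sum(term_counts) / n_queries if n_queries else 0.0,
        "queries_with_boolean": queries_with_boolean,
    }


# Invented sample lines, for illustration only.
sample = [
    "u1\t10\tasthma AND children",
    "u1\t42\tasthma treatment",
    "u2\t57\tmyocardial infarction",
]
stats = summarize(sample)
```

A real analysis would need a tokenizer that understands quoted phrases and field tags, which this sketch deliberately ignores.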

Results: The size of the result sets from a sample of queries showed a bimodal distribution, with peaks at approximately 3 and 100 results, suggesting that a large group of queries was tightly focused and another was broad. Like Web search engine sessions, most PubMed sessions consisted of a single query. However, PubMed queries contained more terms.
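
A bimodal pattern like this is easiest to see when result counts are binned on a logarithmic scale, as in Figure 5. Below is a minimal sketch, assuming the result counts are already available as integers; the half-decade bin width and the sample values are invented for illustration and are not data from the study.

```python
import math
from collections import Counter


def log10_histogram(result_counts, bin_width=0.5):
    """Count queries per log10(result count) bin.

    Returns (bins, zero), where bins maps each bin's lower edge to a count and
    zero is the number of queries with no results (log10 is undefined for them).
    """
    bins = Counter()
    zero = 0
    for n in result_counts:
        if n <= 0:
            zero += 1
            continue
        lower_edge = math.floor(math.log10(n) / bin_width) * bin_width
        bins[lower_edge] += 1
    return dict(sorted(bins.items())), zero


# Invented result-set sizes: a cluster of small sets and a cluster near 100.
sample_counts = [2, 3, 3, 4, 0, 90, 110, 120]
hist, zero_results = log10_histogram(sample_counts)
```

With these made-up values the counts fall into two separated groups of bins, which is the shape a histogram of real data would need to show before calling the distribution bimodal.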

Conclusion: PubMed's usage profile should be considered when educating users, building user interfaces, and developing future biomedical information retrieval systems.

Figures

Figure 1. Log file sample. The PubMed log had three columns. The first column contains a PubMed-generated user hash, the second a timestamp (in seconds since midnight at the server’s location), and the third is the actual query as submitted. These are a few consecutive lines from the raw log.
Figure 2. Algorithm to classify queries as informational vs. navigational.
Figure 3. Semantic distance determination. In this example, we walk through the tree to determine the semantic distance between “Myocardial infarction” and “Cerebrovascular accident,” which is six steps long (only one of the possible paths is shown).
Figure 4. Histogram of queries issued per user, for all 2,689,166 queries.
Figure 5. Logarithm of the size of the result set for a sample of 2,272 queries.
Figure 6. Relative frequency of term counts for 2,689,166 PubMed queries issued during a single day (graph truncated at 20 terms).
Figure 7. Number of sessions per user for 2,689,166 queries issued on a single day.
Figure 8. Number of queries per session for 2,689,166 queries issued in a single day, as a proportion of sessions with the specified number of queries. Figure truncated at 20 queries.
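
The semantic distance described in the Figure 3 caption is a path length between two nodes of a tree. Below is a minimal sketch of that idea, assuming a tree stored as a child-to-parent mapping with a single parent per node. The toy hierarchy is hypothetical: it is not the actual MeSH tree and was simply built so that the two example concepts end up six steps apart. The caption notes that several paths may exist between two concepts; how the authors chose among them is not stated here, and this sketch does not model it.

```python
def ancestors(node, parent):
    """Return the chain [node, parent, grandparent, ..., root]."""
    chain = [node]
    while chain[-1] in parent:
        chain.append(parent[chain[-1]])
    return chain


def tree_distance(a, b, parent):
    """Number of edges on the path from a to b through their lowest common ancestor."""
    steps_from_a = {n: i for i, n in enumerate(ancestors(a, parent))}
    for steps_from_b, n in enumerate(ancestors(b, parent)):
        if n in steps_from_a:
            return steps_from_a[n] + steps_from_b
    raise ValueError("nodes are not in the same tree")


# Hypothetical child -> parent mapping, for illustration only.
toy_parent = {
    "Myocardial infarction": "Myocardial ischemia",
    "Myocardial ischemia": "Heart diseases",
    "Heart diseases": "Cardiovascular diseases",
    "Cerebrovascular accident": "Cerebrovascular disorders",
    "Cerebrovascular disorders": "Vascular diseases",
    "Vascular diseases": "Cardiovascular diseases",
}

distance = tree_distance("Myocardial infarction", "Cerebrovascular accident", toy_parent)
```

Here the path climbs three edges from each concept to the shared ancestor, giving a distance of six in this toy tree.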
