PLoS One. 2009;4(3):e4803.
doi: 10.1371/journal.pone.0004803. Epub 2009 Mar 11.

Clickstream data yields high-resolution maps of science

Johan Bollen et al. PLoS One. 2009.

Abstract

Background: Intricate maps of science have been created from citation data to visualize the structure of scientific activity. However, most scientific publications are now accessed online. Scholarly web portals record detailed log data at a scale that exceeds the number of all existing citations combined. Such log data is recorded immediately upon publication and keeps track of the sequences of user requests (clickstreams) that are issued by a variety of users across many different domains. Given these advantages of log datasets over citation data, we investigate whether they can produce high-resolution, more current maps of science.

Methodology: Over the course of 2007 and 2008, we collected nearly 1 billion user interactions recorded by the scholarly web portals of some of the most significant publishers, aggregators and institutional consortia. The resulting reference data set covers a significant part of world-wide use of scholarly web portals in 2006, and provides a balanced coverage of the humanities, social sciences, and natural sciences. A journal clickstream model, i.e. a first-order Markov chain, was extracted from the sequences of user interactions in the logs. The clickstream model was validated by comparing it to the Getty Research Institute's Art & Architecture Thesaurus (AAT). The resulting model was visualized as a journal network that outlines the relationships between various scientific domains and clarifies the connection of the social sciences and humanities to the natural sciences.

Conclusions: Maps of science resulting from large-scale clickstream data provide a detailed, contemporary view of scientific activity and correct the underrepresentation of the social sciences and humanities that is commonly found in citation data.

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1. Matching JCR and Dewey Journal classifications to the AAT taxonomy.
Figure 2. The extraction of journal clickstream data from article level log data.
Usage log data consists of sequences of timed interaction events. Interaction events issued by the same user from the same client can be grouped into user sessions. Each user session represents a clickstream that can be expressed as the sequence of articles that were part of the session's interaction events. Since every article is published in a journal, we can derive journal clickstreams by replacing each article with its journal. From the collection of all journal clickstreams we can calculate the transition probability between each pair of journals.
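The extraction steps in this caption can be illustrated with a minimal Python sketch. It is not the authors' implementation: the record layout `(user_id, timestamp, article_id)`, the 30-minute inactivity cutoff used to split sessions, the `JOURNAL_OF` lookup, and the choice to keep transitions within the same journal are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical log records: (user_id, timestamp_seconds, article_id).
EVENTS = [
    ("u1", 0, "a1"),
    ("u1", 60, "a2"),
    ("u1", 120, "a3"),
    ("u2", 10, "a1"),
    ("u2", 50, "a3"),
    ("u1", 10_000, "a2"),  # long gap: starts a new session for u1
    ("u1", 10_030, "a1"),
]

# Hypothetical article -> journal lookup.
JOURNAL_OF = {"a1": "J_A", "a2": "J_B", "a3": "J_B"}

SESSION_GAP = 1800  # assumed inactivity cutoff in seconds


def sessions(events, gap=SESSION_GAP):
    """Yield article clickstreams, one list of article ids per user session."""
    by_user = defaultdict(list)
    for user, ts, article in sorted(events, key=lambda e: (e[0], e[1])):
        by_user[user].append((ts, article))
    for clicks in by_user.values():
        current = [clicks[0][1]]
        for (prev_ts, _), (ts, article) in zip(clicks, clicks[1:]):
            if ts - prev_ts > gap:
                # Inactivity longer than the cutoff closes the session.
                yield current
                current = []
            current.append(article)
        yield current


def transition_probabilities(events, journal_of):
    """Estimate first-order Markov transition probabilities between journals."""
    counts = defaultdict(lambda: defaultdict(int))
    for session in sessions(events):
        # Map the article clickstream to a journal clickstream.
        journals = [journal_of[a] for a in session]
        # Count each consecutive (source, destination) journal pair.
        for src, dst in zip(journals, journals[1:]):
            counts[src][dst] += 1
    probs = {}
    for src, row in counts.items():
        total = sum(row.values())
        # Normalize each row so outgoing probabilities sum to 1.
        probs[src] = {dst: n / total for dst, n in row.items()}
    return probs


if __name__ == "__main__":
    for src, row in transition_probabilities(EVENTS, JOURNAL_OF).items():
        print(src, row)
```

Each row of the result is one journal's outgoing distribution, which is the sense in which the model is a first-order Markov chain: the next journal depends only on the current one. Whether same-journal transitions should be counted or discarded is a modelling choice the caption does not settle.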
Figure 3. Distribution of edge weights in the clickstream model.
Figure 4. Summary of data processing leading to the map of science.
Figure 5. Map of science derived from clickstream data.
Circles represent individual journals. The lines that connect journals are the edges of the clickstream model. Colors correspond to the AAT classification of the journal. Labels have been assigned to local clusters of journals that correspond to particular scientific disciplines.
Figure 6. Cross-validating the map structure given by the clickstream model against journal relationships derived from AAT journal classifications.
Figure 7. Cross-validating the map of science's layout by retrieving each journal's top-level AAT classification (natural sciences vs. social sciences and humanities).
This map colors journals according to whether the AAT classifies them as social sciences and humanities journals (yellow) or natural science journals (blue). Highly connected clusters corresponding to biology and psychology contain a mix of journals classified in either the social or the natural sciences.
