Sci Data. 2020 Nov 20;7(1):408.
doi: 10.1038/s41597-020-00749-y.

A detailed open access model of the PubMed literature

Kevin W Boyack et al. Sci Data.

Abstract

Portfolio analysis is a fundamental practice of organizational leadership and a necessary precursor to strategic planning. Successful application requires a highly detailed model of research options. We have constructed a model, the first of its kind, that accurately characterizes these options for the biomedical literature. The model comprises over 18 million PubMed documents from 1996 to 2019. Document relatedness was measured using a hybrid approach that combines citation analysis and text similarity. The resulting 606.6 million document-to-document links were used to create 28,743 document clusters and an associated visual map. Clusters are characterized using metadata (e.g., phrases, MeSH terms) and over 20 indicators (e.g., funding, patent activity). The map and cluster-level data are embedded in Tableau to provide an interactive model that enables in-depth exploration of a research portfolio. Two example use cases are provided: one identifies specific research opportunities related to coronavirus, and the other identifies research strengths of a large cohort of African American and Native American researchers at the University of Michigan Medical School.
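The abstract does not state how the citation and text signals are combined. Purely as an illustration, the sketch below assumes a linear mix of Salton-normalized bibliographic coupling and cosine similarity of term counts, weighted by a hypothetical parameter `alpha`, and keeps each document's top-`k` neighbors as links. The helper names, the weighting, and the top-`k` rule are assumptions, not the authors' published method, and the clustering step that produced the 28,743 clusters is not shown.

```python
import math
from collections import Counter

# Illustrative hybrid relatedness: the linear mix and the top-k rule are
# assumptions, not the method published with the PubMed model.

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def coupling(refs_a: set, refs_b: set) -> float:
    """Salton-normalized bibliographic coupling: shared refs / sqrt(|A| * |B|)."""
    if not refs_a or not refs_b:
        return 0.0
    return len(refs_a & refs_b) / math.sqrt(len(refs_a) * len(refs_b))

def hybrid(doc_a: dict, doc_b: dict, alpha: float = 0.5) -> float:
    """Hypothetical weighted mix of citation relatedness and text relatedness."""
    return (alpha * coupling(doc_a["refs"], doc_b["refs"])
            + (1 - alpha) * cosine(doc_a["terms"], doc_b["terms"]))

def top_k_links(docs: dict, k: int = 2, alpha: float = 0.5) -> list:
    """Keep each document's k most related neighbors (score > 0) as directed links."""
    links = []
    for i in docs:
        scored = [(hybrid(docs[i], docs[j], alpha), j) for j in docs if j != i]
        scored = sorted((s for s in scored if s[0] > 0), reverse=True)
        links.extend((i, j, score) for score, j in scored[:k])
    return links
```

With toy data (two papers that share two references and overlapping terms, plus an unrelated third paper), `top_k_links` returns only the two directed links between the related papers. At the scale of 18 million documents, this all-pairs loop would have to be replaced by inverted indexes or approximate nearest-neighbor search.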

Conflict of interest statement

Two of the authors (K.W.B. and R.K.) are employed by a small company that received the award under which this work was funded. The authors declare no competing interests.

Figures

Fig. 1. Data and process used to create the PubMed model and associated tools.
Fig. 2. Visual map of the PubMed model showing 28,743 clusters. Each cluster is colored according to its dominant field (see legend).
Fig. 3. Detailed characterization of a single cluster in the Excel workbook.
Fig. 4. Tableau views of the PubMed model filtered to show only those clusters with UMMS papers. Color reflects the research level of each cluster. (a) Map view. (b) Scatterplot view with the approximate-potential-to-translate percentile on the x-axis and the NIH/NSF funding percentile on the y-axis.
Fig. 5. Tableau views of subsets of clusters related to coronavirus. (a) Map view of clusters with at least 25 CORD-19 documents and a CORD-19 document concentration of at least 10%. (b) Scatterplot view of clusters further filtered to those containing UMMS papers.
Fig. 6. Publication profile of African American and Native American principal investigators at UMMS overlaid on the PubMed map. Sizes of colored circles reflect numbers of publications.
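The Fig. 5 caption states a concrete filtering rule: clusters with at least 25 CORD-19 documents and a CORD-19 concentration of at least 10%, optionally narrowed to clusters containing UMMS papers. A minimal sketch of that rule follows, together with a percentile-rank helper of the kind that could feed the scatterplot axes in Figs. 4 and 5. The `Cluster` fields and the percentile definition (percent of clusters with a value less than or equal to the given one) are assumptions for illustration; the model's actual indicator definitions are not given here.

```python
from bisect import bisect_right
from dataclasses import dataclass

@dataclass
class Cluster:
    # Hypothetical field names; the model's actual cluster schema may differ.
    cluster_id: int
    n_docs: int        # total documents in the cluster
    n_cord19: int      # documents that also appear in CORD-19
    n_umms: int        # papers with UMMS authors
    funding: float     # an NIH/NSF funding indicator (assumed scalar)

def coronavirus_clusters(clusters, min_docs=25, min_share=0.10):
    """Fig. 5a rule: >= min_docs CORD-19 documents and >= min_share concentration."""
    return [c for c in clusters
            if c.n_docs > 0
            and c.n_cord19 >= min_docs
            and c.n_cord19 / c.n_docs >= min_share]

def with_umms(clusters):
    """Fig. 5b: keep only clusters containing at least one UMMS paper."""
    return [c for c in clusters if c.n_umms > 0]

def percentile_ranks(values):
    """Percent of values <= each value (an assumed percentile definition)."""
    ordered = sorted(values)
    n = len(ordered)
    return [100.0 * bisect_right(ordered, v) / n for v in values]
```

Under this percentile definition, tied values share the highest rank; other conventions (for example, averaging tied ranks) would shift the axis values slightly.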
