Sci Data. 2025 Jun 17;12(1):1018.
doi: 10.1038/s41597-025-05343-8.

PubMed knowledge graph 2.0: Connecting papers, patents, and clinical trials in biomedical science

Jian Xu et al. Sci Data.

Abstract

Papers, patents, and clinical trials are essential scientific resources in biomedicine, crucial for knowledge sharing and dissemination. However, these documents are often stored in disparate databases with varying management standards and data formats, making it challenging to form systematic and fine-grained connections among them. To address this issue, we construct PKG 2.0, a comprehensive knowledge graph dataset encompassing over 36 million papers, 1.3 million patents, and 0.48 million clinical trials in the biomedical field. PKG 2.0 integrates these dispersed resources through 482 million biomedical entity linkages, 19 million citation linkages, and 7 million project linkages. The construction of PKG 2.0 wove together fine-grained biomedical entity extraction, high-performance author name disambiguation, multi-source citation integration, and high-quality project data from the NIH Exporter. Data validation demonstrates that PKG 2.0 excels in key tasks such as author disambiguation and biomedical entity recognition. This dataset provides valuable resources for biomedical researchers, bibliometric scholars, and those engaged in literature mining.

Conflict of interest statement

Competing interests: The authors declare no competing interests.

Figures

Fig. 1
Framework of PKG 2.0. (A) Three ways in which the three types of academic literature link to each other. (B) The specific methods or data sources of the linkage. (C) Supplementary and extended works in PKG 2.0.
Fig. 2
The entity relationship diagram of PKG. PKG includes “Articles”, “ClinicalTrials”, and “Patents” as the main data tables, and links them to each other through a series of relationship tables, as well as to other tables capturing data from a range of sources. For clarity, we show only a subset of the tables and their main fields (see the Data Records section and the appendix for a more comprehensive view of the tables).
Fig. 3
The process of the author name disambiguation. It mainly combines the author disambiguation results of the Author-ity dataset and Semantic Scholar.
Fig. 4
Evaluation of author name disambiguation results for different feature groups. The Base group contains five features commonly used in author name disambiguation and serves as the baseline; the Patent-based and Trial-based groups add features related to patents and clinical trials, respectively, which improves disambiguation performance; the Comprehensive group, which adds all features, reaches a relatively optimal level of performance.
Fig. 5
Timeline of COVID-19 Clinical Trials by BioNTech. It showcases the earliest ten COVID-19 clinical trials sponsored by BioNTech. For each trial, a horizontal line represents its duration, stretching from the start date to the completion date. Along these lines, marker points are placed to denote the publication times of relevant papers and patents.
Fig. 6
The patent, paper, and clinical trial related to a COVID-19 vaccine. The inner triangle shows the linkages among the three types of literature in PKG, and the outer triangle shows a specific instance for the COVID-19 vaccine. Many papers, clinical trials, and patents relate to the vaccine; the figure shows only one of each type. For example, a series of related papers from different time periods can be found through the citations in clinical trial NCT04283461, including early reports focusing on vaccine safety and efficacy in various populations (PMIDs 33053279, 33301246, 33524990), mid-term studies on vaccine effectiveness and adverse events (PMIDs 35739094, 35792746, 36055877), and more recent reviews and exploratory studies (PMIDs 38012751, 35472295).
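
The table structure described in the Fig. 2 caption (main tables joined through relationship tables) can be illustrated with a minimal sketch. This is not the PKG 2.0 schema: only the main table names "Articles", "ClinicalTrials", and "Patents" come from the caption. The column names, the relationship table names (TrialArticleLinks, PatentArticleLinks), and the patent row are hypothetical and exist only to make the example run. The trial identifier and PMIDs are the ones quoted in the Fig. 6 caption.

```python
import sqlite3


def build_demo_graph() -> sqlite3.Connection:
    """Create an in-memory toy database with main tables and link tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Main table names follow the Fig. 2 caption; columns are assumptions.
        CREATE TABLE Articles (PMID INTEGER PRIMARY KEY);
        CREATE TABLE ClinicalTrials (NCTID TEXT PRIMARY KEY);
        CREATE TABLE Patents (PatentID TEXT PRIMARY KEY);
        -- Hypothetical relationship tables (names are illustrative only).
        CREATE TABLE TrialArticleLinks (NCTID TEXT, PMID INTEGER);
        CREATE TABLE PatentArticleLinks (PatentID TEXT, PMID INTEGER);
        """
    )
    # Trial and PMIDs as quoted in the Fig. 6 caption (early reports).
    trial = "NCT04283461"
    pmids = [33053279, 33301246, 33524990]
    conn.executemany("INSERT INTO Articles VALUES (?)", [(p,) for p in pmids])
    conn.execute("INSERT INTO ClinicalTrials VALUES (?)", (trial,))
    conn.executemany(
        "INSERT INTO TrialArticleLinks VALUES (?, ?)", [(trial, p) for p in pmids]
    )
    # Placeholder patent and link, invented for the demo; not real data.
    conn.execute("INSERT INTO Patents VALUES (?)", ("PLACEHOLDER-PATENT",))
    conn.execute(
        "INSERT INTO PatentArticleLinks VALUES (?, ?)",
        ("PLACEHOLDER-PATENT", pmids[0]),
    )
    conn.commit()
    return conn


def articles_for_trial(conn: sqlite3.Connection, nct_id: str) -> list[int]:
    """Return PMIDs linked to a trial through the relationship table."""
    rows = conn.execute(
        """
        SELECT a.PMID
        FROM Articles AS a
        JOIN TrialArticleLinks AS l ON l.PMID = a.PMID
        WHERE l.NCTID = ?
        ORDER BY a.PMID
        """,
        (nct_id,),
    ).fetchall()
    return [row[0] for row in rows]


def patents_sharing_articles_with_trial(
    conn: sqlite3.Connection, nct_id: str
) -> list[str]:
    """Return patents that link to at least one article the trial also links to."""
    rows = conn.execute(
        """
        SELECT DISTINCT p.PatentID
        FROM PatentArticleLinks AS p
        JOIN TrialArticleLinks AS t ON t.PMID = p.PMID
        WHERE t.NCTID = ?
        ORDER BY p.PatentID
        """,
        (nct_id,),
    ).fetchall()
    return [row[0] for row in rows]
```

Calling articles_for_trial(build_demo_graph(), "NCT04283461") walks one edge of the triangle in Fig. 6 (trial to papers), while patents_sharing_articles_with_trial follows two edges (trial to paper to patent). In the real dataset the equivalent joins would use whatever keys and table names the Data Records section defines.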
