Sci Data. 2025 Nov 19;12(1):1840.
doi: 10.1038/s41597-025-06136-9.

PROTAC-PatentDB: A PROTAC Patent Compound Dataset

Hong Cai et al. Sci Data.

Abstract

Proteolysis-targeting chimeras (PROTACs) are an emerging and promising class of molecules for targeted protein degradation, with the potential to overcome critical bottlenecks in traditional small-molecule drug development. However, the scarcity of publicly available compound structure data has significantly hindered computational and AI-aided drug discovery/design (AIDD) in this field. Patents are an important but underutilized source of novel chemical structures in medicinal chemistry. In this study, we collected PROTAC patents published from 2013 to 2023, together with the chemical structures disclosed in them. Through manual screening and expert curation, we identified 63,136 unique PROTAC compounds across 590 patent families, along with 252 targets. Additionally, we used the ADMETlab 3.0 platform to predict 120 physicochemical properties for all compounds. The dataset is publicly available on the Figshare platform, and an online webserver (http://protacpatentdb.com) has also been established. Given the rapid growth of the PROTAC patent literature, the dataset can be expanded as new patents are published.
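
The headline counts in the abstract (unique compounds, patent families, targets) are the kind of summary a reader might want to recompute from the released table. The following is a minimal sketch of such a tally, not the authors' pipeline. The column names `smiles`, `patent_family`, and `target`, and the four sample rows, are assumptions made for illustration; the actual Figshare schema may differ. Deduplication here is by exact string match, whereas a real workflow would likely canonicalize structures first.

```python
import csv
import io

# Hypothetical sample data. Column names and values are illustrative
# assumptions, not the real PROTAC-PatentDB schema; the SMILES strings
# are simple placeholders, not PROTAC structures.
SAMPLE = """smiles,patent_family,target
CCO,FAM-001,BRD4
CCO,FAM-001,BRD4
CCN,FAM-001,BRD4
CCC,FAM-002,AR
"""


def load_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def summarize(rows):
    """Count distinct structures, patent families, and targets.

    Structures are compared as raw strings, so two different spellings
    of the same molecule would be counted twice.
    """
    return {
        "compounds": len({row["smiles"] for row in rows}),
        "patent_families": len({row["patent_family"] for row in rows}),
        "targets": len({row["target"] for row in rows}),
    }


summary = summarize(load_rows(SAMPLE))
print(summary)
```

On the sample above, the duplicated first row collapses into one compound, so the tally reports three compounds, two patent families, and two targets.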

Conflict of interest statement

Competing interests: The authors declare no competing interests.

Figures

Fig. 1
Flowchart for data collection and processing.
Fig. 2
Overview of PROTAC patents. (a) Annual trends in PROTAC patent applications, represented by patent documents and patent families; (b) geographic distribution of PROTAC patent documents by publication authority; (c) top 15 patent holders ranked by number of patent families; and (d) top 15 molecular targets ranked by number of patent families.
Fig. 3
Distribution of specific molecular targets among 63,136 PROTAC compounds.
Fig. 4
Workflow of case studies.

