PeerJ. 2023 May 2;11:e15153.
doi: 10.7717/peerj.15153. eCollection 2023.

Illuminating the druggable genome through patent bioactivity data


Maria P Magariños et al. PeerJ.

Abstract

The patent literature is a potentially valuable source of bioactivity data. In this article we describe a process to prioritise 3.7 million life science relevant patents obtained from the SureChEMBL database (https://www.surechembl.org/), according to how likely they were to contain bioactivity data for potent small molecules on less-studied targets, based on the classification developed by the Illuminating the Druggable Genome (IDG) project. The overall goal was to select a smaller number of patents that could be manually curated and incorporated into the ChEMBL database. Using relatively simple annotation and filtering pipelines, we have been able to identify a substantial number of patents containing quantitative bioactivity data for understudied targets that had not previously been reported in the peer-reviewed medicinal chemistry literature. We quantify the added value of such methods in terms of the numbers of targets that are so identified, and provide some specific illustrative examples. Our work underlines the potential value in searching the patent corpus in addition to the more traditional peer-reviewed literature. The small molecules found in these patents, together with their measured activity against the targets, are now accessible via the ChEMBL database.

Keywords: Bioactive compounds; Drug targets; Druggable genome; Patents; Small molecules; Understudied targets.


Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Figure 1. Patent group classification.
(A) A total of 3.7 million patents were scanned for (1) tables containing bioactivity keywords and (2) names of understudied targets mentioned in the context of specific phrases in the title, abstract, description and claims sections. Based on the results of these two independent processes, each patent was assigned to one of the following groups:
Group 1: bioactivity tables; targets mentioned in the title or abstract.
Group 2: bioactivity tables; targets mentioned in the description or claims.
Group 3: no bioactivity tables; targets mentioned in the title or abstract.
Group 4: no bioactivity tables; targets mentioned in the description or claims.
Group 5: bioactivity tables; no targets.
Group 6: no bioactivity tables; no targets.
(B) A subset of patent families in each group was manually examined. Total: number of patent families in each group; Read: number of patent families in each group that were read to determine whether relevant data were present; Positive: number of patent families read that contained bioactivity data for small molecules against at least one understudied target; Negative: number of patent families read that did not contain bioactivity data for small molecules against understudied targets.
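The grouping in panel A amounts to a small decision table over two independent signals: whether a patent has a bioactivity table, and in which sections an understudied target is mentioned. The Python sketch below illustrates that logic under stated assumptions. The keyword list, the `PatentScan` structure, the helper names, and the rule that a title/abstract mention takes precedence over a description/claims mention are all hypothetical choices made for illustration; they are not the authors' actual pipeline.

```python
import re
from dataclasses import dataclass
from enum import Enum


class Section(Enum):
    TITLE = "title"
    ABSTRACT = "abstract"
    DESCRIPTION = "description"
    CLAIMS = "claims"


# Hypothetical bioactivity keywords; word boundaries keep "Ki" from matching "Kinase".
BIOACTIVITY_PATTERN = re.compile(r"\b(?:IC50|EC50|Ki|Kd|pIC50)\b")


def table_has_bioactivity(header_cells: list[str]) -> bool:
    """Return True if any table header cell contains a bioactivity keyword."""
    return any(BIOACTIVITY_PATTERN.search(cell) for cell in header_cells)


@dataclass(frozen=True)
class PatentScan:
    # Hypothetical summary of the two independent scans for one patent.
    has_bioactivity_table: bool
    target_sections: frozenset[Section]  # sections with a qualifying target mention


def classify(scan: PatentScan) -> int:
    """Assign a patent to Group 1-6 following the Figure 1A definitions.

    Assumption: a mention in the title or abstract takes precedence over a
    mention in the description or claims when both are present.
    """
    in_title_or_abstract = bool(scan.target_sections & {Section.TITLE, Section.ABSTRACT})
    in_body = bool(scan.target_sections & {Section.DESCRIPTION, Section.CLAIMS})

    if scan.has_bioactivity_table:
        if in_title_or_abstract:
            return 1
        if in_body:
            return 2
        return 5  # bioactivity tables, no targets
    if in_title_or_abstract:
        return 3
    if in_body:
        return 4
    return 6  # no bioactivity tables, no targets


if __name__ == "__main__":
    scan = PatentScan(
        has_bioactivity_table=table_has_bioactivity(["Compound", "IC50 (nM)"]),
        target_sections=frozenset({Section.CLAIMS}),
    )
    print(classify(scan))  # prints 2
```

Keeping the two scans as separate inputs mirrors the caption's description of two independent processes, so either detector could be replaced without touching the grouping rule.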
Figure 2. Illustrative examples of bioactive compounds identified from SureChEMBL patent workflow against three targets.
Figure 3. Numbers of relevant patents and scientific literature publications for GPR6 per year.
Dashed lines mark key compound disclosures in patents and the scientific literature: (A) an example of a compound reported in one of the earliest patents with data against GPR6; (B) an example of a compound from one of the earliest patents identified by our method; (C) the first small-molecule modulator reported in the scientific literature.

