J Med Chem. 2015 Mar 12;58(5):2068-76.
doi: 10.1021/jm5011308. Epub 2014 Dec 4.

Parallel worlds of public and commercial bioactive chemistry data

Christopher A Lipinski et al. J Med Chem.

Abstract

The availability of structures and linked bioactivity data in databases is powerfully enabling for drug discovery and chemical biology. However, we now review some confounding issues arising from the divergent expansion of public and commercial sources of chemical structures. These issues stem not only from expanding patent extraction but also from increasingly large vendor collections, amassed under different selection criteria by SciFinder from Chemical Abstracts Service (CAS) and by major public sources such as PubChem, ChemSpider, UniChem, and others. These increasingly massive collections may include both real and virtual compounds, as well as so-called prophetic compounds from patents. We address a range of issues raised by the challenges faced in resolving the NIH probe compounds. In addition, we highlight how virtual compounds confound prior-art searching, which could affect the composition-of-matter patentability of a new medicinal chemistry lead. Finally, we propose some potential solutions.

Figures

Figure 1
The “usual suspects” lineup, representing molecules of different classes from public and commercial databases, illustrating the difficulty of selecting desirable ones. From left to right: the documented probe ML010 (CID 17757274); the drug valsartan (CID 60846); a prophetic compound, CAS 1164083-19-5, from WO 2001056358 (not in PubChem or ChemSpider); a text-extracted compound from US20120040982 (CID 57498937); and ML160 (CID 824820), one of the probes with incomplete data linkage.
Figure 2
Chemical structures for 322 NIH MLP probes (http://molsync.com/demo/probes.php) have been clustered into 44 groups for visualization purposes, using ECFP_6 fingerprints and a Tanimoto similarity threshold of >0.11 for cluster membership. The threshold was chosen empirically to show a representative selection of the kinds of molecules found within the set of probes. For each cluster, a representative molecule is shown (the structure within the cluster with the highest average similarity to the other structures in the same cluster). The clusters are decorated with semicircles colored blue for compounds considered high confidence based on our medicinal chemistry due diligence analysis and red for those that were not. This analysis suggests that there is no obvious correlation between structural composition and whether a probe passes the medicinal chemist’s logic. Circle area is proportional to cluster size, and singletons are represented as a dot.
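The clustering described in the Figure 2 caption can be sketched as follows. This is a minimal illustration, assuming each molecule's ECFP_6 fingerprint has already been computed with a cheminformatics toolkit and is stored as a set of integer feature IDs. The caption does not name the clustering algorithm, so the greedy "leader" assignment below is a hypothetical stand-in; the Tanimoto measure, the >threshold membership test, and the representative-selection rule (highest average within-cluster similarity) follow the caption. Function names and the toy fingerprints are illustrative only.

```python
from __future__ import annotations


def tanimoto(a: frozenset[int], b: frozenset[int]) -> float:
    """Tanimoto similarity |A & B| / |A | B| between two feature-ID sets."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def threshold_cluster(fps: list[frozenset[int]], threshold: float) -> list[list[int]]:
    """Hypothetical greedy leader clustering (not necessarily the authors' method).

    Each molecule joins the first existing cluster whose leader (first member)
    it exceeds `threshold` similarity to; otherwise it starts a new cluster.
    """
    clusters: list[list[int]] = []
    for i, fp in enumerate(fps):
        for members in clusters:
            if tanimoto(fp, fps[members[0]]) > threshold:
                members.append(i)
                break
        else:  # no cluster accepted this molecule
            clusters.append([i])
    return clusters


def representative(members: list[int], fps: list[frozenset[int]]) -> int:
    """Return the member with the highest average similarity to the other members."""
    if len(members) == 1:
        return members[0]  # singleton: drawn as a dot in the figure

    def avg_sim(i: int) -> float:
        others = [j for j in members if j != i]
        return sum(tanimoto(fps[i], fps[j]) for j in others) / len(others)

    return max(members, key=avg_sim)


if __name__ == "__main__":
    # Toy fingerprints (hypothetical feature IDs, not real ECFP_6 output).
    toy = [
        frozenset({1, 2, 3, 4}),
        frozenset({1, 2, 3, 5}),
        frozenset({1, 2, 3, 4, 5}),
        frozenset({10, 11, 12}),
    ]
    groups = threshold_cluster(toy, 0.11)
    print(groups)  # → [[0, 1, 2], [3]]
    print([representative(g, toy) for g in groups])  # → [2, 3]
```

A leader-style pass is order dependent, so a different input order can yield different clusters; a low threshold such as 0.11 yields broad groups, consistent with the caption's stated aim of showing the kinds of molecules present rather than close analog series.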
