Bioinformatics. 2010 Nov 15;26(22):2881-8.
doi: 10.1093/bioinformatics/btq550. Epub 2010 Oct 13.

Investigating the correlations among the chemical structures, bioactivity profiles and molecular targets of small molecules

Tiejun Cheng et al. Bioinformatics.

Abstract

Motivation: Most previous data mining studies based on the NCI-60 dataset can hardly provide insights into the molecular targets of screened compounds, owing to the dataset's intrinsically cell-based nature. On the other hand, the abundant compound-target association data in PubChem offer extensive experimental evidence of molecular targets for tested compounds. Therefore, by taking advantage of the data from both public repositories, one may simultaneously investigate the correlations between the bioactivity profiles of small molecules from the NCI-60 dataset (cellular level) and their patterns of interactions with relevant protein targets from PubChem (molecular level).

Results: We investigated a set of 37 small molecules by providing links among their bioactivity profiles, protein targets and chemical structures. Hierarchical clustering of compounds was carried out based on their bioactivity profiles. We found that compounds were clustered into groups with similar modes of action, which strongly correlated with chemical structures. Furthermore, we observed that compounds with similar bioactivity profiles also shared similar patterns of interactions with relevant protein targets, especially when their chemical structures were related. The current work presents a new strategy for combining and mining the NCI-60 dataset and PubChem. This analysis shows that bioactivity profile comparison can provide insights into modes of action at the molecular level and will thus facilitate the knowledge-based discovery of novel compounds with desired pharmacological properties.

Availability: The bioactivity profiling data and the target annotation information are publicly available in the PubChem BioAssay database (ftp://ftp.ncbi.nlm.nih.gov/pubchem/Bioassay/).
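The clustering step described in the Results (grouping compounds by bioactivity profile and cutting at a minimum similarity, given as 0.88 in the Fig. 1 caption) might be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not name the similarity metric or linkage rule, so Pearson correlation and average linkage are assumed here, and the function names, compound ids and profile values are all hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length profiles (assumed metric)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def cluster_profiles(profiles, min_similarity=0.88):
    """Average-linkage agglomerative clustering cut at a similarity threshold.

    profiles: dict mapping compound id -> list of per-cell-line activity values.
    Returns a list of clusters, each a sorted list of compound ids.
    """
    clusters = [[cid] for cid in profiles]

    def similarity(a, b):
        # Average pairwise correlation between members of the two clusters.
        pairs = [(i, j) for i in a for j in b]
        return sum(pearson(profiles[i], profiles[j]) for i, j in pairs) / len(pairs)

    while len(clusters) > 1:
        # Find the most similar pair of current clusters.
        i, j = max(combinations(range(len(clusters)), 2),
                   key=lambda ij: similarity(clusters[ij[0]], clusters[ij[1]]))
        if similarity(clusters[i], clusters[j]) < min_similarity:
            break  # no remaining pair is similar enough to merge
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return [sorted(c) for c in clusters]


# Toy profiles with made-up numbers (not NCI-60 data); ids are hypothetical.
toy = {
    "c1": [5.0, 6.0, 7.0, 8.0],
    "c2": [5.1, 6.2, 6.9, 8.1],
    "c3": [8.0, 7.0, 6.0, 5.0],
    "c4": [7.9, 7.2, 6.1, 4.8],
}
result = cluster_profiles(toy)
multi = sorted(c for c in result if len(c) > 1)
print(multi)
```

Clusters with more than one member correspond to the kind of groups labeled A through F in Fig. 1; singletons are compounds that never reach the threshold with any other cluster.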


Figures

Fig. 1.
Hierarchical clustering of the 37 compounds in the final set based on their bioactivity profiles in the NCI-60 cell lines. The bioactivity profile of each compound is shown in spectrum (horizontal view). A minimum similarity threshold of 0.88 (red solid line) is employed in HCE. Six clusters that contain more than one compound are marked as A through F from top to bottom. Relevant compounds (24 in total) are labeled with PubChem compound identifiers (CID).
Fig. 2.
The five camptothecin analogs identified from cluster B. (A) 2D chemical structures, (B) bioactivity profiles in the NCI-60 cell lines on nine different organs and (C) compound–target interaction network (see Fig. 4 for general description).
Fig. 3.
The three compounds identified from cluster F. (A) 2D chemical structures, (B) bioactivity profiles in the NCI-60 cell lines on nine different organs and (C) compound–target interaction network (see Fig. 4 for general description).
Fig. 4.
The complete diagram of the compound–target interaction network for the 24 compounds identified from the six clusters (i.e. A to F) obtained by hierarchical clustering. Compounds are denoted as ellipses, which are labeled with their PubChem compound identifiers (CIDs) and colored according to the clusters they belong to. Targets are denoted as rectangles, which are labeled with their NCBI protein identifiers (GIs) and colored dark or light red if the corresponding assay is a confirmatory or primary bioassay in PubChem, respectively. An edge linking an ellipse and a rectangle indicates an interaction, i.e. the compound was found active against that target. No edge is allowed between two ellipses or between two rectangles. For simplicity, target nodes that have only a single connecting compound node are not shown.
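The network construction described in the Fig. 4 caption can be illustrated with a short sketch. It assumes the input is a list of (compound, target) pairs for which the compound was found active; the function name `build_network`, its `min_compounds` parameter and all identifiers in the example are hypothetical and not taken from the paper.

```python
from collections import defaultdict


def build_network(active_pairs, min_compounds=2):
    """Build a bipartite compound-target network from (cid, gi) activity pairs.

    Returns a dict mapping each retained target to the sorted list of
    compounds active against it. Targets linked to fewer than
    `min_compounds` compounds are hidden, mirroring the caption's rule
    that targets with a single connecting compound are not shown.
    """
    target_to_compounds = defaultdict(set)
    for cid, gi in active_pairs:
        # Edges only ever join a compound to a target, so the graph stays bipartite.
        target_to_compounds[gi].add(cid)
    return {gi: sorted(cids) for gi, cids in target_to_compounds.items()
            if len(cids) >= min_compounds}


# Hypothetical identifiers for illustration only; not real CIDs or GIs.
pairs = [
    ("CID_a", "GI_1"), ("CID_b", "GI_1"),
    ("CID_a", "GI_2"),
    ("CID_c", "GI_3"), ("CID_b", "GI_3"), ("CID_c", "GI_3"),
]
network = build_network(pairs)
print(network)
```

Storing compounds per target as a set means a pair reported by several assays counts only once, so the pruning rule depends on the number of distinct compounds and not on the number of assay records.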
