PubChem3D: Biologically relevant 3-D similarity
- PMID: 21781288
- PMCID: PMC3223603
- DOI: 10.1186/1758-2946-3-26
Abstract
Background: The use of 3-D similarity techniques in the analysis of biological data and virtual screening is pervasive, but what is a biologically meaningful 3-D similarity value? Can one find statistically significant separation between "active/active" and "active/inactive" spaces? These questions are explored using 734,486 biologically tested chemical structures, 1,389 biological assay data sets, and six different 3-D similarity types utilized by PubChem analysis tools.
Results: The similarity value distributions of 269.7 billion unique conformer pairs from 734,486 biologically tested compounds (all-against-all) from PubChem were used to address the question: what is a biologically meaningful 3-D similarity score? The average and standard deviation for the six similarity measures STST-opt, CTST-opt, ComboTST-opt, STCT-opt, CTCT-opt, and ComboTCT-opt were 0.54 ± 0.10, 0.07 ± 0.05, 0.62 ± 0.13, 0.41 ± 0.11, 0.18 ± 0.06, and 0.59 ± 0.14, respectively. Because this random distribution of biologically tested compounds was constructed using a single theoretical conformer per compound (the "default" conformer provided by PubChem), further study using multiple diverse conformers per compound may be necessary; however, given the breadth of the compound set, the single-conformer results may still apply to multi-conformer 3-D similarity value distributions. As such, this work is a critical step, covering a very wide corpus of chemical structures and biological assays and creating a statistical framework to build upon.
The second part of this study explored whether it was possible to realize a statistically meaningful 3-D similarity value separation between reputed biological assay "inactives" and "actives". Using noninactive-noninactive (NN) pairs and noninactive-inactive (NI) pairs to represent the "active/active" and "active/inactive" spaces, respectively, each of the 1,389 biological assays was examined in terms of the 3-D similarity score differences between its NN and NI pairs, and the results were analyzed across all assays and by assay category type. While a consistent trend of separation was observed, this result was not statistically unambiguous after considering the respective standard deviations. While not all "actives" in a biological assay are amenable to this type of analysis, e.g., due to different mechanisms of action or binding configurations, the ambiguous separation may also be due to employing a single conformer per compound in this study. That said, there was a subset of biological assays where a clear separation between the NN and NI pairs was found. In addition, use of combo Tanimoto (ComboT) alone, independent of superposition optimization type, appears to be the most efficient 3-D score type in identifying these cases.
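The NN versus NI comparison described above can be illustrated with a minimal Python sketch. It is not the authors' procedure: the function name `nn_ni_separation`, the example scores, and the "gap larger than the sum of both standard deviations" criterion are all assumptions chosen only to show what "separation after considering the standard deviations" could look like for one assay.

```python
from statistics import mean, stdev


def nn_ni_separation(nn_scores, ni_scores):
    """Compare NN and NI similarity scores for a single assay.

    Returns the difference of means, both sample standard deviations,
    and a flag for "clear" separation. The criterion used here (gap
    greater than the sum of the two standard deviations) is a
    hypothetical choice for illustration, not the paper's definition.
    """
    nn_mean, ni_mean = mean(nn_scores), mean(ni_scores)
    nn_sd, ni_sd = stdev(nn_scores), stdev(ni_scores)
    gap = nn_mean - ni_mean
    clear = gap > (nn_sd + ni_sd)
    return gap, nn_sd, ni_sd, clear


# Made-up ComboT-like scores, for illustration only.
separated_nn = [1.10, 1.20, 1.15, 1.25]
separated_ni = [0.60, 0.65, 0.55, 0.62]
overlapping_nn = [0.70, 0.50, 0.90, 0.60]
overlapping_ni = [0.60, 0.40, 0.80, 0.55]

for label, nn, ni in [
    ("separated", separated_nn, separated_ni),
    ("overlapping", overlapping_nn, overlapping_ni),
]:
    gap, nn_sd, ni_sd, clear = nn_ni_separation(nn, ni)
    print(f"{label}: gap={gap:.3f} nn_sd={nn_sd:.3f} ni_sd={ni_sd:.3f} clear={clear}")
```

In the second example the NN mean is still higher than the NI mean, which mirrors the "consistent trend" in the text, but the gap is smaller than the spread, so the separation is not flagged as clear.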
Conclusion: This study provides a statistical guideline for analyzing biological assay data in terms of 3-D similarity and PubChem structure-activity analysis tools. When using a single conformer per compound, a relatively small number of assays appear to be able to separate "active/active" space from "active/inactive" space.
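As a rough illustration of how the reported averages and standard deviations might serve as a guideline, the sketch below expresses a 3-D similarity score as a number of standard deviations above the random-pair average for its measure. The means and standard deviations are taken from the Results above; the `z_score` helper and the example scores are assumptions, and the calculation does not presume the underlying distributions are normal.

```python
# (mean, standard deviation) per similarity measure, as reported in the Results.
BASELINE = {
    "STST-opt": (0.54, 0.10),
    "CTST-opt": (0.07, 0.05),
    "ComboTST-opt": (0.62, 0.13),
    "STCT-opt": (0.41, 0.11),
    "CTCT-opt": (0.18, 0.06),
    "ComboTCT-opt": (0.59, 0.14),
}


def z_score(measure, value):
    """Standard deviations between `value` and the random-pair average.

    `z_score` is a hypothetical helper for illustration; it is not part
    of any PubChem tool.
    """
    mu, sigma = BASELINE[measure]
    return (value - mu) / sigma


# Example scores are made up for illustration.
for measure, value in [("STST-opt", 0.80), ("ComboTST-opt", 1.01), ("CTCT-opt", 0.18)]:
    print(f"{measure} = {value:.2f} is {z_score(measure, value):+.1f} SD from the average")
```

A score equal to the average for its measure gives zero, so larger positive values indicate pairs that are more similar than a typical random pair of biologically tested compounds.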
Similar articles
- Effects of multiple conformers per compound upon 3-D similarity search and bioassay data analysis. J Cheminform. 2012 Nov 7;4(1):28. doi: 10.1186/1758-2946-4-28. PMID: 23134593. Free PMC article.
- PubChem3D: conformer ensemble accuracy. J Cheminform. 2013 Jan 7;5(1):1. doi: 10.1186/1758-2946-5-1. PMID: 23289532. Free PMC article.
- PubChem3D: Similar conformers. J Cheminform. 2011 May 9;3:13. doi: 10.1186/1758-2946-3-13. PMID: 21554721. Free PMC article.
- Cephalostatin analogues--synthesis and biological activity. Fortschr Chem Org Naturst. 2004;87:1-80. doi: 10.1007/978-3-7091-0581-8_1. PMID: 15079895. Review.
- [Non-native conformational states of immunoglobulins: thermodynamic and functional analysis of rabbit IgG]. Biokhimiia. 1996 Feb;61(2):212-35. PMID: 8717493. Review. Russian.
Cited by
- Effects of multiple conformers per compound upon 3-D similarity search and bioassay data analysis. J Cheminform. 2012 Nov 7;4(1):28. doi: 10.1186/1758-2946-4-28. PMID: 23134593. Free PMC article.
- Exploring Chemical Information in PubChem. Curr Protoc. 2021 Aug;1(8):e217. doi: 10.1002/cpz1.217. PMID: 34370395. Free PMC article.
- Leveraging 3D chemical similarity, target and phenotypic data in the identification of drug-protein and drug-adverse effect associations. J Cheminform. 2016 Jul 1;8:35. doi: 10.1186/s13321-016-0147-1. eCollection 2016. PMID: 27375776. Free PMC article.
- Similar compounds versus similar conformers: complementarity between PubChem 2-D and 3-D neighboring sets. J Cheminform. 2016 Nov 4;8:62. doi: 10.1186/s13321-016-0163-1. eCollection 2016. PMID: 27872662. Free PMC article.
- Enhancing adverse drug event detection in electronic health records using molecular structure similarity: application to pancreatitis. PLoS One. 2012;7(7):e41471. doi: 10.1371/journal.pone.0041471. Epub 2012 Jul 24. PMID: 22911794. Free PMC article.