Profile scaling increases the similarity search performance of molecular fingerprints containing numerical descriptors and structural keys
- PMID: 12870914
- DOI: 10.1021/ci030287u
Abstract
The concept of compound class-specific profiling and scaling of molecular fingerprints for similarity searching is discussed and applied to newly designed fingerprint representations. The approach is based on the analysis of characteristic bit patterns in keyed fingerprints that are set on in compounds having equivalent biological activity. Once a fingerprint profile is generated for a particular activity class, scaling factors that are weighted according to observed bit frequencies are applied to signature bit positions when searching for similar compounds. In systematic similarity search calculations over 23 diverse activity classes, profile scaling consistently increased the performance of fingerprints containing property descriptors and/or structural keys. A significant improvement of approximately 15% was observed for a new fingerprint consisting of binary encoded molecular property descriptors and structural keys. Under scaling conditions, this fingerprint, termed MP-MFP, correctly recognized on average close to 60% of all active test compounds, with only a few false positives. MP-MFP outperformed MACCS keys and other reference fingerprints. In general, optimum performance in scaling calculations was achieved at higher threshold values of the Tanimoto coefficient than in nonscaled calculations, thereby increasing the search selectivity. Placing relatively high weight on signature bit positions that were always, or almost always, set on was found to be the most effective scaling procedure. Analysis of class-specific search performance revealed that profile scaling of MP-MFP improved the similarity search results for each of the 23 activity classes.
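The profiling and scaling idea described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact scaling scheme, so everything here is an assumption made for illustration: the function names, the 0.9 frequency cutoff for signature bits, the scaling factor of 3.0, and the use of a weighted Tanimoto coefficient in which each bit contributes its weight rather than 1. The toy 4-bit fingerprints are invented and do not come from the paper.

```python
from typing import List, Sequence


def bit_frequencies(fingerprints: Sequence[Sequence[int]]) -> List[float]:
    """Fraction of class members in which each bit position is set on."""
    n = len(fingerprints)
    length = len(fingerprints[0])
    return [sum(fp[i] for fp in fingerprints) / n for i in range(length)]


def scaling_factors(freqs: Sequence[float],
                    min_freq: float = 0.9,   # assumed cutoff for "signature" bits
                    weight: float = 3.0      # assumed scaling factor
                    ) -> List[float]:
    """Weight bits that are (almost) always set on; leave the rest at 1.0."""
    return [weight if f >= min_freq else 1.0 for f in freqs]


def weighted_tanimoto(a: Sequence[int], b: Sequence[int],
                      w: Sequence[float]) -> float:
    """Tanimoto coefficient where each bit position counts with weight w[i].

    With all weights equal to 1.0 this reduces to the ordinary binary
    Tanimoto coefficient: |A and B| / |A or B|.
    """
    both = sum(wi for ai, bi, wi in zip(a, b, w) if ai and bi)
    either = sum(wi for ai, bi, wi in zip(a, b, w) if ai or bi)
    return both / either if either else 0.0


# Toy activity class of three compounds (hypothetical 4-bit fingerprints).
actives = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 0, 1],
]
profile = bit_frequencies(actives)   # bits 0 and 3 are set on in every active
factors = scaling_factors(profile)   # so only bits 0 and 3 are up-weighted

query = [1, 1, 0, 1]       # a known active used as the search template
candidate = [1, 0, 1, 1]   # a database compound sharing both signature bits

unscaled = weighted_tanimoto(query, candidate, [1.0] * len(query))
scaled = weighted_tanimoto(query, candidate, factors)
print(f"unscaled Tc = {unscaled:.2f}, scaled Tc = {scaled:.2f}")
```

In this toy case the candidate shares both signature bits with the query, so its scaled coefficient is higher than its unscaled one. That is consistent with the abstract's observation that the best results under scaling were obtained at higher Tanimoto threshold values, although the sketch makes no claim to reproduce the reported numbers.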