J Cheminform. 2009 Aug 10;1(1):12.
doi: 10.1186/1758-2946-1-12.

Small Molecule Subgraph Detector (SMSD) toolkit

Syed Asad Rahman et al. J Cheminform. 2009.

Abstract

Background: Finding one small molecule (query) in a large target library is a challenging task in computational chemistry. Although several heuristic approaches based on fragment-based chemical similarity searches are available, they fail to identify exact atom-bond equivalence between the query and target molecules and thus cannot be applied to complex chemical similarity searches, such as searching a complete or partial metabolic pathway. In this paper we present a new Maximum Common Subgraph (MCS) tool, SMSD (Small Molecule Subgraph Detector), to overcome the issues with current heuristic approaches to small molecule similarity searches. The MCS search implemented in SMSD incorporates chemical knowledge (atom type match with bond-sensitive and bond-insensitive information) while searching for molecular similarity. We also propose a novel method by which the solutions obtained by each MCS run can be ranked using chemical filters such as stereochemistry, bond energy, etc.

Results: To benchmark and test the tool, we performed 50,000 pair-wise comparisons between KEGG ligands and PDB HET Group atoms. In both cases, SMSD was shown to be more efficient than the widely used MCS module implemented in the Chemistry Development Kit (CDK) in generating MCS solutions from our test cases.

Conclusion: Presently this tool can be applied to various areas of bioinformatics and chemo-informatics for finding exhaustive MCS matches. For example, it can be used to analyse metabolic networks by mapping the atoms between reactants and products involved in reactions. It can also be used for MCS/substructure searches in small molecules reported by metabolome experiments, as well as in the screening of drug-like compounds with similar substructures. Thus, we present a robust tool that can be used for multiple applications, including the discovery of new drug molecules. This tool is freely available at http://www.ebi.ac.uk/thornton-srv/software/SMSD/

Figures

Figure 1
The compatibility graph between isobutane and cyclopropane has 12 vertices and 36 edges. Of these edges, 18 are c-edges (green dotted lines) and the remaining 18 are non c-edges (red lines). This leads to 18 MCS solutions, each of size 3.
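
The counts in the Figure 1 caption can be reproduced with a small brute-force sketch. This is not the SMSD implementation. The edge rule is an assumption inferred from the numbers in the caption: two vertices (atom pairs) are joined whenever they use distinct atoms in both molecules, the edge is a c-edge when both atom pairs are bonded, and an MCS solution is a clique whose vertices are linked together by c-edges. Only the carbon skeletons are modelled, and all function names are hypothetical.

```python
from itertools import combinations

# Heavy-atom skeletons only (all carbons, hydrogens omitted).
# Isobutane: central carbon 0 bonded to carbons 1, 2, 3.
ISOBUTANE = (4, {frozenset(b) for b in [(0, 1), (0, 2), (0, 3)]})
# Cyclopropane: three carbons in a ring.
CYCLOPROPANE = (3, {frozenset(b) for b in [(0, 1), (1, 2), (0, 2)]})


def compatibility_graph(mol_a, mol_b):
    """Return (vertices, c_edges, non_c_edges) under the assumed edge rule."""
    (n_a, bonds_a), (n_b, bonds_b) = mol_a, mol_b
    vertices = [(a, b) for a in range(n_a) for b in range(n_b)]
    c_edges, non_c_edges = set(), set()
    for u, v in combinations(vertices, 2):
        if u[0] == v[0] or u[1] == v[1]:
            continue  # an atom cannot be mapped twice
        bonded_a = frozenset((u[0], v[0])) in bonds_a
        bonded_b = frozenset((u[1], v[1])) in bonds_b
        # Assumption: c-edge only when both atom pairs are bonded.
        target = c_edges if (bonded_a and bonded_b) else non_c_edges
        target.add(frozenset((u, v)))
    return vertices, c_edges, non_c_edges


def c_connected(clique, c_edges):
    """True if the clique's vertices are linked together by c-edges alone."""
    seen, stack = {clique[0]}, [clique[0]]
    while stack:
        u = stack.pop()
        for v in clique:
            if v not in seen and frozenset((u, v)) in c_edges:
                seen.add(v)
                stack.append(v)
    return len(seen) == len(clique)


def c_connected_cliques(vertices, c_edges, non_c_edges, size):
    """Brute-force all cliques of the given size that are c-connected."""
    edges = c_edges | non_c_edges
    return [
        combo
        for combo in combinations(vertices, size)
        if all(frozenset(pair) in edges for pair in combinations(combo, 2))
        and c_connected(combo, c_edges)
    ]


vertices, c_edges, non_c_edges = compatibility_graph(ISOBUTANE, CYCLOPROPANE)
solutions = c_connected_cliques(vertices, c_edges, non_c_edges, size=3)
print(len(vertices), len(c_edges), len(non_c_edges), len(solutions))
# prints: 12 18 18 18
```

Exhaustive enumeration is only practical for a toy case like this one; on real molecules a dedicated clique-detection algorithm would be needed.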
Figure 2
Flowchart describing the post-filtering step in the SMSD algorithm.
Figure 3
Flowchart outlining the methodology of the Maximum Common Subgraph (MCS) algorithm used by SMSD.
Figure 4
Head-to-head comparison of MCS jobs processed by SMSD and CDKMCS. The similarity score frequencies (left axis) between each pair (colored boxes) of MCS solutions derived by the CDKMCS and the SMSD algorithm were sorted into bins ranging from 0 to 1 in 0.01 increments. The cumulative percentage of the overall dataset for each bin is also shown (curves and right axis). Data shown in blue correspond to results from the SMSD algorithm, while those in mauve correspond to CDKMCS. Data shown in yellow correspond to all the jobs that SMSD ran successfully (including 24 jobs that CDKMCS failed to run). The graph shows that the reported frequency of the SMSD similarity is nearly identical to that of the CDKMCS similarity between the molecules. However, the overall similarity distributions of SMSD and CDKMCS differ because SMSD was able to process a higher number of jobs than CDKMCS. A good cut-off Tanimoto similarity score for reporting significant matches seems to be above 0.77 (at the 99.9th percentile of the curve) for MCS-based searches (indicated by the rightmost set of dashed lines).
Figure 5
The similarity score frequencies (left axis) between each pair (colored boxes) of solutions derived by the CDK-Fingerprint and the SMSD algorithm were sorted into bins ranging from 0 to 1 in 0.01 increments. The cumulative percentage of the overall dataset for each bin is also shown (curves and right axis). Data shown in blue correspond to results from the SMSD algorithm, while those in lilac correspond to CDK-Fingerprint. The graph shows that the reported frequency of the SMSD similarity differs from that of the fingerprint similarity between the molecules. A good cut-off Tanimoto similarity score for reporting significant matches seems to be above 0.77 (at the 99.9th percentile of the curve) for both fingerprint-based and MCS-based searches (indicated by the rightmost set of dashed lines).
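
The Tanimoto scores in Figures 4 and 5 can be illustrated with a minimal sketch. The abstract does not say how SMSD counts matched features (atoms only, or atoms and bonds), so the version below assumes plain heavy-atom counts and the standard Tanimoto form c / (a + b - c). The function names and the use of 0.77 as a default threshold are illustrative choices, not part of the tool's API.

```python
def tanimoto(size_a, size_b, size_common):
    """Tanimoto coefficient c / (a + b - c) for two sets sharing c items."""
    union = size_a + size_b - size_common
    if union <= 0:
        raise ValueError("at least one molecule must be non-empty")
    return size_common / union


def is_significant(score, cutoff=0.77):
    """Apply the cut-off suggested in the figure captions (assumed strict)."""
    return score > cutoff


# Isobutane (4 heavy atoms) vs cyclopropane (3 heavy atoms), MCS of size 3.
score = tanimoto(4, 3, 3)
print(score, is_significant(score))  # prints: 0.75 False
```

Under these assumptions the Figure 1 pair scores 0.75, just below the 0.77 cut-off.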
