J Cheminform. 2014 Sep 4;6(1):43.
doi: 10.1186/s13321-014-0043-5. eCollection 2014 Dec.

UniChem: extension of InChI-based compound mapping to salt, connectivity and stereochemistry layers

Jon Chambers et al. J Cheminform.

Abstract

UniChem is a low-maintenance, fast and freely available compound identifier mapping service, recently made available on the Internet. Until now, molecular equivalence within UniChem has been defined as complete identity between Standard InChIs. A limitation of this approach is that stereoisomers, isotopic variants and salts of otherwise identical molecules are not treated as related. Here, we describe how we have exploited the layered structural representation of the Standard InChI to create new functionality within UniChem that integrates these related molecular forms. The service, called 'Connectivity Search', first matches molecules on the basis of complete identity between the connectivity layers of their Standard InChIs, and then compares the remaining layers to highlight stereochemical and isotopic differences. Parsing of Standard InChI sub-layers allows mixtures and salts to be included in this integration process as well. Implementing these enhancements required simple modifications to the schema, loader and web application, none of which changed the original UniChem functionality or services. The scope of queries can be adjusted with a range of easily configurable options, and the output is annotated to help the user filter, sort and understand the differences between query and retrieved structures. The RESTful web service output can be processed programmatically, allowing developers to present the data in whatever form their users require, or to define their own level of molecular equivalence for their resource, albeit within the constraint of identical connectivity.
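The matching step described in the abstract can be illustrated with a minimal Python sketch. This is not UniChem's implementation; it is one possible reading of "identity between connectivity layers". The function names `skeleton_key` and `group_by_connectivity` and the source identifiers are hypothetical. As an assumption of this sketch, the formula, `/c` and `/h` segments are compared together, because a `/c` string alone only numbers atoms and does not say which elements they are. Salts and mixtures are not handled here.

```python
from collections import defaultdict

PREFIX = "InChI=1S/"  # prefix of every Standard InChI


def skeleton_key(inchi: str) -> tuple[str, str, str]:
    """Return (formula, /c value, /h value) for a Standard InChI.

    Assumption of this sketch: these three segments together stand in
    for the 'connectivity layer' used for matching.
    """
    if not inchi.startswith(PREFIX):
        raise ValueError("expected a Standard InChI (prefix 'InChI=1S/')")
    segments = inchi[len(PREFIX):].split("/")
    formula = segments[0]
    conn = hydrogens = ""
    for seg in segments[1:]:
        if seg.startswith("i"):
            break  # anything after /i belongs to the isotopic layer
        if seg.startswith("c"):
            conn = seg[1:]
        elif seg.startswith("h"):
            hydrogens = seg[1:]
    return formula, conn, hydrogens


def group_by_connectivity(records):
    """Group (compound_id, inchi) pairs that share the same skeleton key."""
    groups = defaultdict(list)
    for compound_id, inchi in records:
        groups[skeleton_key(inchi)].append(compound_id)
    return dict(groups)


# Illustrative inputs; the identifiers are made up for this example.
L_ALA = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m0/s1"
D_ALA = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m1/s1"
ETHANE = "InChI=1S/C2H6/c1-2/h1-2H3"
METHANOL = "InChI=1S/CH4O/c1-2/h2H,1H3"

groups = group_by_connectivity(
    [("srcA:1", L_ALA), ("srcB:7", D_ALA), ("srcA:2", ETHANE)]
)
```

With these inputs the two alanine enantiomers fall into one group, while ethane and methanol stay apart even though both carry the `/c` string `1-2`, which is why the formula is part of the key in this sketch.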

Keywords: Chemical databases; Connectivity search; Data integration; InChIKey; Standard InChI; UniChem.


Figures

Figure 1
Modifications to the UniChem schema required to implement connectivity search. The UniChem schema (described previously [1]) was modified by the addition of the UC_FIKHB_HIERARCHY table and the FIKHB field within the UC_STRUCTURES table. Both additions are highlighted with bold and shading. Full details of the function of these additions are given in the text. For clarity, not all fields are shown. Primary/foreign key constraints are indicated by solid arrows. PK = Primary Key, FK = Foreign Key.
Figure 2
Connectivity search web interface results page. The results of a Connectivity Search in the UniChem web interface are shown in a sortable table, with a single matching src_compound_id-to-structure assignment per record. The example shows the results of a query using src_compound_id CHEMBL15245 (Yohimbine) from the ChEMBL resource. In total, 16 records were retrieved by this query, but for clarity only the first 7 are shown. Comparisons of the individual layers of the Standard InChI are shown (p, b, t, m, s and i), with differing layers marked '1' (and highlighted) and identical layers marked '0'.
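The per-layer 1/0 comparison described in this caption can be sketched in Python as follows. This is an illustration under stated assumptions, not the code behind the UniChem interface: the helper names `layer_map` and `layer_flags` are hypothetical, a layer missing from both structures is treated as identical, and everything from `/i` onward is compared as a single isotopic value.

```python
PREFIX = "InChI=1S/"
FLAG_LAYERS = ("p", "b", "t", "m", "s", "i")  # layers listed in the caption


def layer_map(inchi: str) -> dict[str, str]:
    """Map each layer prefix of a Standard InChI to its value.

    Simplification of this sketch: the isotopic layer and any sub-layers
    that follow it are kept together under the key 'i'.
    """
    if not inchi.startswith(PREFIX):
        raise ValueError("expected a Standard InChI (prefix 'InChI=1S/')")
    body = inchi[len(PREFIX):]
    main, _, isotopic = body.partition("/i")
    # Skip the first segment (the formula); key each layer by its letter.
    layers = {seg[0]: seg[1:] for seg in main.split("/")[1:] if seg}
    layers["i"] = isotopic
    return layers


def layer_flags(query: str, hit: str) -> dict[str, int]:
    """Return 1 for each layer that differs between query and hit, else 0."""
    q, h = layer_map(query), layer_map(hit)
    return {key: int(q.get(key, "") != h.get(key, "")) for key in FLAG_LAYERS}


# Illustrative inputs: L-alanine, D-alanine, and alanine without stereo layers.
L_ALA = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m0/s1"
D_ALA = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m1/s1"
FLAT_ALA = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)"

print(layer_flags(L_ALA, D_ALA))
# {'p': 0, 'b': 0, 't': 0, 'm': 1, 's': 0, 'i': 0}
print(layer_flags(L_ALA, FLAT_ALA))
# {'p': 0, 'b': 0, 't': 1, 'm': 1, 's': 1, 'i': 0}
```

A client could sort or filter hits on these flags, for example keeping only records whose flags are all 0 except `i` to list isotopic variants of the query.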
Figure 3
KNIME workflow for compound novelty checking. The workflow tool KNIME can be used with Connectivity Search to check for the novelty of a particular compound. (A) A summary of the entire workflow, as detailed in the text. (B) A KNIME node dialogue allows the users to specify criteria A-H for the Connectivity Search. (C) The search hits are returned and converted from InChI strings to molecular images for easier inspection.
Figure 4
Using Connectivity Search to alert users of one source to alternative molecular forms of a compound in other resources. The ChEMBL resource utilizes Connectivity Search to alert users to alternative molecular forms of ChEMBL compounds in other sources. The page shown here, reached from a link from the ChEMBL page for CHEMBL15245, gives full details of all alternative stereoisomers, isotopic variants and salt and mixture forms of CHEMBL15245. In this case, the matching data are clustered by source, although clearly other formats are easily created depending upon the requirements of the users of the resource.

References

    1. Chambers J, Davies M, Gaulton A, Hersey A, Velankar S, Petryszak R, Hastings J, Bellis L, McGlinchey S, Overington JP. UniChem: a unified chemical structure cross-referencing and identifier tracking system. J Cheminformatics. 2013;5:3. doi: 10.1186/1758-2946-5-3.
    2. Heller S, McNaught A, Stein S, Tchekhovskoi D, Pletnev I. InChI - the worldwide chemical structure identifier standard. J Cheminformatics. 2013;5:7. doi: 10.1186/1758-2946-5-7.
    3. NIH Chemical Identifier Resolver. [http://cactus.nci.nih.gov/chemical/structure]
    4. ChemSpider. [http://www.chemspider.com/]
    5. Williams A, Tkachenko V. The Royal Society of Chemistry and the delivery of chemistry data repositories for the community. J Comput Aided Mol Des. 2014.
