J Cheminform. 2013 Jan 14;5(1):3.
doi: 10.1186/1758-2946-5-3.

UniChem: a unified chemical structure cross-referencing and identifier tracking system

Jon Chambers et al. J Cheminform. 2013.

Abstract

UniChem is a freely available compound identifier mapping service on the internet, designed to optimize the efficiency with which structure-based hyperlinks may be built and maintained between chemistry-based resources. In the past, the creation and maintenance of such links at EMBL-EBI, where several chemistry-based resources exist, required independent efforts by each of the separate teams. These efforts were complicated by the different data models, release schedules, and business rules for compound normalization and identifier nomenclature that exist across the organization. UniChem, a large-scale, non-redundant database of Standard InChIs with pointers between these structures and chemical identifiers from all the separate chemistry resources, was developed as a means of efficiently sharing the maintenance overhead of creating these links. Thus, for each source represented in UniChem, all links to and from all other sources are automatically calculated and available for all to use, and updated mappings become available as soon as new data releases from the sources are loaded. Web services in UniChem provide users with a single, simple, automatable mechanism for maintaining all links from their resource to all other sources represented in UniChem. In addition, functionality to track changes in identifier usage allows users to monitor which identifiers are current and which are obsolete. Lastly, UniChem has been deliberately designed to allow additional resources to be included with minimal effort. Indeed, the recent inclusion of data sources external to EMBL-EBI has given users an even wider selection of resources to link to, at no extra cost, while also providing a simple mechanism for external resources to link to all EMBL-EBI chemistry resources.

Figures

Figure 1
UniChem efficiently manages the creation and maintenance of structure-based ‘links’ between resources that contain small molecules. Historically, the maintenance of ‘links’ between EMBL-EBI small molecule resources followed a model (A) in which each resource individually manages its own links to all other resources. UniChem uses a model (B) in which the mappings are maintained centrally, resulting in significantly lower overall maintenance costs and allowing for the simple inclusion of additional resources in the future.
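To make the difference between the two models concrete, the following is a rough counting sketch rather than anything taken from the paper. It assumes that in model (A) every resource maintains one set of links to each other resource, and that in model (B) every resource only loads its identifiers into the central service once per release. The function names are hypothetical.

```python
def link_sets_pairwise(n_resources: int) -> int:
    """Model (A), as assumed here: each resource keeps links to every other one."""
    return n_resources * (n_resources - 1)


def loads_central(n_resources: int) -> int:
    """Model (B), as assumed here: each resource loads into the hub once."""
    return n_resources


if __name__ == "__main__":
    # Compare how the two counts grow as resources are added.
    for n in (3, 5, 10):
        print(n, link_sets_pairwise(n), loads_central(n))
```

Under these assumptions the pairwise count grows quadratically with the number of resources while the central count grows linearly, which is one way to read the caption's claim about lower overall maintenance cost.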
Figure 2
The UniChem schema. The UniChem schema consists of four main tables. Structures are stored in the UC_STRUCTURES table and sources in the UC_SOURCES table. The UC_XREF table lists all src_compound_id-to-UCI assignments, with fields indicating whether each assignment is current or obsolete. The UC_RELEASE table tracks information on data releases for all sources. For clarity, not all fields are shown. Primary/foreign key constraints are indicated by solid arrows. PK = Primary Key, FK = Foreign Key.
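The four tables named in the caption could be sketched as follows. This is a minimal illustration using Python's built-in sqlite3 module, not the actual UniChem DDL: only the table names, src_compound_id, and the UCI come from the caption. Every other column name (src_id, name, standard_inchi, release_no, loaded_on, is_current), the key choices, and the single-flag representation of current versus obsolete are assumptions.

```python
import sqlite3

# Hypothetical column layout; only the table names, src_compound_id and uci
# are taken from the figure caption.
SCHEMA = """
CREATE TABLE uc_sources (
    src_id INTEGER PRIMARY KEY,
    name   TEXT NOT NULL UNIQUE
);
CREATE TABLE uc_structures (
    uci            INTEGER PRIMARY KEY,
    standard_inchi TEXT NOT NULL UNIQUE      -- non-redundant: one row per InChI
);
CREATE TABLE uc_release (
    src_id     INTEGER NOT NULL REFERENCES uc_sources(src_id),
    release_no INTEGER NOT NULL,
    loaded_on  TEXT,
    PRIMARY KEY (src_id, release_no)
);
CREATE TABLE uc_xref (
    uci             INTEGER NOT NULL REFERENCES uc_structures(uci),
    src_id          INTEGER NOT NULL REFERENCES uc_sources(src_id),
    src_compound_id TEXT    NOT NULL,
    is_current      INTEGER NOT NULL CHECK (is_current IN (0, 1)),
    PRIMARY KEY (uci, src_id, src_compound_id)
);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    """Create the four sketch tables on the given connection."""
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    conn.executescript(SCHEMA)


conn = sqlite3.connect(":memory:")
create_schema(conn)
tables = sorted(
    row[0] for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)
print(tables)
```

The UNIQUE constraint on the InChI column is one simple way to express the "non-redundant" property described in the abstract; the real schema may enforce it differently.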
Figure 3
Example query using the UniChem web interface. On the UniChem web interface, querying with a single src_compound_id retrieves a list of all assignments (current and obsolete) that share the Standard InChI to which the query src_compound_id is currently assigned. This is illustrated by the table below, which shows the data retrieved when querying with the ChEMBL identifier for diazepam: ‘CHEMBL12’. The data columns shown are explained in the text.
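The query behaviour described in this caption can be illustrated with a small in-memory sketch. It is not the UniChem implementation or its web service API: the Assignment record, the lookup function, the source names other than ChEMBL, the UCI values, and every identifier except ‘CHEMBL12’ are made up for illustration. It also assumes the caller names the source alongside the src_compound_id.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assignment:
    uci: int               # structure key (one per Standard InChI); values invented
    src: str               # source name; "source_b" and "source_c" are placeholders
    src_compound_id: str
    is_current: bool       # False marks an obsolete assignment


# Toy rows, invented for this sketch.
ASSIGNMENTS = [
    Assignment(1, "chembl", "CHEMBL12", True),
    Assignment(1, "source_b", "B-0001", True),
    Assignment(1, "source_c", "C-0042", False),  # obsolete, but same structure
    Assignment(2, "source_b", "B-0001", False),  # B-0001 formerly pointed elsewhere
    Assignment(2, "source_b", "B-0002", True),
]


def lookup(src: str, src_compound_id: str, assignments: list) -> list:
    """Return all assignments sharing the structure the query id is currently on."""
    # Step 1: find the structure(s) the query identifier is *currently* assigned to.
    current_ucis = {
        a.uci
        for a in assignments
        if a.src == src and a.src_compound_id == src_compound_id and a.is_current
    }
    # Step 2: return every assignment, current or obsolete, on those structures.
    return [a for a in assignments if a.uci in current_ucis]


hits = lookup("chembl", "CHEMBL12", ASSIGNMENTS)
for hit in hits:
    print(hit)
```

Note that the obsolete row for B-0001 on structure 2 is not returned when querying with B-0001, because the first step follows only the current assignment, as the caption describes.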
