Sci Data. 2017 May 23;4:170073.
doi: 10.1038/sdata.2017.73.

Unique identifiers for small molecules enable rigorous labeling of their atoms

Hesam Dashti et al. Sci Data.

Abstract

Rigorous characterization of small organic molecules in terms of their structural and biological properties is vital to biomedical research. The three-dimensional structure of a molecule, its 'photo ID', is inefficient for searching and matching tasks. Instead, identifiers play a key role in accessing compound data. Unique and reproducible molecule and atom identifiers are required to ensure the correct cross-referencing of properties associated with compounds archived in databases. The best approach to this requirement is the International Chemical Identifier (InChI). However, the current implementation of InChI fails to provide a complete standard for atom nomenclature, and incorrect use of the InChI standard has resulted in the proliferation of non-unique identifiers. We propose a methodology and associated software tools, named ALATIS, that overcomes these shortcomings. ALATIS is an adaptation of InChI, which operates fully within the InChI convention to provide unique and reproducible molecule and all atom identifiers. ALATIS includes an InChI extension for unique atom labeling of symmetric molecules. ALATIS forms the basis for improving reproducibility and unifying cross-referencing across databases.

Conflict of interest statement

The authors declare no competing financial interests.

Figures

Figure 1. Non-unique labeling of atoms in databases.
(a) 2D representation of the structure file of L-leucine (Data Citation 5) downloaded from BMRB. (b) 2D representation of the structure file of L-leucine (Data Citation 6) downloaded from HMDB. As shown in the diagram, the two structural representations use different atom numberings. As a result, the relationship between atoms and their labels (numbers) is database-specific.
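
The consequence of database-specific numbering can be illustrated with a minimal Python sketch. It assumes two invented numberings (`db_a`, `db_b`) of a glycine-like heavy-atom skeleton, not the actual BMRB or HMDB files for L-leucine, and it recovers the correspondence between them by brute-force graph matching. This is a stand-in technique for illustration only; it is not the ALATIS procedure, and all names in it are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# A molecule as (element by atom number, bonds as (atom_i, atom_j, bond_order)).
Molecule = Tuple[Dict[int, str], List[Tuple[int, int, int]]]

# Two hypothetical numberings of the same skeleton N-C-C(=O)-O.
# These are invented for illustration, not taken from any database.
db_a: Molecule = (
    {1: "N", 2: "C", 3: "C", 4: "O", 5: "O"},
    [(1, 2, 1), (2, 3, 1), (3, 4, 2), (3, 5, 1)],
)
db_b: Molecule = (
    {1: "O", 2: "O", 3: "C", 4: "C", 5: "N"},
    [(3, 1, 2), (3, 2, 1), (3, 4, 1), (4, 5, 1)],
)


def bond_table(mol: Molecule) -> Dict[frozenset, int]:
    """Index bond orders by the unordered pair of atom numbers."""
    _, bonds = mol
    return {frozenset((i, j)): order for i, j, order in bonds}


def map_labels(a: Molecule, b: Molecule) -> Optional[Dict[int, int]]:
    """Return a map from atom numbers in `a` to atom numbers in `b` that
    preserves elements and bond orders, or None if no such map exists."""
    elems_a, elems_b = a[0], b[0]
    bonds_a, bonds_b = bond_table(a), bond_table(b)
    if len(elems_a) != len(elems_b) or len(bonds_a) != len(bonds_b):
        return None
    order_a = sorted(elems_a)

    def extend(mapping: Dict[int, int]) -> Optional[Dict[int, int]]:
        if len(mapping) == len(order_a):
            return mapping
        atom = order_a[len(mapping)]
        for cand in sorted(elems_b):
            if cand in mapping.values() or elems_b[cand] != elems_a[atom]:
                continue
            # The bond (or absence of a bond) to every atom mapped so far
            # must be identical in both numberings.
            consistent = all(
                bonds_a.get(frozenset((atom, prev)))
                == bonds_b.get(frozenset((cand, mapping[prev])))
                for prev in mapping
            )
            if consistent:
                result = extend({**mapping, atom: cand})
                if result is not None:
                    return result
        return None

    return extend({})


if __name__ == "__main__":
    print(map_labels(db_a, db_b))
```

Because bond orders distinguish the two oxygens here, the correspondence is unique. For a molecule with equivalent atoms, such as the two methyl groups of leucine, several maps would satisfy the same constraints, which is the ambiguity that a labeling scheme for symmetric molecules has to resolve.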
Figure 2. Process for creating a unique and reproducible molecular identifier and complete atom labels.
Overview of the steps considered in ALATIS.
Figure 3. Flowchart for the software package ALATIS.
The webserver for ALATIS accepts a structure file for the compound as input (SDF or MDL Mol-V2000 file) along with optional atom labels. Three modules receive the data (dashed arrows). The InChI-1 program executes in the background to generate the standard InChI string for the input. The modules work in concert to assign unique labels to the heavy atoms as well as to the hydrogen atoms of the molecule (solid arrows). To label heavy atoms, two sub-modules construct two graph representations of the molecule (using the input structure file and the generated standard InChI strings; see Supplementary Information 2 for the details of the graph representation). Another sub-module maps the graphs to a representation suitable for assigning unique labels to the heavy atoms. The module responsible for assigning unique labels to the hydrogen atoms imposes temporary chiral centers on the heavy atoms in order to distinguish between the hydrogens attached to each heavy atom. The idea of introducing temporary chiral centers is elaborated further to accommodate atom labeling of symmetric molecules. During this process the InChI-1 program is executed repeatedly and iteratively (solid arrows). In cases where the input structure file contains multiple molecular structures (for example, representing different tautomeric states), a separate module carries out the processing. ALATIS reports unique labels for the molecules in the mixture and their constituent atoms. ALATIS outputs a standard InChI string for the compound, a structure file that contains the unique labels of the atoms, and a map between the atom labels of the input and the generated unique atom labels.
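
To make the input side of this flowchart concrete, the following is a minimal sketch of reading an MDL Mol-V2000 file into the kind of element table and bond list from which a molecular graph could be built. It is not the ALATIS code and does not call the InChI program. The function name `parse_molfile_v2000` and the hand-written ethanol example are assumptions introduced for illustration, and the parser ignores charges, isotopes, and property lines.

```python
from typing import Dict, List, Tuple


def parse_molfile_v2000(text: str) -> Tuple[Dict[int, str], List[Tuple[int, int, int]]]:
    """Read the atom and bond blocks of a V2000 molfile.

    Returns (elements, bonds): elements maps the 1-based atom number used in
    the file to its element symbol; bonds is a list of
    (first_atom, second_atom, bond_type) tuples as written in the file.
    """
    lines = text.splitlines()
    counts = lines[3]  # three header lines precede the counts line
    if "V2000" not in counts:
        raise ValueError("not a V2000 molfile")
    n_atoms = int(counts[0:3])  # counts are fixed-width, three columns each
    n_bonds = int(counts[3:6])

    elements: Dict[int, str] = {}
    for idx in range(n_atoms):
        # Whitespace splitting is a simplification: x, y, z, then the symbol.
        # Files with coordinates that run together would need column slicing.
        fields = lines[4 + idx].split()
        elements[idx + 1] = fields[3]

    bonds: List[Tuple[int, int, int]] = []
    for line in lines[4 + n_atoms : 4 + n_atoms + n_bonds]:
        bonds.append((int(line[0:3]), int(line[3:6]), int(line[6:9])))
    return elements, bonds


# Hand-written heavy-atom example (ethanol, C-C-O), not a database file.
ETHANOL = "\n".join([
    "ethanol (heavy atoms only)",
    "  hand-written example",
    "",
    "  3  2  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "    1.5000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "    2.2000    1.2000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0",
    "  1  2  1  0",
    "  2  3  1  0",
    "M  END",
])

if __name__ == "__main__":
    elements, bonds = parse_molfile_v2000(ETHANOL)
    print(elements)
    print(bonds)
```

The atom numbers produced here are simply the row positions in the input file, which is why they differ between databases; a unique labeling has to be derived from the structure itself rather than from this ordering.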

References

Data Citations

    1. 2005. NCBI PubChem Compound. 11444
    2. 2004. NCBI PubChem Compound. 112072
    3. 2012. The Human Metabolome Database. HMDB13785
    4. Jofre F., Anderson M. E., Markley J. L., Rapolu R. 2017. Biological Magnetic Resonance Data Bank. bmse000660
    5. Jofre F., Anderson M. E., Markley J. L., Rapolu R. 2017. Biological Magnetic Resonance Data Bank. bmse000042
