J Cheminform. 2012 Apr 5:4:8.
doi: 10.1186/1758-2946-4-8.

Structure-based classification and ontology in chemistry

Janna Hastings et al. J Cheminform. 2012.

Abstract

Background: Recent years have seen an explosion in the availability of data in the chemistry domain. With this information explosion, however, retrieving relevant results from the available information, and organising those results, become even harder problems. Computational processing is essential to filter and organise the available resources so as to better facilitate the work of scientists. Ontologies encode expert domain knowledge in a hierarchically organised machine-processable format. One such ontology for the chemical domain is ChEBI. ChEBI provides a classification of chemicals based on their structural features and a role or activity-based classification. An example of a structure-based class is 'pentacyclic compound' (compounds whose structure contains five rings), while an example of a role-based class is 'analgesic', since many different chemicals can act as analgesics without sharing structural features. Structure-based classification in chemistry exploits elegant regularities and symmetries in the underlying chemical domain. As yet, there has been neither a systematic analysis of the types of structural classification in use in chemistry nor a comparison to the capabilities of available technologies.

Results: We analyze the different categories of structural classes in chemistry, presenting a list of patterns for features found in class definitions. We compare these patterns of class definition to tools which allow for automation of hierarchy construction within cheminformatics and within logic-based ontology technology. For the latter, we examine in detail the expressive capabilities of the Web Ontology Language and recent extensions for modelling structured objects. Finally, we discuss the relationships and interactions between cheminformatics approaches and logic-based approaches.

Conclusion: Systems that perform intelligent reasoning tasks on chemistry data require a diverse set of underlying computational utilities including algorithmic, statistical and logic-based tools. For the task of automatic structure-based classification of chemical entities, essential to managing the vast swathes of chemical data being brought online, systems which are capable of hybrid reasoning combining several different approaches are crucial. We provide a thorough review of the available tools and methodologies, and identify areas of open research.

Figures

Figure 1
Similarity-based hierarchical structure clustering. Similarity-based hierarchical structure clustering is illustrated as it is computed in PubChem [14]. The figure was generated by searching for 'aspirin' and then executing the 'Structure Clustering' tool from the menu at the right. Numbers on the right are compound identifiers, unique numbers associated with chemical structures within the PubChem database.
Figure 2
Scaffold and MCS-based hierarchies. Scaffold-based and maximum common substructure-based hierarchies are constructed by searching for shared common parts between a group of molecules. Higher positions in the hierarchy correspond to smaller shared scaffolds and substructures, with the root being 'any atom'. The MCS-based hierarchy includes non-ring structures, while the scaffold-based hierarchy only includes ring structures. Both images were generated based on hierarchies constructed using the structures belonging to the 'organic heterocyclic molecule' class in ChEBI.
Figure 3
Logical models of the benzene structure. The chemical structure of benzene is illustrated together with the logical models of the class in the OWL language.

References

    1. Wegner JK, Sterling A, Guha R, Bender A, Faulon JL, Hastings J, O'Boyle N, Overington J, Van Vlijmen H, Willighagen E. Cheminformatics, the Computer Science of Chemical Discovery, Turning Open Source. Communications of the ACM. 2012. In press.
    2. Lambrix P. In: Artificial Intelligence Methods And Tools For Systems Biology, Volume 5 of Computational Biology. Dubitzky W, Azuaje F, Dress A, Vingron M, Myers G, Giegerich R, Fitch W, Pevzner PA, editor. Netherlands: Springer; 2004. Ontologies in Bioinformatics and Systems Biology; pp. 129–145.
    3. Courtot M, Juty N, Knüpfer C, Waltemath D, Zhukova A, Dräger A, Dumontier M, Finney A, Golebiewski M, Hastings J, Hoops S, Keating S, Kell DB, Kerrien S, Lawson J, Lister A, Lu J, Machne R, Mendes P, Pocock M, Rodriguez N, Villeger A, Wilkinson DJ, Wimalaratne S, Laibe C, Hucka M, Novère NL. Controlled vocabularies and semantics in systems biology. Molecular Systems Biology. 2011;7:543.
    4. Harland L, Larminie C, Sansone SA, Popa S, Marshall MS, Braxenthaler M, Cantor M, Filsell W, Forster MJ, Huang E, Matern A, Musen M, Saric J, Slater T, Wilson J, Lynch N, Wise J, Dix I. Empowering industrial research with shared biomedical vocabularies. Drug Discovery Today. 2011;16(21-22):940–947. doi: 10.1016/j.drudis.2011.09.013. http://www.sciencedirect.com/science/article/pii/S1359644611003035
    5. The Gene Ontology Consortium. Gene ontology: tool for the unification of biology. Nature Genetics. 2000;25:25–29. doi: 10.1038/75556.