J Cheminform. 2023 Apr 1;15(1):41.
doi: 10.1186/s13321-023-00714-y.

LinChemIn: SynGraph-a data model and a toolkit to analyze and compare synthetic routes

Marta Pasquini et al. J Cheminform. 2023.

Abstract

Background: The growing volume of chemical reaction data makes traditional ways of navigating it less effective, while demand for novel approaches and tools is rising. Recent data science and machine learning techniques support the development of new ways to extract value from the available reaction data. On the one hand, Computer-Aided Synthesis Planning tools can predict synthetic routes in a model-driven approach; on the other hand, experimental routes can be extracted from the Network of Organic Chemistry, in which reaction data are linked in a network. In this context, the need to combine, compare and analyze synthetic routes generated by different sources arises naturally.

Results: Here we present LinChemIn, a Python toolkit that allows chemoinformatics operations on synthetic routes and reaction networks. By wrapping third-party packages for graph arithmetic and chemoinformatics and implementing new data models and functionalities, LinChemIn allows interconversion between data formats and data models and enables route-level analysis and operations, including route comparison and descriptor calculation. Object-Oriented Design principles inspire the software architecture, and the modules are structured to maximize code reusability and support code testing and refactoring. The code structure should facilitate external contributions, thus encouraging open and collaborative software development.
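
The conversion between data models mentioned above can be illustrated with a minimal, standard-library sketch. It does not use the LinChemIn API; the function names, the dictionary layout, the molecule and reaction labels (M1..M4, R1, R2) and the reactant-to-product edge direction are all assumptions made for illustration. It derives the three graph views listed in the caption of Fig. 3 from the same toy route.

```python
# Hypothetical toy route (not LinChemIn data): M3 + M4 -> M2 (R1), then M2 -> M1 (R2).
reactions = {
    "R1": {"reactants": ["M3", "M4"], "products": ["M2"]},
    "R2": {"reactants": ["M2"], "products": ["M1"]},
}


def to_bipartite(reactions):
    """Bipartite view: molecules and chemical equations are both nodes."""
    edges = set()
    for rxn, spec in reactions.items():
        for mol in spec["reactants"]:
            edges.add((mol, rxn))  # reactant -> reaction
        for mol in spec["products"]:
            edges.add((rxn, mol))  # reaction -> product
    return edges


def to_monopartite_molecules(reactions):
    """Monopartite view with molecules only: reactant -> product edges."""
    edges = set()
    for spec in reactions.values():
        for reactant in spec["reactants"]:
            for product in spec["products"]:
                edges.add((reactant, product))
    return edges


def to_monopartite_reactions(reactions):
    """Monopartite view with chemical equations only.

    R_a -> R_b when a product of R_a is a reactant of R_b.
    """
    edges = set()
    for a, spec_a in reactions.items():
        for b, spec_b in reactions.items():
            if a != b and set(spec_a["products"]) & set(spec_b["reactants"]):
                edges.add((a, b))
    return edges


if __name__ == "__main__":
    print(sorted(to_bipartite(reactions)))
    print(sorted(to_monopartite_molecules(reactions)))
    print(sorted(to_monopartite_reactions(reactions)))
```

Storing edges as sets of tuples keeps the sketch dependency-free; a graph library would be the natural choice for anything beyond a toy route.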

Conclusions: The current version of LinChemIn allows users to combine synthetic routes generated from various tools and analyze them, and constitutes an open and extensible framework capable of incorporating contributions from the community and fostering scientific discussion. Our roadmap envisages the development of sophisticated metrics for route evaluation, a multi-parameter scoring system, and the implementation of an entire "ecosystem" of functionalities operating on synthetic routes. LinChemIn is freely available at https://github.com/syngenta/linchemin.

Keywords: Chemoinformatics; Computer-aided synthesis planning; Reaction.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1. Types of connections between reactions. (A) M2 is a molecular intermediate produced by reaction R1 and consumed by reaction R2. (B) M1 is a chemical produced by both R1 and R2, and represents a convergence point. (C) M4 is a chemical acting as reactant for both R1 and R2, and represents a divergence point.

Fig. 2. Schematic representation of different SynGraph types. Synthetic Path: a linear sequence connecting a root and a leaf. Synthetic Route: the union of Synthetic Paths starting from a single root that is collectively necessary and sufficient to represent a chemical synthesis. Synthetic Tree: the union of Synthetic Routes sharing the same root. Synthetic Forest: the union of Synthetic Trees with distinct roots.

Fig. 3. Schematic representation of different graph data models. (A) Monopartite, Chemical Equations only. (B) Monopartite, Molecules only. (C) Bipartite, Molecules and Chemical Equations.

Fig. 4. Example of SynGraph instance. SynGraph instance and the graphical representation of the corresponding graph.

Fig. 5. Translation and Conversion of routes. Schematic representation of the architectural relationship between the LinChemIn modules responsible for translation between formats and for conversion between data models.

Fig. 6. Route descriptors. The value of structural descriptors calculated on representative graphs illustrates their discriminative capability.

Fig. 7. Sub- and Super-set. Depiction of the subset concept: the SR2 on the right is a subset of SR1 on the left.

Fig. 8. GED vs APTED correlation. Relation between the GED values computed with the NetworkX algorithm and with the APTED algorithm. The Spearman correlation between the two sets is 0.87.

Fig. 9. GED vs APTED computational efficiency. Computational time needed to compute the distance matrix for an increasing number of routes with APTED and NetworkX algorithms.
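
Two of the concepts in the captions above, the connection types of Fig. 1 and the subset relation of Fig. 7, can be sketched in a few lines of standard-library Python. This is an illustrative sketch, not LinChemIn code: the function names and the toy data are hypothetical, and the subset test assumes that one route is a subset of another when every one of its reaction steps also appears in the other, which may differ from the definition used in the paper.

```python
from collections import defaultdict


def classify_molecules(reactions):
    """Label molecules by how they connect reactions.

    `reactions` maps a reaction label to a (reactants, products) pair of tuples.
    """
    produced_by = defaultdict(set)
    consumed_by = defaultdict(set)
    for rxn, (reactants, products) in reactions.items():
        for mol in reactants:
            consumed_by[mol].add(rxn)
        for mol in products:
            produced_by[mol].add(rxn)
    molecules = set(produced_by) | set(consumed_by)
    return {
        # produced by one reaction and consumed by another
        "intermediates": {m for m in molecules
                          if produced_by.get(m) and consumed_by.get(m)},
        # produced by more than one reaction
        "convergence": {m for m in molecules if len(produced_by.get(m, ())) > 1},
        # consumed by more than one reaction
        "divergence": {m for m in molecules if len(consumed_by.get(m, ())) > 1},
    }


def steps(reactions):
    """Reaction steps as hashable (reactants, products) pairs, ignoring labels."""
    return {(frozenset(r), frozenset(p)) for r, p in reactions.values()}


def is_subset(small, large):
    """Assumed definition: every step of `small` also occurs in `large`."""
    return steps(small) <= steps(large)


# Hypothetical toy network: M1 is made by R1 and R2, M4 feeds R2 and R3.
network = {
    "R1": (("M3",), ("M1",)),
    "R2": (("M4",), ("M1",)),
    "R3": (("M4",), ("M5",)),
    "R4": (("M1",), ("M0",)),
}

# Hypothetical routes: sr2 contains only the last step of sr1.
sr1 = {"R1": (("M3", "M4"), ("M2",)), "R2": (("M2",), ("M1",))}
sr2 = {"Rx": (("M2",), ("M1",))}

if __name__ == "__main__":
    print(classify_molecules(network))
    print(is_subset(sr2, sr1), is_subset(sr1, sr2))
```

Comparing steps by their reactant and product sets rather than by reaction label lets routes coming from different tools be compared even when they name the same reaction differently.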

