ACS Cent Sci. 2017 Dec 27;3(12):1237-1245.
doi: 10.1021/acscentsci.7b00355. Epub 2017 Nov 16.

Computer-Assisted Retrosynthesis Based on Molecular Similarity


Connor W Coley et al. ACS Cent Sci.

Abstract

We demonstrate molecular similarity to be a surprisingly effective metric for proposing and ranking one-step retrosynthetic disconnections based on analogy to precedent reactions. The developed approach mimics the retrosynthetic strategy defined implicitly by a corpus of known reactions without the need to encode any chemical knowledge. Using 40 000 reactions from the patent literature as a knowledge base, the recorded reactants are among the top 10 proposed precursors in 74.1% of 5000 test reactions, providing strong quantitative support for our methodology. Extension of the one-step strategy to multistep pathway planning is demonstrated and discussed for two exemplary drug products.
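The headline result is a top-k accuracy. A test reaction counts as a hit if its recorded reactants appear anywhere among the first k ranked proposals, and the reported figure is the fraction of hits over the test set. The following is a minimal sketch of that metric. It assumes each reactant set has already been canonicalized and is represented as a frozenset of SMILES strings, and the function name `top_k_accuracy` is hypothetical. With 5000 test reactions, 74.1% corresponds to 3705 top-10 hits.

```python
from typing import Hashable, Sequence


def top_k_accuracy(
    ranked_predictions: Sequence[Sequence[Hashable]],
    recorded: Sequence[Hashable],
    k: int = 10,
) -> float:
    """Fraction of test reactions whose recorded reactants are in the top k."""
    if len(ranked_predictions) != len(recorded):
        raise ValueError("need exactly one ranked list per test reaction")
    if not recorded:
        return 0.0
    hits = sum(
        1 for preds, truth in zip(ranked_predictions, recorded) if truth in preds[:k]
    )
    return hits / len(recorded)


# Toy data (hypothetical): each prediction is a frozenset of reactant SMILES.
predictions = [
    [frozenset({"CCO", "CC(=O)O"}), frozenset({"CCBr", "O"})],
    [frozenset({"c1ccccc1Br"}), frozenset({"c1ccccc1I"})],
]
truth = [frozenset({"CCBr", "O"}), frozenset({"c1ccccc1Cl"})]

print(top_k_accuracy(predictions, truth, k=10))  # 0.5
print(top_k_accuracy(predictions, truth, k=1))   # 0.0
print(round(0.741 * 5000))                       # 3705
```

Comparing frozensets of strings is exact matching. In practice, both sides would need identical canonical SMILES for a prediction to count as a hit.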


Conflict of interest statement

The authors declare no competing financial interest.

Figures

Figure 1
Six most frequent precursors for the disconnection of a single bond between two aromatic carbons. Once a strategic disconnection is identified (SMARTS: [cH0]-[cH0]), there may still be dozens of locally plausible precursors to accomplish the transformation, including different combinations of halides and boronic acids/esters. (1) Bromide and acid; (2) bromide and ester; (3) chloride and acid; (4) iodide and acid; (5) chloride and ester; (6) iodide and ester.
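The six precursor sets in Figure 1 are every pairing of three aryl halides with two boron partners. The sketch below enumerates those pairings with `itertools.product` to show how quickly options multiply for a single fixed disconnection. The fragment labels and the `symmetric` switch are illustrative assumptions. The sketch only counts options. It does not reproduce the frequency ranking, which comes from the precedent data.

```python
from itertools import product
from typing import List, Tuple

# Partner types named in the Figure 1 caption for a [cH0]-[cH0] disconnection.
HALIDES = ("bromide", "chloride", "iodide")
BORON_PARTNERS = ("boronic acid", "boronic ester")
# Hypothetical labels for the two aryl fragments produced by the disconnection.
FRAGMENTS = ("fragment A", "fragment B")


def enumerate_coupling_partners(symmetric: bool) -> List[Tuple[str, str, str]]:
    """List (fragment bearing the halide, halide, boron partner) choices.

    For a symmetric biaryl it does not matter which fragment carries the
    halide; otherwise either fragment can, which doubles the number of options.
    """
    fragments = FRAGMENTS[:1] if symmetric else FRAGMENTS
    return list(product(fragments, HALIDES, BORON_PARTNERS))


print(len(enumerate_coupling_partners(symmetric=True)))   # 6
print(len(enumerate_coupling_partners(symmetric=False)))  # 12
```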
Figure 2
Example prediction of retrosynthetic heteroatom alkylation/arylation reactions for 1-phenyl-3,4-dihydroquinolin-2(1H)-one. Up to 20 reaction precedents are recalled in order of decreasing product similarity. For each precedent, the reaction site (highlighted in red and displayed as a SMARTS string) is extracted and matched against the target compound. Not all of the precedents with the most similar products involve a reaction site that matches the target, so not all of them produce candidate precursors. Apart from the first reaction, precedents with inapplicable reaction sites are omitted for brevity. The recorded reactants for this target compound (highlighted inside a green box) are recovered at rank 2, although all of the top five suggested precursors are chemically reasonable. Similarity scores are computed with Morgan2noFeat fingerprints and the Tanimoto metric (see the section on Similarity Calculation).
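The procedure in Figure 2 can be sketched as four steps. First, rank the knowledge base by product similarity to the target and keep up to 20 precedents. Second, skip any precedent whose reaction site does not match the target. Third, collect the precursors the remaining precedents generate. Fourth, score and rank those precursors.

In the sketch below, fingerprints are plain sets of bit indices, and `apply_to` is a hypothetical placeholder for SMARTS matching and reverse template application, which a cheminformatics toolkit would perform in practice. Two further choices are assumptions rather than something this caption specifies: each candidate is scored as product similarity times reactant similarity, and the maximum score is kept when several precedents suggest the same precursors.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Tuple

Fingerprint = FrozenSet[int]  # indices of "on" fingerprint bits


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    union = len(a | b)
    return len(a & b) / union if union else 0.0


@dataclass
class Precedent:
    product_fp: Fingerprint
    reactants_fp: Fingerprint
    # Hypothetical stand-in for matching the precedent's reaction-site SMARTS
    # against the target and applying it in reverse. Returns candidate
    # precursors as (SMILES, fingerprint) pairs, or [] if the site does not match.
    apply_to: Callable[[str], List[Tuple[str, Fingerprint]]]


def propose_precursors(
    target_smiles: str,
    target_fp: Fingerprint,
    precedents: List[Precedent],
    n_recall: int = 20,
) -> List[Tuple[str, float]]:
    # Recall up to n_recall precedents in order of decreasing product similarity.
    recalled = sorted(
        precedents, key=lambda p: tanimoto(target_fp, p.product_fp), reverse=True
    )[:n_recall]
    best: Dict[str, float] = {}
    for prec in recalled:
        product_sim = tanimoto(target_fp, prec.product_fp)
        for smiles, fp in prec.apply_to(target_smiles):
            # Assumed score: product similarity x reactant similarity.
            score = product_sim * tanimoto(fp, prec.reactants_fp)
            # Keep the best score when several precedents give the same precursors.
            best[smiles] = max(score, best.get(smiles, 0.0))
    return sorted(best.items(), key=lambda kv: kv[1], reverse=True)


# Toy knowledge base; precursor names "X" and "Y" are placeholders.
kb = [
    Precedent(frozenset({1, 2, 3, 4}), frozenset({1, 2, 5}),
              lambda s: [("X", frozenset({1, 2, 5}))]),
    Precedent(frozenset({1, 2, 3}), frozenset({7}),
              lambda s: []),  # reaction site does not match the target
    Precedent(frozenset({1, 2}), frozenset({1, 9}),
              lambda s: [("Y", frozenset({1, 9})), ("X", frozenset({1}))]),
]
print(propose_precursors("target", frozenset({1, 2, 3, 4}), kb))
# [('X', 1.0), ('Y', 0.5)]
```

The second precedent is recalled but contributes nothing, mirroring the caption's point that highly similar precedents do not always yield candidate precursors.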
Figure 3
Example similarity score calculation using Morgan2Feat fingerprints and the Tanimoto metric. Colors indicate atom-level contributions to the overall similarity (green: increases similarity score, red: decreases similarity score, uncolored: has no effect).
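For binary fingerprints, Tanimoto similarity is the number of shared on-bits divided by the number of bits that are on in either fingerprint. The atom coloring in Figure 3 can be approximated in the spirit of RDKit's similarity maps: remove the bits an atom contributes to, recompute the similarity, and treat the drop as that atom's weight. The sketch below works on plain sets of bit indices. The atom-to-bit mapping is hypothetical; in practice it would come from the bit information of a Morgan fingerprint. Whether Figure 3 was produced exactly this way is an assumption.

```python
from typing import Dict, FrozenSet, Set


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """|A intersect B| / |A union B| for two sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def atom_contributions(
    bits_by_atom: Dict[int, Set[int]], reference: FrozenSet[int]
) -> Dict[int, float]:
    """Similarity lost when each atom's bits are removed from the query.

    Positive: the atom raises similarity; negative: it lowers it; zero: no effect.
    """
    full = frozenset().union(*bits_by_atom.values())
    base = tanimoto(full, reference)
    return {
        atom: base - tanimoto(full - frozenset(bits), reference)
        for atom, bits in bits_by_atom.items()
    }


# Hypothetical atom-to-bit mapping for a three-atom query molecule.
query = {0: {1, 2}, 1: {2, 3}, 2: {9}}
reference = frozenset({1, 2, 3})
print(tanimoto(frozenset({1, 2, 3, 9}), reference))  # 0.75
print(atom_contributions(query, reference))  # {0: 0.5, 1: 0.5, 2: -0.25}
```

Atom 2 only sets a bit that the reference lacks, so removing it raises the similarity and its weight is negative. This corresponds to red in the figure's color scheme.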
Figure 4
Example retrosynthetic predictions when pooling all reaction classes. The model successfully proposes the recorded reactants with rank 1, corresponding to an aldol condensation. Other suggestions among the top nine include three ring-closing amidations to build the five-membered ring, two SNAr reactions to install cyclopropylamine, and three amine deprotections.
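Here, pooling is taken to mean that no reaction class is supplied for the target. Precedents of every type are therefore ranked together by product similarity, which is what lets aldol, amidation, SNAr, and deprotection suggestions appear side by side. The following is a minimal sketch of the recall step with and without a class filter. The `rxn_class` labels, reaction IDs, and toy fingerprints are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Precedent:
    rxn_id: str
    rxn_class: str
    product_fp: FrozenSet[int]


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def recall(
    target_fp: FrozenSet[int],
    precedents: List[Precedent],
    reaction_class: Optional[str] = None,
    n_recall: int = 20,
) -> List[str]:
    """IDs of the most product-similar precedents.

    With reaction_class=None all classes are pooled, so precedents from
    different reaction types compete in a single similarity ranking.
    """
    pool = [p for p in precedents
            if reaction_class is None or p.rxn_class == reaction_class]
    pool.sort(key=lambda p: tanimoto(target_fp, p.product_fp), reverse=True)
    return [p.rxn_id for p in pool[:n_recall]]


# Toy precedents; IDs and fingerprints are placeholders.
kb = [
    Precedent("r1", "deprotection", frozenset({1, 2, 3, 4})),
    Precedent("r2", "C-C bond formation", frozenset({1, 2, 3})),
    Precedent("r3", "deprotection", frozenset({9})),
    Precedent("r4", "heteroatom alkylation/arylation", frozenset({1, 2})),
]
target = frozenset({1, 2, 3, 4})
print(recall(target, kb, reaction_class="deprotection"))  # ['r1', 'r3']
print(recall(target, kb, n_recall=3))                     # ['r1', 'r2', 'r4']
```

Filtering by class keeps a weak deprotection precedent (r3), whereas the pooled ranking replaces it with more similar precedents from other classes.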
Figure 5
Multistep synthesis plans. Routes are constructed by recursively applying the one-step retrosynthetic methodology to (a) lenalidomide and (b) salmeterol. The suggested disconnections are consistent with published pathways, highlighted with green and blue backgrounds for lenalidomide and salmeterol, respectively. Slight differences are described in the main text.
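Recursively applying a one-step method amounts to a tree search. The sketch below uses a greedy depth-first expansion that takes the highest-ranked precursor set whose branches can all be resolved. The `one_step` callable, the purchasable-compound set used as a stopping criterion, and the depth limit are all assumptions for illustration. The caption does not state how the lenalidomide and salmeterol routes were expanded or terminated.

```python
from typing import Callable, Dict, List, Optional, Set

Route = Dict[str, object]  # {"smiles": str, "children": List[Route]}


def plan_route(
    target: str,
    one_step: Callable[[str], List[List[str]]],
    buyable: Set[str],
    max_depth: int = 3,
) -> Optional[Route]:
    """Greedy depth-first recursive retrosynthesis.

    one_step returns ranked precursor sets (best first). A branch ends when a
    molecule is in `buyable`; it fails if max_depth is reached first.
    """
    if target in buyable:
        return {"smiles": target, "children": []}
    if max_depth == 0:
        return None
    for precursors in one_step(target):
        subroutes = [plan_route(p, one_step, buyable, max_depth - 1)
                     for p in precursors]
        # Accept the first (highest-ranked) set whose every branch resolves.
        if all(r is not None for r in subroutes):
            return {"smiles": target, "children": subroutes}
    return None


# Toy one-step model: "T" is the target, letters are placeholder molecules.
suggestions = {"T": [["A", "B"]], "A": [["C"]]}
route = plan_route("T", lambda s: suggestions.get(s, []), buyable={"B", "C"})
print(route)
```

The depth limit also guarantees termination if the one-step model ever proposes a cycle (for example, a protection followed by a deprotection).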
