Chem Sci. 2022 Feb 25;13(12):3507-3518.
doi: 10.1039/d1sc04406k. eCollection 2022 Mar 24.

The DP5 probability, quantification and visualisation of structural uncertainty in single molecules

Alexander Howarth et al. Chem Sci.

Abstract

Whenever a new molecule is made, a chemist will justify the proposed structure by analysing the NMR spectra. The widely used DP4 algorithm will choose the best match from a series of possibilities, but draws no conclusions from a single candidate structure. Here we present the DP5 probability, a step-change in the quantification of molecular uncertainty: given one structure and one 13C NMR spectrum, DP5 gives the probability of the structure being correct. We show that the DP5 probability can rapidly differentiate between structure proposals that are indistinguishable by NMR to an expert chemist. We also show, in a number of challenging examples, that the DP5 probability may prevent incorrect structures from being published and later reassigned. DP5 will prove extremely valuable in fields such as discovery-driven automated chemical synthesis and drug development. Alongside the DP4-AI package, DP5 can help guide synthetic chemists when resolving the most subtle structural uncertainty. The DP5 system is available at https://github.com/Goodman-lab/DP5.

Conflict of interest statement

There are no conflicts to declare.

Figures

Fig. 1. Schematic of the DP5 program. The required inputs from the user are a candidate structure and the raw 13C NMR data (or a list of NMR signals). The DP5 probability is built on top of the DP4-AI analysis.
Fig. 2. Schematic diagram of how the probability of observing a DFT-NMR prediction error for an atom in a given environment is calculated as described in the text.
Fig. 3. The GUI accompanying DP5 pictorially overlays atomic DP5 probabilities onto the molecular structure. This clearly displays regions of the structure that are expected to be correct and conversely regions that may require revision. This functionality will help chemists assess and revise structure proposals. This structure revision example has been taken from a real-world case study of an incorrectly assigned molecule in the literature (see Results).
Fig. 4. Schematic diagram of the cross-validation analysis used to evaluate the performance of DP5. (A) The experimental spectra of the 5140 (n) molecules from the NMRShiftDB training set are permuted among molecules with the same number of carbon atoms to produce pairs of structures and experimental spectra. (B) These pairs are separated into correct pairs, where the structure is paired with its own spectrum, and incorrect pairs, where the structure is paired with the spectrum of a different molecule. All incorrect pairs with maximum errors <10 ppm are considered in case 1. (C) In case 2, the incorrect pairs are assigned sampling weights to force the mean absolute error (MAE) distribution of the incorrect pairs to approximate that of the correct pairs; this leads to an expected number of ∼5330 incorrect combinations. All DP5 probabilities in this study are calculated using a leave-one-out scheme (see ESI Section S3.2†).
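The pairing step described in Fig. 4 (A) and (B) can be illustrated with a minimal Python sketch. It is not taken from the DP5 code base: the function names, the dictionary keys and the toy shift values are all hypothetical, and predicted and experimental shifts are matched simply by sorting both lists, which is a simplifying assumption and not the DP4-AI assignment procedure.

```python
from collections import defaultdict


def max_abs_error(predicted, experimental):
    """Largest absolute shift error (ppm) after pairing shifts in sorted order.

    Sorting both lists is a simplifying assumption, not the DP4-AI assignment.
    """
    return max(abs(p - e) for p, e in zip(sorted(predicted), sorted(experimental)))


def build_pairs(molecules, max_error_ppm=10.0):
    """Pair every structure with every spectrum of equal carbon count.

    Each molecule is a dict with hypothetical keys 'id', 'predicted' and
    'experimental' (lists of 13C shifts in ppm, one value per carbon).
    Returns (correct, case1) as lists of (structure_id, spectrum_id) tuples.
    """
    # Group molecules by carbon count so spectra are only swapped within a group.
    by_carbons = defaultdict(list)
    for mol in molecules:
        by_carbons[len(mol["experimental"])].append(mol)

    correct, case1 = [], []
    for group in by_carbons.values():
        for structure in group:
            for spectrum_owner in group:
                pair = (structure["id"], spectrum_owner["id"])
                if structure is spectrum_owner:
                    # Structure paired with its own spectrum.
                    correct.append(pair)
                elif max_abs_error(structure["predicted"],
                                   spectrum_owner["experimental"]) < max_error_ppm:
                    # Incorrect pair that still passes the case 1 filter.
                    case1.append(pair)
    return correct, case1


# Toy data with invented shift values, for illustration only.
molecules = [
    {"id": "A", "predicted": [10.0, 50.0], "experimental": [11.0, 49.0]},
    {"id": "B", "predicted": [12.0, 55.0], "experimental": [12.5, 54.0]},
    {"id": "C", "predicted": [100.0, 150.0], "experimental": [101.0, 149.0]},
    {"id": "D", "predicted": [20.0, 30.0, 40.0], "experimental": [21.0, 29.0, 41.0]},
]
correct, case1 = build_pairs(molecules)
print(correct)
print(case1)
```

In this toy set, only the swaps between A and B survive the 10 ppm filter, because C differs from both by far more than 10 ppm and D has no partner with the same number of carbons. The sampling weights of case 2 are not sketched here, since the caption does not specify how they are computed.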
Fig. 5. Test set of real-world structure reassignment problems taken from chemical literature. In each example an incorrect structure was initially published (S#a) which was later reassigned to the corresponding correct structure (S#b).
Fig. 6. (Top) DP5 probabilities calculated for the thirteen incorrectly published structures and corresponding revised structures. DP5 assigns much greater confidence to the revised structures and also displays the three cases where both the initially proposed and revised structures are equally improbable. (Bottom) DP4 probabilities calculated for the same thirteen examples. These results show how the DP5 probability can be used to test the reliability of a DP4 calculation, as only DP5 can discern if any of the structure proposals are likely to be correct.
Fig. 7. Results of DP5 (top) and DP4 (bottom) calculations on a dataset of 42 challenging real-world stereochemistry elucidation examples (see ESI Section 5.1 for structures). In both plots, the probabilities calculated for each diastereomer are stacked in the same order with matching colours; the correct diastereomer is always represented by the blue bar at the bottom of the stack. The checkmarks above each plot indicate molecules correctly assigned by each program. The DP5 probabilities have been divided by the number of diastereomers for each molecule, which ensures that the sum of these probabilities lies within the 0–1 range (see ESI Section 5.1 for unnormalized results). These results show that the two systems display similar stereochemistry elucidation performance: DP4 assigns 19 molecules correctly, whilst DP5 assigns 16 correctly. The probabilities of assigning as many molecules in this dataset correctly by chance are ∼0.0001 and ∼0.01, respectively. Both DP5 and DP4 probabilities are based only on 13C NMR data.
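The scaling applied to the DP5 bars in Fig. 7 can be shown with a short Python sketch. The function name and the example probabilities are hypothetical and are not results from the paper. Because each raw DP5 probability lies between 0 and 1, dividing each value by the number of diastereomers n guarantees that the stacked total cannot exceed 1.

```python
def scale_by_candidate_count(dp5_probabilities):
    """Divide each DP5 probability by the number of candidate diastereomers.

    The input holds one independent probability (0 to 1) per diastereomer,
    so the scaled values always sum to at most 1.
    """
    n = len(dp5_probabilities)
    return [p / n for p in dp5_probabilities]


# Invented values for a molecule with four candidate diastereomers.
raw = [0.9, 0.2, 0.4, 0.1]
scaled = scale_by_candidate_count(raw)
print(scaled, sum(scaled))
```

Unlike DP4 probabilities, which are relative across candidates, the raw DP5 values need not sum to 1, so this scaling only bounds the height of the stack and does not turn the values into a distribution over diastereomers.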
