Chem Sci. 2020 Mar 6;11(17):4351-4359.
doi: 10.1039/d0sc00442a.

DP4-AI automated NMR data analysis: straight from spectrometer to structure

Alexander Howarth et al. Chem Sci.

Abstract

DP4-AI, a robust system for the automatic processing and assignment of raw 13C and 1H NMR data, has been developed and integrated into our computational workflow for elucidating the structures of organic molecules. Starting from a molecular structure with undefined stereochemistry or other structural uncertainty, this system allows completely automated structure elucidation. Methods for NMR peak picking using objective model selection, and algorithms for matching the calculated 13C and 1H NMR shifts to peaks in noisy experimental NMR data, were developed. When rigorously evaluated on a challenging test set of molecules, DP4-AI achieved a 60-fold increase in processing speed and nearly eliminated the need for scientist time. DP4-AI represents a leap forward in NMR structure elucidation and a step-change in the functionality of DP4. It enables high-throughput analyses of databases and large sets of molecules, which were previously impossible, and paves the way for the discovery of new structural information through machine learning. This new functionality has been coupled with an intuitive GUI and is available as open-source software at https://github.com/KristapsE/DP4-AI.

Conflict of interest statement

There are no conflicts to declare.

Figures

Fig. 1. (a) The structure of DP4-AI. This system affords fully automated stereochemistry elucidation; only the raw NMR data is required as input from the user. (b) Example structures with stereochemistry correctly predicted fully automatically using DP4-AI integrated in PyDP4.
Fig. 2. The overall structure of DP4-AI. Raw NMR data is processed in a series of stages to yield experimental multiplet shift values and their integrals. The program then takes shifts calculated using DFT for each atom in the molecule and assigns them to the experimental peaks. This assignment is then used to calculate a DP4 probability for each diastereomer.
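
The final step in Fig. 2, turning assigned shifts into a probability for each diastereomer, can be sketched as follows. This is a minimal illustration and not the DP4-AI implementation: it assumes a single zero-mean Gaussian error model with a hypothetical width `sigma`, it assumes every candidate has the same number of assigned atoms, and it omits any scaling of the calculated shifts. The function name `dp4_like_probabilities` and the isomer labels are made up for the example.

```python
import math

def dp4_like_probabilities(errors_by_isomer, sigma=2.0):
    """Bayesian-style relative probability of each candidate structure.

    errors_by_isomer maps a candidate name to its list of
    (calculated - experimental) shift errors in ppm.
    ASSUMPTION: errors follow one zero-mean Gaussian of width sigma, and
    every candidate has the same number of errors, so the Gaussian
    normalisation constant cancels between candidates.
    """
    # Log-likelihood of each candidate (constant terms dropped).
    log_l = {
        name: sum(-0.5 * (e / sigma) ** 2 for e in errs)
        for name, errs in errors_by_isomer.items()
    }
    # Subtract the maximum before exponentiating to avoid underflow.
    top = max(log_l.values())
    unnorm = {name: math.exp(v - top) for name, v in log_l.items()}
    total = sum(unnorm.values())
    return {name: v / total for name, v in unnorm.items()}

# Hypothetical errors (ppm) for two candidate diastereomers.
probs = dp4_like_probabilities({
    "isomer_A": [0.5, -0.3, 0.2],
    "isomer_B": [2.5, -3.0, 1.8],
})
```

The candidate with the smaller errors receives the larger share of the probability, and the values sum to one across candidates.
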
Fig. 3. Figure illustrating the gradient peak picking process. Peaks are picked if they are below a threshold in the second derivative (orange) and above an intensity threshold (blue). The final picked peaks are highlighted in green.
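
The two criteria in Fig. 3 can be sketched in a few lines. This is an illustrative approximation, not the DP4-AI code: the second derivative is estimated with a simple central finite difference, both thresholds are supplied by the caller, and the choice to report the most intense point of each contiguous run of qualifying points is an assumption made for the example.

```python
def second_derivative(y):
    """Central finite-difference second derivative; end points are set to 0."""
    d2 = [0.0] * len(y)
    for i in range(1, len(y) - 1):
        d2[i] = y[i - 1] - 2.0 * y[i] + y[i + 1]
    return d2

def pick_peaks(y, intensity_threshold, curvature_threshold):
    """Return indices of picked peaks.

    A point qualifies when its intensity is above intensity_threshold and
    its second derivative is below curvature_threshold (a negative number).
    ASSUMPTION: each contiguous run of qualifying points yields one peak,
    placed at the run's most intense point.
    """
    d2 = second_derivative(y)
    peaks, run = [], []
    for i, (intensity, curvature) in enumerate(zip(y, d2)):
        if intensity > intensity_threshold and curvature < curvature_threshold:
            run.append(i)
        elif run:
            peaks.append(max(run, key=lambda j: y[j]))
            run = []
    if run:  # close a run that reaches the end of the spectrum
        peaks.append(max(run, key=lambda j: y[j]))
    return peaks

# Toy spectrum with two sharp peaks at indices 3 and 8.
spectrum = [0, 0, 1, 5, 1, 0, 0, 0.5, 3, 0.5, 0, 0]
print(pick_peaks(spectrum, 0.8, -1.0))  # prints [3, 8]
```

Raising the intensity threshold to 4.0 in this toy example keeps only the peak at index 3.
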
Fig. 4. An example multiplet (blue) and deconvolved model (orange). The signal peaks are highlighted in cyan; the peaks determined to be noise are highlighted in red.
Fig. 5. Figure illustrating how calculated shifts can be assigned to experimental peaks using the assignment probability matrix M. (a) The peaks in the simulated calculated spectrum (blue) are assigned to those in the experimental spectrum (orange). (b) The matrix M is calculated and the optimum assignment (cyan) is found. (c) The final assignment found in this example.
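
The idea in Fig. 5 can be sketched as below, under stated assumptions. The caption does not say how M is built, so this example assumes each entry is a Gaussian in the difference between a calculated shift and an experimental peak (hypothetical width `sigma`), stores log-probabilities to avoid underflow, and assumes each experimental peak is used at most once. The optimum is found by brute force, which is only practical for a handful of peaks; for realistic sizes a linear assignment solver such as `scipy.optimize.linear_sum_assignment` would be the usual choice. All function names are made up for the example.

```python
from itertools import permutations

def log_probability_matrix(calc_shifts, expt_peaks, sigma=2.0):
    """log M[i][j] for calculated shift i and experimental peak j.

    ASSUMPTION: Gaussian in the shift difference; constant terms dropped.
    """
    return [[-0.5 * ((c - e) / sigma) ** 2 for e in expt_peaks]
            for c in calc_shifts]

def best_assignment(log_m):
    """Return a tuple: entry i is the experimental peak index for shift i.

    Tries every one-to-one assignment and keeps the one with the highest
    total log-probability. Returns None if there are fewer experimental
    peaks than calculated shifts.
    """
    n_calc, n_expt = len(log_m), len(log_m[0])
    best, best_score = None, float("-inf")
    for perm in permutations(range(n_expt), n_calc):
        score = sum(log_m[i][j] for i, j in enumerate(perm))
        if score > best_score:
            best, best_score = perm, score
    return best

# Hypothetical 13C shifts (ppm): three calculated, four experimental peaks.
calc = [128.1, 77.5, 21.0]
expt = [20.4, 78.2, 127.0, 60.0]
print(best_assignment(log_probability_matrix(calc, expt)))  # prints (2, 1, 0)
```

The experimental peak at 60.0 ppm is left unassigned in this example, which mirrors the situation where a noisy spectrum contains more peaks than the structure has atoms.
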
Fig. 6. Peaks (left) are grouped by amplitude, according to which minima in the second derivative of the amplitude probability density function (right) they fall between (dashed lines). In this simulated example, the number of carbon atoms in the structure is nine. The cumulative sum of peaks above each group's lower boundary is calculated; the weight assigned to each group is the number of carbon atoms in the structure divided by this value. The weights are then normalized so that the largest weight is one.
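
The weighting rule in Fig. 6 amounts to a short calculation, sketched here. The group boundaries are taken as given inputs; locating them from the second derivative of the amplitude probability density function is not reproduced. The function name `group_weights` and the example amplitudes are hypothetical.

```python
def group_weights(amplitudes, lower_bounds, n_carbons):
    """One weight per amplitude group, as described in the caption.

    lower_bounds holds each group's lower amplitude boundary (ASSUMPTION:
    supplied by the caller, in the same order as the returned weights).
    """
    raw = []
    for bound in lower_bounds:
        # Cumulative count of peaks at or above this group's lower boundary.
        count_above = sum(1 for a in amplitudes if a >= bound)
        # Weight = number of carbons / cumulative count (0 if no peaks).
        raw.append(n_carbons / count_above if count_above else 0.0)
    top = max(raw, default=0.0)
    # Normalise so that the largest weight equals one.
    return [w / top for w in raw] if top > 0 else raw

# Simulated example: nine carbons; 9 strong, 3 medium and 8 weak peaks.
amplitudes = [10.0] * 9 + [4.0] * 3 + [1.0] * 8
weights = group_weights(amplitudes, [0.0, 2.5, 7.0], n_carbons=9)
# Cumulative counts are 20, 12 and 9, so the raw weights are
# 9/20, 9/12 and 9/9 before normalisation.
```

In this example the strongest group already has a raw weight of one, so normalisation leaves the values unchanged; groups containing many weak peaks, which are more likely to be noise, receive lower weights.
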
Fig. 7. Figure illustrating the 47 molecules utilized to evaluate the performance of DP4-AI. Molecules AT3, TS3A, TS4 and NL1A have only corresponding 1H NMR data; all other molecules have both 1H and 13C NMR data. The spectra for molecules JB7, JB11, JB5 and JB8 were taken in the solvents methanol, benzene, DMSO and methanol respectively, whilst all others were taken in CDCl3. Sources for the spectral data: AT1-3, BYH1-2, JB1-13B, TP1-4 (personal correspondence), TS1-4 (personal correspondence), OD1 (personal correspondence).
Fig. 8. The correct prediction rates for DP4-AI (orange) and the pairwise AA (blue) at the three levels of theory tested for the compounds in Fig. 7 (average number of stereocentres equal to 3.49). These predictions were produced using the fitted, cross-validated 3 Gaussian statistical model.
Fig. 9. DP4-AI processed and assigned 1H spectrum of molecule BYH1 (taken in chloroform).
Fig. 10. NMR-AI can process a molecule for DP4 calculation in around one minute, a task that previously would have required roughly 8 hours of the user's time. This corresponds to a ∼60-fold increase in the number of molecules that can be processed per day.
