Chem Sci. 2025 Oct 9;16(45):21590-21599.
doi: 10.1039/d5sc06866e. eCollection 2025 Nov 19.

Towards automatically verifying chemical structures: the powerful combination of 1H NMR and IR spectroscopy

J Benji Rowlands et al. Chem Sci.

Abstract

Human interpretation of spectroscopic data remains key to confirming newly synthesised chemical structures. Whilst there have been advances in automated spectral interpretation, the false positive and false negative rates remain too high to replace human interpretation. One approach, Automated Structure Verification (ASV), scores observed nuclear magnetic resonance (NMR) spectra against predicted NMR spectra. We describe a method to extend this approach to infrared (IR) spectra and apply it alongside proton NMR spectra to distinguish between a challenging set of 99 similar isomer pairs. Based on relative scores, we classify each as correct, incorrect or unsolved. Our results show that IR can be used as an efficient automated method to distinguish similar isomers with an accuracy close to that of proton NMR. We further introduce a method to combine NMR and IR results and show that the combination significantly outperforms either technique alone. At a true positive rate of 90%, unsolved pairs are reduced to 0-15% using NMR and IR together compared to 27-49% using individual techniques alone. At a true positive rate of 95%, they are reduced to 15-30% from 39-70%. These results are a significant step towards efficient automated structure verification based on easily measured spectroscopy data.
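
The three-way classification described above can be illustrated with a minimal sketch. The abstract does not give the exact decision rule, so the score margin, the symmetric threshold and the function names below are assumptions made for illustration only, not the authors' implementation.

```python
from collections import Counter

def classify_pair(score_true, score_false, threshold):
    """Classify one isomer pair from the score margin.

    Assumption: a higher score means a better match between the observed
    and predicted spectra, and the margin is compared against a symmetric
    threshold. The paper's actual rule may differ.
    """
    margin = score_true - score_false
    if margin > threshold:
        return "correct"      # true structure clearly preferred
    if margin < -threshold:
        return "incorrect"    # wrong isomer clearly preferred
    return "unsolved"         # margin too small to decide

def summarise(pairs, threshold):
    """Count how many pairs fall into each class at a given threshold."""
    counts = Counter(classify_pair(t, f, threshold) for t, f in pairs)
    return {k: counts.get(k, 0) for k in ("correct", "incorrect", "unsolved")}

# Hypothetical (score of true isomer, score of false isomer) values.
pairs = [(0.90, 0.40), (0.55, 0.50), (0.30, 0.70), (0.80, 0.78)]
result = summarise(pairs, threshold=0.1)
print(result)
```

Raising the threshold in this sketch moves pairs from "correct" and "incorrect" into "unsolved", which is the trade-off the reported unsolved percentages describe.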

Conflict of interest statement

RJL, LJ, PH, WC and TL are employed by AstraZeneca and RJL, PH, WC and TL own shares in the company. The PhD studentship of JBR is part funded by AstraZeneca. JMG received funding from AstraZeneca to develop the IR.Cai algorithm.

Figures

Fig. 1. ROC plot based on the 42 molecules in the test set showing the performance of DP4* and IR.Cai for classifying structures as correct or incorrect. Note that the DP4* metric is comparative, i.e. it requires a list of possible candidate structures and scores them relative to one another, unlike IR.Cai and ACD which generate single molecule scores. Hence the result for DP4* is not directly comparable to the results for IR.Cai and ACD.
Fig. 2. (a) Scheme showing how DP4* and IR.Cai scores are used to classify a pair of isomers, where one of them is known to be correct. Information from the DP4* and IR.Cai scores is aggregated using the average percentile-rank method described in the text. (b) Illustration of the structure classification characteristic (SCC) plot. The green line is more useful for structure classification than the blue line, as the green method can correctly classify a higher proportion of molecules. The ideal result would be a point in the top left corner of the plot, denoted by the blue circle. This ideal SCC curve would have a CA (classification area) of 1.
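
As a rough illustration of an average percentile-rank aggregation, the sketch below converts each technique's scores to percentile ranks and averages them per isomer pair. The full procedure is described in the paper's text and is not reproduced on this page, so the tie handling (half weight for ties), the 0 to 100 scale and all names here are assumptions.

```python
def percentile_ranks(values):
    """Percentile rank of each value within its own list (0 to 100).

    Assumption: ties, including the value itself, count with half weight.
    """
    n = len(values)
    ranks = []
    for v in values:
        below = sum(1 for u in values if u < v)
        ties = sum(1 for u in values if u == v)
        ranks.append(100.0 * (below + 0.5 * ties) / n)
    return ranks

def combine_by_average_percentile(nmr_scores, ir_scores):
    """Average the per-technique percentile ranks for each isomer pair."""
    nmr_pct = percentile_ranks(nmr_scores)
    ir_pct = percentile_ranks(ir_scores)
    return [(a + b) / 2 for a, b in zip(nmr_pct, ir_pct)]

# Hypothetical per-pair scores on two different, non-comparable scales.
nmr = [0.2, 0.8, 0.5, 0.9]
ir = [10.0, 30.0, 40.0, 20.0]
combined = combine_by_average_percentile(nmr, ir)
print(combined)
```

Working with ranks rather than raw values lets scores on different scales be combined without choosing a rescaling for either technique.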
Fig. 3. Structure classification characteristic (SCC) curve based on the 42 molecules in the test set using IR.Cai scores measuring the degree of overlap between calculated and experimental spectra. IR.Cai_high and IR.Cai_low here refer to IR.Cai scores calculated with IR spectra computed at the B3PW91/cc-pVTZ (high level) and B3LYP/6-31G* (low level) levels of theory respectively. The position of the SCC curve and the higher CA indicate better performance for the higher theory level. We therefore use the higher level of theory for the results in the rest of this work.
Fig. 4. Structure classification characteristic (SCC) curve based on the 42 molecules in the test set using raw DP4* and ACD scores. Some of the ACD scores were identical for the correct structure and an incorrect isomer, so it was not possible to classify all pairs. The dotted section of the line for ACD therefore represents the expectation of random guessing to choose the correct isomer for molecules which had the same score. Results using ACD's automatic peak-picking procedure are shown (ACD_auto) as well as results using manually peak-picked spectra (ACD_manual). Note that using ACD's automatic peak picking results in only a mild degradation in performance.
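
The random-guessing expectation for tied scores can be made concrete with a small sketch: a pair whose two candidates score identically is assumed to be guessed correctly half the time. The function name and the example scores are hypothetical, and this is not the authors' code.

```python
def expected_correct_fraction(pairs):
    """Expected fraction of pairs classified correctly when ties are guessed.

    Each pair is (score of true isomer, score of false isomer). A strict win
    counts as 1, and a tie counts as 0.5 (the expectation of a fair guess).
    """
    wins = sum(1 for t, f in pairs if t > f)
    ties = sum(1 for t, f in pairs if t == f)
    return (wins + 0.5 * ties) / len(pairs)

# Hypothetical scores: one win, two ties, one loss.
pairs = [(0.9, 0.4), (0.7, 0.7), (0.3, 0.6), (0.5, 0.5)]
fraction = expected_correct_fraction(pairs)
print(fraction)
```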
Fig. 5. SCC curve based on the 42 molecules in the test set for high-level IR, DP4*, ACD and the combinations of DP4* and ACD with IR. The combination lines are obtained using the percentile-rank procedure described in the text. Corresponding SCC curves for combination with low-level IR are shown in the SI (Fig. S5).
