Nat Chem. 2024 Apr;16(4):633-643.
doi: 10.1038/s41557-023-01393-w. Epub 2024 Jan 2.

Probing the chemical 'reactome' with high-throughput experimentation data

Emma King-Smith et al. Nat Chem. 2024 Apr.

Abstract

High-throughput experimentation (HTE) has the potential to improve our understanding of organic chemistry by systematically interrogating reactivity across diverse chemical spaces. Notable bottlenecks include few publicly available large-scale datasets and the need for facile interpretation of these data's hidden chemical insights. Here we report the development of a high-throughput experimentation analyser, a robust and statistically rigorous framework, which is applicable to any HTE dataset regardless of size, scope or target reaction outcome, which yields interpretable correlations between starting material(s), reagents and outcomes. We improve the HTE data landscape with the disclosure of 39,000+ previously proprietary HTE reactions that cover a breadth of chemistry, including cross-coupling reactions and chiral salt resolutions. The high-throughput experimentation analyser was validated on cross-coupling and hydrogenation datasets, showcasing the elucidation of statistically significant hidden relationships between reaction components and outcomes, as well as highlighting areas of dataset bias and the specific reaction spaces that necessitate further investigation.

Conflict of interest statement

A.A.L. is a co-founder and owns equity in PostEra Inc and Byterat Ltd. S.B., L.B., X.H., R.M.H., J.L.K., J.M., N.W.S., J.W.T. and Q.Y. are employed by Pfizer Inc. E.K.S. declares no competing interests.

Figures

Fig. 1. Overview of the HTE dataset and framework.
a, Overview of the high-throughput experimentation analyser (HiTEA) and its analysis is shown. Comparison of the literature reactome with the HiTEA reactome will reveal support for our mechanistic conclusions (agreement of reactomes) or reveal areas of bias/unusual chemical phenomena (disagreement of reactomes). b, Abstracted representations of the four reaction classes analysed by HiTEA in this publication are shown. c, Breakdown of the HTE dataset by reaction class is shown.
Fig. 2. Unique reacting pairs/molecules for each reaction class.
a–d, Buchwald–Hartwig dataset (a), Ullmann dataset (b), heterogeneous hydrogenation dataset (c) and homogeneous hydrogenation dataset (d) are shown. FG, functional group.
Fig. 3. HiTEA analysis of the Buchwald–Hartwig dataset.
HiTEA-specific and literature-specific variable importances are highlighted, as well as agreement between the literature and HiTEA variable importances. Acronym structures can be found in Supplementary Fig. 11. Temp, reaction temperature. a, Variable importances are shown and unless otherwise specified, the metal source for the ligand is Pd(OAc)2. Where appropriate, reactant importances are shown. b, Statistically significant best-/worst-in-class catalysts and reagents are shown and unless otherwise specified, CuI is the copper source for the Ullmann couplings.
Fig. 4. Visualization of the ligand space for all three subreactomes and in-depth analysis of Ullmann dataset.
a, Principal component analysis (PCA) of the Buchwald, Ullmann and CO reduction ligands is shown. b, HiTEA variable importance analysis of the Ullmann dataset is shown. Unless otherwise specified, CuI is the copper source. HiTEA-specific and literature-specific variable importances are highlighted, as well as agreement between the literature and HiTEA variable importances. Acronym structures can be found in Supplementary Fig. 11.
Fig. 5. HiTEA variable importance analysis on heterogeneous and homogeneous hydrogenation datasets.
Where appropriate, reactant importances are shown. HiTEA-specific variable importances are highlighted, as well as agreement between the literature and HiTEA variable importances. Acronym structures can be found in Supplementary Fig. 11.
Fig. 6. HiTEA best-/worst-in-class analysis of hydrogenation dataset.
Acronym structures can be found in Supplementary Fig. 11. a, Heterogeneous hydrogenation dataset is shown. b, Homogeneous hydrogenation dataset is shown.
