Negative chemical data boosts language models in reaction outcome prediction
- PMID: 40512839
- PMCID: PMC12164950
- DOI: 10.1126/sciadv.adt5578
Abstract
Trial-and-error approaches in chemistry generate abundant unsuccessful experiments, yet the potential of these so-called negative results remains largely underutilized. Here, we demonstrate that information from negative chemical reactions can be leveraged to improve reactivity-prediction models, offering advantages in scenarios with a limited volume of successful data. We extend the tuning of language models with reinforcement learning to the chemistry domain, training a transformer model for chemical reaction prediction. Our approach is evaluated using both a rigorously controlled dataset and a realistic high-throughput dataset comprising extensive reaction screenings across diverse catalyst sets and experimental conditions. The model achieves state-of-the-art performance by leveraging information from as few as 20 positive data points in the controlled dataset, supported by a negative dataset at least 40 times larger. Consistent results on both datasets demonstrate that, with an appropriate optimization strategy and the inclusion of unsuccessful experimental data, models can be effectively trained even when successful reactions are underrepresented.
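The abstract does not specify the reinforcement learning algorithm or the reward design, so the following is only a toy sketch of the general idea that a negatively rewarded example can still carry a training signal. It assumes a softmax policy over three hypothetical candidate outcomes for a single reaction, rewards of +1 for a recorded success and -1 for a recorded failure, and a plain reward-weighted log-likelihood update. The function names, reward values, and learning rate are illustrative assumptions, not the authors' transformer model or training procedure.

```python
import math


def softmax(logits):
    """Convert raw scores into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reward_weighted_step(logits, action, reward, lr=0.5):
    """One gradient-ascent step on reward * log p(action).

    For a softmax policy, d log p(action) / d logit_i = 1[i == action] - p_i,
    so a negative reward pushes probability away from the recorded outcome.
    """
    probs = softmax(logits)
    return [
        x + lr * reward * ((1.0 if i == action else 0.0) - p)
        for i, (x, p) in enumerate(zip(logits, probs))
    ]


def train(logits, records, epochs=20, lr=0.5):
    """Repeatedly apply the update over (outcome index, reward) records."""
    for _ in range(epochs):
        for action, reward in records:
            logits = reward_weighted_step(logits, action, reward, lr)
    return logits


# Hypothetical toy data (assumption): outcome 0 was observed as a success,
# outcomes 1 and 2 were recorded as failed experiments.
records = [(0, 1.0), (1, -1.0), (2, -1.0)]

before = softmax([0.0, 0.0, 0.0])
after = softmax(train([0.0, 0.0, 0.0], records))

# Training on failures alone still reshapes the distribution.
negatives_only = softmax(train([0.0, 0.0, 0.0], [(1, -1.0)]))
```

In this sketch the failed outcomes lose probability mass even when no successful example is present (`negatives_only`), which is the intuition behind using unsuccessful experiments when successes are scarce. A real reaction-prediction model would score full output sequences rather than three fixed outcomes.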
