Review

Chem Sci. 2024 Dec 9;16(6):2514-2572.

doi: 10.1039/d4sc03921a. eCollection 2025 Feb 5.

A review of large language models and autonomous agents in chemistry

Mayk Caldas Ramos¹ ², Christopher J Collison³, Andrew D White¹ ²

Affiliations

¹ FutureHouse Inc., San Francisco, CA, USA. andrew@futurehouse.org
² Department of Chemical Engineering, University of Rochester, Rochester, NY, USA. mcaldasr@ur.rochester.edu
³ School of Chemistry and Materials Science, Rochester Institute of Technology, Rochester, NY, USA. cjcscha@rit.edu

PMID: 39829984
PMCID: PMC11739813
DOI: 10.1039/d4sc03921a

Mayk Caldas Ramos et al. Chem Sci. 2024.

Abstract

Large language models (LLMs) have emerged as powerful tools in chemistry, significantly impacting molecule design, property prediction, and synthesis optimization. This review highlights LLM capabilities in these domains and their potential to accelerate scientific discovery through automation. We also review LLM-based autonomous agents: LLMs equipped with a broader set of tools for interacting with their surrounding environment. These agents perform diverse tasks such as paper scraping, interfacing with automated laboratories, and synthesis planning. Because agents are an emerging topic, we extend our review of agents beyond chemistry and discuss them across scientific domains. This review covers the recent history, current capabilities, and design of LLMs and autonomous agents, addressing specific challenges, opportunities, and future directions in chemistry. Key challenges include data quality and integration, model interpretability, and the need for standard benchmarks, while future directions point towards more sophisticated multi-modal agents and enhanced collaboration between agents and experimental methods. Because the field moves quickly, a repository has been built to keep track of the latest studies: https://github.com/ur-whitelab/LLMs-in-science.

This journal is © The Royal Society of Chemistry.


Conflict of interest statement

The authors have no conflicts to declare.

Figures

Fig. 1. AI-powered LLMs accelerate chemical discovery with models that address key challenges in property prediction, property-directed molecule generation, and synthesis prediction. Autonomous agents connect these models and additional tools, thereby enabling rapid exploration of vast chemical spaces.

Fig. 2. (a) The generalized encoder–decoder transformer: the encoder on the left converts an input into a vector, while the decoder on the right predicts the next token in a sequence. (b) Encoder–decoder transformers are traditionally used for translation tasks and, in chemistry, for reaction prediction, translating reactants into products. (c) Encoder-only transformers provide a vector output and are typically used for sentiment analysis. In chemistry, they are used for property prediction or classification tasks. (d) Decoder-only transformers generate likely next tokens in a sequence. In chemistry, they are used to generate new molecules given an instruction and description of molecules.

Fig. 3. Classification of LLMs in chemistry and biochemistry according to their application.

Fig. 4. Illustration of how large language models (LLMs) evolved chronologically. The dates display the first publication of each model.

Fig. 5. Number of training tokens (on log scale) available from various chemical sources compared with typical LLM training runs. The numbers are drawn from ZINC, PubChem, Touvron *et al.*, ChEMBL, and Kinney *et al.*

Fig. 6. Agent architecture as defined in this review. According to our definition, an agent is composed of a central program (typically an LLM and the code to implement the agent's dynamic behavior) and the agent modules. The agent continuously receives observations from the environment and decides which action should be executed to complete the task given to it. Here, we define the agent as the set of elements whose decision is trainable, that is, the LLM, the agent code, the decision process, and the agent modules. Given a task, the agent uses the agent modules (memory, reasoning, planning, profiling) and the LLM to decide which action should be executed. This action is executed by calling a tool from the environment. After the action is executed, an observation is produced and fed back to the agent. The agent can use perception to receive inputs in different modalities from the environment. (A) Description of the agent modules, (B) illustration of the agent architecture, (C) illustration of the environment components, (D) description of the tool elements present in the environment.
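
The decide, act, observe cycle described in the Fig. 6 caption can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `Agent`, `run`, `toy_policy`, and `toy_tools` are hypothetical, the policy function is only a stand-in for an LLM, and memory is the only agent module represented.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical types for illustration; none of these names come from the review.
Tool = Callable[[str], str]


@dataclass
class Agent:
    """Central program plus a memory module; `policy` stands in for the LLM."""
    policy: Callable[[str, List[str]], Tuple[str, str]]
    memory: List[str] = field(default_factory=list)

    def decide(self, task: str) -> Tuple[str, str]:
        # Choose (tool_name, tool_input) from the task and past observations.
        return self.policy(task, self.memory)

    def observe(self, observation: str) -> None:
        # Feed the observation back into memory for the next decision.
        self.memory.append(observation)


def run(agent: Agent, tools: Dict[str, Tool], task: str, max_steps: int = 5) -> List[str]:
    """Alternate decide -> act -> observe until the policy returns 'finish'."""
    for _ in range(max_steps):
        tool_name, tool_input = agent.decide(task)
        if tool_name == "finish":
            break
        observation = tools[tool_name](tool_input)  # the action is a tool call
        agent.observe(observation)
    return agent.memory


def toy_policy(task: str, memory: List[str]) -> Tuple[str, str]:
    # Stand-in for an LLM: look something up once, then stop.
    if not memory:
        return ("lookup", task)
    return ("finish", "")


# The environment exposes tools by name; this one just echoes the query.
toy_tools: Dict[str, Tool] = {"lookup": lambda query: f"result for {query}"}
history = run(Agent(policy=toy_policy), toy_tools, "boiling point of ethanol")
```

Here the environment is reduced to a dictionary of callable tools, and each observation is appended to memory so that the next decision can depend on it. A fuller agent would replace `toy_policy` with an LLM call and add the reasoning, planning, and profiling modules named in the caption.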


