Review

Digit Discov. 2023 Aug 8;2(5):1233-1250. doi: 10.1039/d3dd00113j. eCollection 2023 Oct 9.

14 examples of how LLMs can transform materials science and chemistry: a reflection on a large language model hackathon

Kevin Maik Jablonka et al.

Abstract

Large language models (LLMs) such as GPT-4 have caught the interest of many scientists. Recent studies have suggested that these models could be useful in chemistry and materials science. To explore these possibilities, we organized a hackathon. This article chronicles the projects built during this hackathon. Participants employed LLMs for a variety of applications, including predicting properties of molecules and materials, designing novel interfaces for tools, extracting knowledge from unstructured data, and developing new educational applications. The diverse topics, and the fact that working prototypes could be built in less than two days, highlight that LLMs will profoundly impact the future of our fields. The rich collection of ideas and projects also indicates that the applications of LLMs are not limited to materials science and chemistry but offer potential benefits to a wide range of scientific disciplines.


Conflict of interest statement

There are no conflicts to declare.

Figures

Fig. 1. Using LLMs to predict the compressive strength of concretes. The left panel illustrates the conventional approach to this task: training classical prediction models on ten training data points provided as tabular data. Using the LIFT (language-interfaced fine-tuning) framework, LLMs can also use tabular data and additionally leverage context provided in natural language (right). This context can capture the "fuzzy" design rules that are often known in chemistry and materials science but hard to encode in conventional ML models. Augmented with this context and ten training examples, in-context learning (ICL) with an LLM outperforms baselines such as random forests (RFs) or Gaussian process regression (GPR).
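The LIFT-style setup in Fig. 1 amounts to serializing each tabular training row into a natural-language example, prefixed by the fuzzy domain context, and asking the model to complete the query row. A minimal sketch of that prompt assembly is below; the helper name, the feature wording, and the example rule of thumb are illustrative, not taken from the paper.

```python
def build_lift_prompt(train_rows, query_row, context=""):
    """Assemble a LIFT-style few-shot prompt: each tabular training row
    becomes a natural-language example, optionally preceded by "fuzzy"
    domain context, followed by the unanswered query row."""
    lines = []
    if context:
        lines.append(context)
    for features, label in train_rows:
        lines.append(f"Input: {features} -> Compressive strength: {label} MPa")
    # The query row ends with the label field left blank for the LLM to fill.
    lines.append(f"Input: {query_row} -> Compressive strength:")
    return "\n".join(lines)

prompt = build_lift_prompt(
    [("cement 540 kg/m3, water 162 kg/m3, age 28 d", 79.99)],
    "cement 332 kg/m3, water 228 kg/m3, age 28 d",
    context="Rule of thumb: lower water-to-cement ratios give higher strength.",
)
```

The resulting string would then be sent to the LLM, whose completion is parsed back into a numeric prediction.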
Fig. 2. A genetic algorithm (GA) using an LLM. This figure illustrates how different steps of a GA can be performed by an LLM. GPT-3.5 was used to fragment, reproduce, and optimize molecules represented as SMILES strings. The left column illustrates how an LLM can fragment a molecule given its SMILES string (input molecule on top, LLM output fragments below). The middle column shows how an LLM can reproduce/mix two molecules, as is done in a GA (input molecules on top, LLM output below). The right column illustrates an application in which the LLM is asked to optimize molecules given their SMILES strings and an associated score; the LLM suggests modifications intended to improve the score. The plot shows the best (blue) and mean (orange) Tanimoto similarity to vitamin C for each generation produced by the LLM.
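The division of labor in Fig. 2 can be sketched as a standard GA loop in which the crossover/mutation operator is delegated to the LLM. In the toy sketch below the LLM call is replaced by a stub so the loop runs end to end; the function names and the selection scheme (keep the better half, refill by crossover) are illustrative, not the hackathon project's exact implementation.

```python
import random

def llm_crossover(parent_a, parent_b):
    # Placeholder for the LLM call that fragments/mixes two SMILES
    # strings (GPT-3.5 in the project); here it just returns one parent
    # so the loop is self-contained.
    return random.choice([parent_a, parent_b])

def evolve(population, score, n_generations=5, seed=0):
    """Generic GA loop: rank by score, keep the better half as parents,
    refill the population with LLM-generated offspring."""
    random.seed(seed)
    for _ in range(n_generations):
        ranked = sorted(population, key=score, reverse=True)
        parents = ranked[: max(2, len(ranked) // 2)]
        children = [
            llm_crossover(random.choice(parents), random.choice(parents))
            for _ in range(len(population) - len(parents))
        ]
        population = parents + children
    return max(population, key=score)
```

In the real application, `score` would be a property oracle (e.g., Tanimoto similarity to vitamin C) and the offspring would be novel SMILES strings proposed by the model.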
Fig. 3. Schematic overview of the MAPI-LLM workflow. It uses LLMs to process the user's input and decide which available tools (e.g., Materials Project API, the Reaction-Network package, and Google Search) to use following an iterative chain-of-thought procedure. In this way, it can answer questions such as “Is the material AnByCz stable?”.
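At its core, the workflow in Fig. 3 is a dispatch problem: map a user question to one of several tools, run it, and return the result. The sketch below uses simple keyword triggers in place of the LLM-driven chain-of-thought tool selection that MAPI-LLM actually performs; the tool names and trigger words are illustrative.

```python
def route_query(question, tools):
    """Pick the first tool whose trigger keywords appear in the
    question; fall back to web search. MAPI-LLM instead lets the LLM
    choose tools iteratively, but the contract is the same:
    question in, (tool name, answer) out."""
    q = question.lower()
    for name, (keywords, handler) in tools.items():
        if any(k in q for k in keywords):
            return name, handler(question)
    return "search", f"web search: {question}"

# Illustrative tool registry; real handlers would call the Materials
# Project API, the Reaction-Network package, or Google Search.
tools = {
    "materials_project": (("stable", "band gap"), lambda q: "query MP API"),
    "reaction_network": (("synthesize", "reaction"), lambda q: "plan route"),
}
```

Replacing the keyword matcher with an LLM prompt ("Given these tools, which should be used, and with what input?") recovers the iterative agent loop described in the caption.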
Fig. 4. The sMolTalk interface. With few-shot prompting, LLMs can generate code for molecular visualization tools, producing custom visualizations from a natural-language description of the desired output. The top-left box is the input field where users enter commands in natural language. The top-right box prints the code the LLM generates; this code produces the visualization shown in the lower box. In this example, the user entered a sequence of four commands: the LLM (1) generates code to retrieve the structure, (2) colors the carbons blue, (3) displays the hydrogens as red spheres, and (4) reduces the size of the spheres.
Fig. 5. Using an LLM as an interface to an ELN (electronic lab notebook)/data management system. LLM-based assistants can provide powerful interfaces to digital experimental data. The figure shows a screenshot of a conversation in the data management system datalab (https://github.com/the-grey-group/datalab). Here, the assistant is provided with data from the system's JSON API for an experimental battery cell. The user then prompts the system (green box) to build a flowchart of the provenance of the sample. The assistant responds with markdown code, which the interface automatically recognizes and renders as a visualization.
Fig. 6. Schematic overview of BoLLama. An LLM can act as an interface to a Bayesian optimization (BO) algorithm. An experimental chemist can bootstrap an optimization and then, via a chat interface, update the state of the optimization, to which the bot responds with the recommended next steps.
Fig. 7. The InsightGraph interface. A suitably prompted LLM can create knowledge-graph representations of scientific text, which can then be visualized with tools such as neo4j.
Fig. 8. The organic synthesis parser interface. The top box shows text describing an organic reaction (https://open-reaction-database.org/client/id/ord-1f99b308e17340cb8e0e3080c270fd08), which a fine-tuned LLM converts into structured JSON (bottom). A demo application can be found at https://qai222.github.io/LLM_organic_synthesis/.
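Whenever an LLM emits structured JSON, as in Fig. 8, the caller should validate the output before trusting it, since models can produce malformed or incomplete objects. A minimal validation sketch is below; the key names are illustrative and do not reflect the project's actual schema.

```python
import json

def parse_reaction_json(raw, required=("reactants", "conditions", "product")):
    """Parse the model's raw output and check the top-level keys the
    schema expects; return None on malformed or incomplete output so
    the caller can retry or re-prompt."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not all(k in data for k in required):
        return None
    return data
```

A retry loop around this check (re-prompting with the error appended) is a common pattern for making LLM-based extraction robust.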
Fig. 9. TableToJson. Results of structured JSON generation from tables in scientific articles. Two approaches are compared: (i) prompting an OpenAI model with the desired JSON schema, and (ii) combining an OpenAI model with constrained, schema-guided generation. In both cases, valid JSON objects were always obtained. The output of the OpenAI model alone did not always follow the provided schema, although this might be solved by modifying the schema. The accuracy of the results could be increased (as shown by the blue arrows) by resolving errors in the generation of power numbers and special characters with a more detailed prompt. The results can be visualized in this demo app: https://vgvinter-tabletojson-app-kt5aiv.streamlit.app/.
Fig. 10. The I-digest interface. (a) A video (e.g., a lecture recording) can be transcribed using the Whisper model. Based on the transcript, an LLM can generate questions (and answers), which can assist students in their learning. (b) The LLM can also detect mentions of chemicals and link to further information about them (e.g., on PubChem).
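The transcript-to-questions step in Fig. 10 reduces to chunking the Whisper output and wrapping each chunk in a question-generation instruction. The sketch below shows that preprocessing; the chunk size and the prompt wording are illustrative assumptions, not the I-digest project's actual parameters.

```python
def question_prompts(segments, chars_per_chunk=1200):
    """Split transcript segments into roughly fixed-size chunks and
    wrap each chunk in a question-generation instruction for the LLM."""
    chunks, current = [], ""
    for seg in segments:
        if len(current) + len(seg) > chars_per_chunk and current:
            chunks.append(current)
            current = ""
        current += seg + " "
    if current.strip():
        chunks.append(current)
    return [
        f"Write one exam question (with answer) testing this passage:\n{c.strip()}"
        for c in chunks
    ]
```

Each returned prompt would be sent to the LLM separately, keeping every request within the model's context window.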
