Review

Digit Discov. 2023 Aug 8;2(5):1233-1250. doi: 10.1039/d3dd00113j. eCollection 2023 Oct 9.

14 examples of how LLMs can transform materials science and chemistry: a reflection on a large language model hackathon

Kevin Maik Jablonka et al.

Abstract

Large language models (LLMs) such as GPT-4 have caught the interest of many scientists. Recent studies have suggested that these models could be useful in chemistry and materials science. To explore these possibilities, we organized a hackathon. This article chronicles the projects built during this hackathon. Participants employed LLMs for a variety of applications, including predicting properties of molecules and materials, designing novel interfaces for tools, extracting knowledge from unstructured data, and developing new educational applications. The diverse topics, and the fact that working prototypes could be built in less than two days, highlight that LLMs will profoundly impact the future of our fields. The rich collection of ideas and projects also indicates that the applications of LLMs are not limited to materials science and chemistry but offer potential benefits to a wide range of scientific disciplines.


Conflict of interest statement

There are no conflicts to declare.

Figures

Fig. 1. Using LLMs to predict the compressive strength of concretes. The left panel illustrates the conventional approach to this task: training classical prediction models on ten training data points provided as tabular data. Using the LIFT (language-interfaced fine-tuning) framework, LLMs can also use tabular data and additionally leverage context provided in natural language (right). This context can capture the "fuzzy" design rules that are often known in chemistry and materials science but hard to encode in conventional ML models. Augmented with this context and ten training examples, in-context learning (ICL) with an LLM outperforms baselines such as random forests (RFs) or Gaussian process regression (GPR).
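The LIFT-style setup in Fig. 1 amounts to serializing each tabular training row into a natural-language example, prefixed by the fuzzy domain context, and asking the model to complete the query row. A minimal sketch of that prompt assembly is below; the helper name, the feature wording, and the example rule of thumb are illustrative, not taken from the paper.

```python
def build_lift_prompt(train_rows, query_row, context=""):
    """Assemble a LIFT-style few-shot prompt: each tabular training row
    becomes a natural-language example, optionally preceded by "fuzzy"
    domain context, followed by the unanswered query row."""
    lines = []
    if context:
        lines.append(context)
    for features, label in train_rows:
        lines.append(f"Input: {features} -> Compressive strength: {label} MPa")
    # The query row ends with the label field left blank for the LLM to fill.
    lines.append(f"Input: {query_row} -> Compressive strength:")
    return "\n".join(lines)

prompt = build_lift_prompt(
    [("cement 540 kg/m3, water 162 kg/m3, age 28 d", 79.99)],
    "cement 332 kg/m3, water 228 kg/m3, age 28 d",
    context="Rule of thumb: lower water-to-cement ratios give higher strength.",
)
```

The resulting string would then be sent to the LLM, whose completion is parsed back into a numeric prediction.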
Fig. 2. A genetic algorithm (GA) using an LLM. This figure illustrates how different steps of a GA can be performed by an LLM. GPT-3.5 was used to fragment, reproduce, and optimize molecules represented as SMILES strings. The left column illustrates how an LLM can fragment a molecule given its SMILES string (input molecule on top, LLM output fragments below). The middle column shows how an LLM can reproduce/mix two molecules, as is done in a GA (input molecules on top, LLM output below). The right column illustrates an application in which the LLM is asked to optimize molecules given their SMILES strings and an associated score; the LLM suggests modifications intended to improve the score. The plot shows the best (blue) and mean (orange) Tanimoto similarity to vitamin C for each generation produced by the LLM.
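The division of labor in Fig. 2 can be sketched as a standard GA loop in which the crossover/mutation operator is delegated to the LLM. In the toy sketch below the LLM call is replaced by a stub so the loop runs end to end; the function names and the selection scheme (keep the better half, refill by crossover) are illustrative, not the hackathon project's exact implementation.

```python
import random

def llm_crossover(parent_a, parent_b):
    # Placeholder for the LLM call that fragments/mixes two SMILES
    # strings (GPT-3.5 in the project); here it just returns one parent
    # so the loop is self-contained.
    return random.choice([parent_a, parent_b])

def evolve(population, score, n_generations=5, seed=0):
    """Generic GA loop: rank by score, keep the better half as parents,
    refill the population with LLM-generated offspring."""
    random.seed(seed)
    for _ in range(n_generations):
        ranked = sorted(population, key=score, reverse=True)
        parents = ranked[: max(2, len(ranked) // 2)]
        children = [
            llm_crossover(random.choice(parents), random.choice(parents))
            for _ in range(len(population) - len(parents))
        ]
        population = parents + children
    return max(population, key=score)
```

In the real application, `score` would be a property oracle (e.g., Tanimoto similarity to vitamin C) and the offspring would be novel SMILES strings proposed by the model.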
Fig. 3. Schematic overview of the MAPI-LLM workflow. It uses LLMs to process the user's input and decide which available tools (e.g., Materials Project API, the Reaction-Network package, and Google Search) to use following an iterative chain-of-thought procedure. In this way, it can answer questions such as “Is the material AnByCz stable?”.
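At its core, the workflow in Fig. 3 is a dispatch problem: map a user question to one of several tools, run it, and return the result. The sketch below uses simple keyword triggers in place of the LLM-driven chain-of-thought tool selection that MAPI-LLM actually performs; the tool names and trigger words are illustrative.

```python
def route_query(question, tools):
    """Pick the first tool whose trigger keywords appear in the
    question; fall back to web search. MAPI-LLM instead lets the LLM
    choose tools iteratively, but the contract is the same:
    question in, (tool name, answer) out."""
    q = question.lower()
    for name, (keywords, handler) in tools.items():
        if any(k in q for k in keywords):
            return name, handler(question)
    return "search", f"web search: {question}"

# Illustrative tool registry; real handlers would call the Materials
# Project API, the Reaction-Network package, or Google Search.
tools = {
    "materials_project": (("stable", "band gap"), lambda q: "query MP API"),
    "reaction_network": (("synthesize", "reaction"), lambda q: "plan route"),
}
```

Replacing the keyword matcher with an LLM prompt ("Given these tools, which should be used, and with what input?") recovers the iterative agent loop described in the caption.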
Fig. 4. The sMolTalk interface. With few-shot prompting, LLMs can generate code for molecular visualization tools, producing custom visualizations from a natural-language description of the desired output. The top-left box is the input field where users enter commands in natural language. The top-right box prints the code the LLM generates; this code produces the visualization shown in the lower box. In this example, the user entered a sequence of four commands: the LLM (1) generates code to retrieve the structure, (2) colors the carbons blue, (3) displays the hydrogens as red spheres, and (4) reduces the size of the spheres.
Fig. 5. Using an LLM as an interface to an ELN (electronic lab notebook)/data management system. LLM-based assistants can provide powerful interfaces to digital experimental data. The figure shows a screenshot of a conversation in the data management system datalab (https://github.com/the-grey-group/datalab). Here, the assistant is provided with data from the system's JSON API for an experimental battery cell. The user then prompts the system (green box) to build a flowchart of the provenance of the sample. The assistant responds with markdown code, which the interface automatically recognizes and renders as a visualization.
Fig. 6. Schematic overview of BoLLama. An LLM can act as an interface to a Bayesian optimization (BO) algorithm. An experimental chemist can bootstrap an optimization and then, via a chat interface, update the state of the optimization, to which the bot responds with the recommended next steps.
Fig. 7. The InsightGraph interface. A suitably prompted LLM can create knowledge-graph representations of scientific text, which can then be visualized with tools such as neo4j.
Fig. 8. The organic synthesis parser interface. The top box shows text describing an organic reaction (https://open-reaction-database.org/client/id/ord-1f99b308e17340cb8e0e3080c270fd08), which a fine-tuned LLM converts into structured JSON (bottom). A demo application can be found at https://qai222.github.io/LLM_organic_synthesis/.
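Whenever an LLM emits structured JSON, as in Fig. 8, the caller should validate the output before trusting it, since models can produce malformed or incomplete objects. A minimal validation sketch is below; the key names are illustrative and do not reflect the project's actual schema.

```python
import json

def parse_reaction_json(raw, required=("reactants", "conditions", "product")):
    """Parse the model's raw output and check the top-level keys the
    schema expects; return None on malformed or incomplete output so
    the caller can retry or re-prompt."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not all(k in data for k in required):
        return None
    return data
```

A retry loop around this check (re-prompting with the error appended) is a common pattern for making LLM-based extraction robust.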
Fig. 9. TableToJson. Results of structured JSON generation from tables in scientific articles. Two approaches are compared: (i) prompting an OpenAI model with the desired JSON schema, and (ii) combining an OpenAI model with constrained, schema-guided generation. In both cases, valid JSON objects were always obtained. The output of the OpenAI model alone did not always follow the provided schema, although this might be solved by modifying the schema. The accuracy of the results could be increased (as shown by the blue arrows) by resolving errors in the generation of power numbers and special characters with a more detailed prompt. The results can be visualized in this demo app: https://vgvinter-tabletojson-app-kt5aiv.streamlit.app/.
Fig. 10. The I-digest interface. (a) A video (e.g., a lecture recording) can be transcribed using the Whisper model. Based on the transcript, an LLM can generate questions (and answers), which can assist students in their learning. (b) The LLM can also detect mentions of chemicals and link to further information about them (e.g., on PubChem).
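The transcript-to-questions step in Fig. 10 reduces to chunking the Whisper output and wrapping each chunk in a question-generation instruction. The sketch below shows that preprocessing; the chunk size and the prompt wording are illustrative assumptions, not the I-digest project's actual parameters.

```python
def question_prompts(segments, chars_per_chunk=1200):
    """Split transcript segments into roughly fixed-size chunks and
    wrap each chunk in a question-generation instruction for the LLM."""
    chunks, current = [], ""
    for seg in segments:
        if len(current) + len(seg) > chars_per_chunk and current:
            chunks.append(current)
            current = ""
        current += seg + " "
    if current.strip():
        chunks.append(current)
    return [
        f"Write one exam question (with answer) testing this passage:\n{c.strip()}"
        for c in chunks
    ]
```

Each returned prompt would be sent to the LLM separately, keeping every request within the model's context window.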
