Review
J Am Chem Soc. 2020 Dec 2;142(48):20273-20287.
doi: 10.1021/jacs.0c09105. Epub 2020 Nov 10.

The Role of Machine Learning in the Understanding and Design of Materials

Seyed Mohamad Moosavi et al. J Am Chem Soc.

Abstract

Developing algorithmic approaches for the rational design and discovery of materials can enable us to systematically find novel materials, which can have a huge technological and social impact. However, such rational design requires a holistic perspective over the full multistage design process, which involves exploring immense materials spaces and their properties, process design and engineering, and techno-economic assessment. Exploring all of these options with conventional scientific approaches seems intractable. Instead, novel tools from the field of machine learning can potentially solve some of our challenges on the way to rational materials design. Here we review some of the chief advancements of these methods and their applications in rational materials design, followed by a discussion of some of the main challenges and opportunities we currently face, together with our perspective on the future of rational materials design and discovery.

Conflict of interest statement

The authors declare no competing financial interest.

Figures

Figure 1
Algorithmic approach for holistic rational materials design. We start with a problem for which we have conceptualized a solution (e.g., an adsorption process or device) that must meet certain requirements. For this concept, we search the materials space for the materials that maximize the performance indicators. Because evaluating real-world performance is complex, we usually select surrogate parameters (e.g., material properties) that we hope are reasonable proxies for real-world performance.
Figure 2
Maps of the pore geometry of metal–organic frameworks (MOFs). Pore-geometry descriptors of MOFs were mapped to two dimensions using the t-distributed stochastic neighbor embedding (t-SNE) method. t-SNE preserves local similarities, so materials that are similar to each other lie close together in the 2D map. Each dot represents one material; the structures from each database are overlaid on the combined map of structures from all databases. The experimental structures are from the CoRE-2019 database, and the hypothetical structures are from the ToBaCCo and BW-DB databases. From ref (41). CC BY 4.0.
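To make "preserves local similarities" concrete, the sketch below evaluates the cost that t-SNE minimizes: the Kullback–Leibler divergence between Gaussian affinities p_ij in descriptor space and Student-t affinities q_ij in the 2D map. It is a minimal sketch, not the pipeline of ref (41): it uses one fixed bandwidth `sigma` instead of per-point perplexity calibration, it only scores candidate maps rather than optimizing them by gradient descent, and the `descriptors`, `faithful`, and `scrambled` coordinates are hypothetical toy values.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def high_dim_affinities(points, sigma=1.0):
    """Symmetric affinities p_ij of the descriptor vectors.

    Simplification: one fixed Gaussian bandwidth sigma for every point;
    real t-SNE tunes a per-point bandwidth to match a target perplexity.
    """
    n = len(points)
    cond = []
    for i in range(n):
        w = [0.0 if j == i else math.exp(-sq_dist(points[i], points[j]) / (2 * sigma ** 2))
             for j in range(n)]
        total = sum(w)
        cond.append([x / total for x in w])  # conditional p_{j|i}; each row sums to 1
    # Symmetrize so that all p_ij sum to 1: p_ij = (p_{j|i} + p_{i|j}) / (2n)
    return [[(cond[i][j] + cond[j][i]) / (2 * n) for j in range(n)] for i in range(n)]


def low_dim_affinities(embedding):
    """Affinities q_ij in the 2D map, using a Student-t kernel with one degree of freedom."""
    n = len(embedding)
    w = [[0.0 if i == j else 1.0 / (1.0 + sq_dist(embedding[i], embedding[j]))
          for j in range(n)] for i in range(n)]
    total = sum(map(sum, w))
    return [[x / total for x in row] for row in w]


def tsne_cost(p, q):
    """KL(P || Q): large when pairs that are close in descriptor space end up far apart in the map."""
    n = len(p)
    return sum(p[i][j] * math.log(p[i][j] / q[i][j])
               for i in range(n) for j in range(n) if i != j and p[i][j] > 0)


# Hypothetical 3D "pore descriptors": two tight groups of three materials each
descriptors = [(0, 0, 0), (0.1, 0, 0), (0, 0.1, 0), (5, 5, 5), (5.1, 5, 5), (5, 5.1, 5)]
# Candidate 2D maps: one keeps each group together, the other splits both groups
faithful = [(0, 0), (0.1, 0), (0, 0.1), (5, 5), (5.1, 5), (5, 5.1)]
scrambled = [(0, 0), (5, 5), (0, 0.1), (5.1, 5), (0.1, 0), (5, 5.1)]

P = high_dim_affinities(descriptors)
print(tsne_cost(P, low_dim_affinities(faithful)) < tsne_cost(P, low_dim_affinities(scrambled)))  # prints True
```

Both candidate maps use the same set of 2D coordinates, so the difference in cost comes only from which materials are placed next to each other; the map that keeps neighbors together scores lower, which is the property the caption relies on.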
Figure 3
Prediction of new materials for thermoelectric applications using data mining of the literature. (a) Materials that are found close to the word “thermoelectric” in the word-embedding space. (b) The power factors of these materials were computed using density functional theory, resulting in the discovery of many new potential materials for thermoelectric applications. (c) Connecting words between the newly discovered materials and the word “thermoelectric”. The figure was redrawn based on ref (50).
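Panel (a) amounts to ranking material names by how close their vectors lie to the vector for “thermoelectric”. The sketch below shows only that ranking step, assuming cosine similarity as the closeness measure (a common choice for word embeddings); the three-dimensional vectors and the names `material_A` to `material_D` are hypothetical placeholders, not embeddings trained on the literature as in ref (50).

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def rank_by_similarity(target, vocabulary, candidates, top_k=3):
    """Return the top_k candidate words whose vectors are most similar to the target word."""
    t = vocabulary[target]
    scored = sorted(((cosine(t, vocabulary[w]), w) for w in candidates), reverse=True)
    return [w for _, w in scored[:top_k]]


# Hypothetical toy embedding; real embeddings have hundreds of dimensions
vocab = {
    "thermoelectric": [0.9, 0.1, 0.0],
    "material_A": [0.8, 0.2, 0.1],
    "material_B": [0.1, 0.9, 0.2],
    "material_C": [0.7, 0.0, 0.3],
    "material_D": [0.0, 0.2, 0.9],
}
materials = ["material_A", "material_B", "material_C", "material_D"]
print(rank_by_similarity("thermoelectric", vocab, materials, top_k=2))  # prints ['material_A', 'material_C']
```

In the workflow of panel (b), the top-ranked names would be passed to a more expensive check, here density functional theory calculations of the power factor, rather than being accepted on similarity alone.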
Figure 4
Boltzmann generators. An invertible neural network is used to generate independent samples that follow the desired Boltzmann distribution of a molecular system. First, a sample point is drawn from a simple distribution pz(z), e.g., a Gaussian distribution. The neural network then transforms this sample into a configuration x that follows px(x), a distribution that approximates the Boltzmann distribution of the system. Finally, to compute thermodynamic properties, the samples are reweighted according to their Boltzmann weights. The figure was redrawn based on refs (92) and (93).
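The reweighting step can be illustrated in one dimension. In this minimal sketch, a hypothetical affine map x = shift + scale·z stands in for the trained invertible network, the system is a toy harmonic energy with kT = 1, and px(x) follows from the change-of-variables formula px(x) = pz(z) / |dx/dz|. Each sample then receives the self-normalized importance weight exp(−u(x)) / px(x). None of these choices are taken from refs (92) and (93).

```python
import math
import random


def energy(x):
    """Reduced energy u(x) = U(x)/kT of a toy harmonic system (hypothetical stand-in)."""
    return 0.5 * x * x


class AffineFlow:
    """Invertible map x = shift + scale * z; a hypothetical stand-in for a trained network."""

    def __init__(self, shift, scale):
        self.shift, self.scale = shift, scale

    def forward(self, z):
        # Returns x and log|dx/dz| (the log-Jacobian determinant in 1D)
        return self.shift + self.scale * z, math.log(abs(self.scale))


def log_std_normal(z):
    """Log density of the standard Gaussian prior pz(z)."""
    return -0.5 * z * z - 0.5 * math.log(2 * math.pi)


def reweighted_average(flow, observable, n_samples, seed=0):
    """Estimate the Boltzmann average of observable(x) from flow samples by importance weighting."""
    rng = random.Random(seed)
    xs, log_w = [], []
    for _ in range(n_samples):
        z = rng.gauss(0.0, 1.0)              # draw from the simple prior pz(z)
        x, log_det = flow.forward(z)
        log_px = log_std_normal(z) - log_det  # change of variables: px(x) = pz(z) / |dx/dz|
        xs.append(x)
        log_w.append(-energy(x) - log_px)     # log of unnormalized weight exp(-u(x)) / px(x)
    m = max(log_w)                           # subtract the maximum for numerical stability
    w = [math.exp(lw - m) for lw in log_w]
    return sum(wi * observable(x) for wi, x in zip(w, xs)) / sum(w)


flow = AffineFlow(shift=0.0, scale=1.5)      # deliberately too broad compared with the target
estimate = reweighted_average(flow, lambda x: x * x, n_samples=20000)
print(estimate)  # the exact Boltzmann average <x^2> for this toy system is 1.0
```

Without the weights, the flow samples would give ⟨x²⟩ ≈ 2.25 because the stand-in map is too broad; the reweighting corrects for the mismatch between px(x) and the Boltzmann distribution, which is why the caption's final step is needed even when the generator is imperfect.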
Figure 5
Prediction of battery cycle life from early cycles. (a) Cycle life plotted against cell capacity at cycle 100. (b, c) Characteristics of the voltage curves from the first cycles were used as features to develop the machine learning models. ΔQ100−10 is the change in discharge capacity between cycles 10 and 100. (d) Predictions of the machine learning model for two test sets; the secondary test set was generated after model development. The vertical dashed line marks the 100th cycle, at which the predictions were made. The figure was redrawn based on data from ref (102).
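A single-feature version of this idea can be sketched as follows. The assumptions are stated loudly: the feature is taken to be log10 of the variance of ΔQ100−10 sampled on a shared voltage grid, the model is plain ordinary least squares on log10 cycle life rather than the regularized models of ref (102), and the helper names and all numbers in the usage lines are hypothetical.

```python
import math
import statistics


def delta_q_feature(q_cycle10, q_cycle100):
    """Assumed feature: log10 of the variance of Delta Q_{100-10} on a shared voltage grid."""
    delta = [q100 - q10 for q10, q100 in zip(q_cycle10, q_cycle100)]
    return math.log10(statistics.variance(delta))


def fit_line(xs, ys):
    """Ordinary least squares fit y = a + b * x for one feature; returns (a, b)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    return my - b * mx, b


def predict_cycle_life(a, b, feature):
    """Invert the log10 transform used for the target."""
    return 10 ** (a + b * feature)


# Hypothetical discharge capacities (Ah) of one cell at three shared voltage points
feature = delta_q_feature([1.0, 1.0, 1.0], [0.99, 0.98, 0.97])
# Hypothetical training set: feature values and log10(cycle life) of three cells
a, b = fit_line([-5.0, -4.0, -3.0], [3.5, 3.0, 2.5])
print(feature, a, b, predict_cycle_life(a, b, feature))
```

The point of the sketch is the timing: everything the model needs is available once cycle 100 is reached, so a prediction can be made long before the cell actually degrades, matching the dashed line in panel (d).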
Figure 6
Methods for exploring chemical space. (a) Genetic algorithms use genetic operations to generate new samples that can quickly be evaluated by a machine learning model to maximize the fitness score. (b) Variational autoencoders (VAEs) learn a continuous lower-dimensional representation (the latent space) that can be used for gradient-based optimization of properties; the decoder then recovers the optimal chemicals from the latent space. (c) A reinforcement-learning approach that uses Monte Carlo tree search (MCTS) to complete SMILES strings, generating new molecules that maximize a reward function. (d) In a generative adversarial model, the generator and discriminator compete until the discriminator cannot distinguish generated samples from real empirical samples. By generating new samples, one can explore chemical space to maximize the properties of interest. The figure was redrawn based on ref (11).
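Panel (a) can be sketched as a loop of selection, crossover, and mutation in which every candidate is scored by a cheap surrogate model. This is a minimal sketch under hypothetical assumptions: candidates are bit strings (for example, presence or absence of building blocks), `surrogate_fitness` simply counts 1 bits as a stand-in for a trained property model, and tournament selection, one-point crossover, bit-flip mutation, and elitism are common textbook choices, not necessarily those used in the work cited as ref (11).

```python
import random


def surrogate_fitness(genome):
    """Hypothetical stand-in for a trained ML property model: the number of 1 bits."""
    return sum(genome)


def tournament(pop, rng, k=3):
    """Pick the fittest of k randomly chosen candidates."""
    return max(rng.sample(pop, k), key=surrogate_fitness)


def crossover(a, b, rng):
    """One-point crossover: head of parent a joined to tail of parent b."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def mutate(genome, rate, rng):
    """Flip each bit independently with probability rate."""
    return [1 - g if rng.random() < rate else g for g in genome]


def genetic_search(length=20, pop_size=30, generations=40, mutation_rate=0.02, seed=0):
    """Return the best candidate found and the best surrogate score of each generation."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    history = [max(map(surrogate_fitness, pop))]
    for _ in range(generations):
        elite = max(pop, key=surrogate_fitness)  # elitism: carry the best candidate over unchanged
        children = [elite]
        while len(children) < pop_size:
            child = crossover(tournament(pop, rng), tournament(pop, rng), rng)
            children.append(mutate(child, mutation_rate, rng))
        pop = children
        history.append(max(map(surrogate_fitness, pop)))
    return max(pop, key=surrogate_fitness), history


best, history = genetic_search()
print(surrogate_fitness(best), history[0], history[-1])
```

Because the surrogate is fast, thousands of candidates can be scored per run; in practice only the top candidates would go on to an expensive simulation or experiment. Elitism guarantees that the best surrogate score never decreases from one generation to the next.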
Figure 7
A mobile robotic chemist. The robot was used to perform an autonomous search for a photocatalyst for hydrogen production from water. It improved the photocatalytic activity of the initial formulations (indicated by the baseline) by a factor of 6 over 8 days of searching the experimental space, during which it performed 688 experiments. The photograph of the robot was provided by Andrew I. Cooper and Benjamin Burger (University of Liverpool). The figure was redrawn based on ref (137).

References

1. Yaghi O. M.; Kalmutzki M. J.; Diercks C. S. Introduction to Reticular Chemistry: Metal–Organic Frameworks and Covalent Organic Frameworks; John Wiley & Sons, 2019.
2. Allcock H. R. Rational design and synthesis of new polymeric material. Science 1992, 255, 1106–1112. 10.1126/science.255.5048.1106.
3. Jones M. R.; Seeman N. C.; Mirkin C. A. Programmable Materials and the Nature of the DNA Bond. Science 2015, 347, 1260901. 10.1126/science.1260901.
4. Hastie T.; Tibshirani R.; Friedman J. The Elements of Statistical Learning: Data Mining, Inference, and Prediction; Springer Series in Statistics; Springer: New York, 2001; Vol. 1.
5. Machine Learning Meets Quantum Physics; Schütt K. T., Chmiela S., von Lilienfeld O. A., Tkatchenko A., Tsuda K., Müller K.-R., Eds.; Springer, 2020.