ACS Cent Sci. 2020 Apr 22;6(4):513-524.
doi: 10.1021/acscentsci.0c00026. Epub 2020 Mar 11.

Accurate Multiobjective Design in a Space of Millions of Transition Metal Complexes with Neural-Network-Driven Efficient Global Optimization

Jon Paul Janet et al. ACS Cent Sci.

Abstract

The accelerated discovery of materials for real world applications requires the achievement of multiple design objectives. The multidimensional nature of the search necessitates exploration of multimillion compound libraries over which even density functional theory (DFT) screening is intractable. Machine learning (e.g., artificial neural network, ANN, or Gaussian process, GP) models for this task are limited by training data availability and predictive uncertainty quantification (UQ). We overcome such limitations by using efficient global optimization (EGO) with the multidimensional expected improvement (EI) criterion. EGO balances exploitation of a trained model with acquisition of new DFT data at the Pareto front, the region of chemical space that contains the optimal trade-off between multiple design criteria. We demonstrate this approach for the simultaneous optimization of redox potential and solubility in candidate M(II)/M(III) redox couples for redox flow batteries from a space of 2.8 M transition metal complexes designed for stability in practical redox flow battery (RFB) applications. We show that a multitask ANN with latent-distance-based UQ surpasses the generalization performance of a GP in this space. With this approach, ANN prediction and EI scoring of the full space are achieved in minutes. Starting from ca. 100 representative points, EGO improves both properties by over 3 standard deviations in only five generations. Analysis of lookahead errors confirms rapid ANN model improvement during the EGO process, achieving suitable accuracy for predictive design in the space of transition metal complexes. The ANN-driven EI approach achieves at least 500-fold acceleration over random search, identifying a Pareto-optimal design in around 5 weeks instead of 50 years.
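The Pareto front mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names, the toy numbers, and the convention that a higher redox-related value and a lower logP are both "better" are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

# Hypothetical representation of one candidate: (delta_g_ox, log_p).
Point = Tuple[float, float]


def dominates(a: Point, b: Point) -> bool:
    """Return True if `a` is no worse than `b` in both objectives and
    strictly better in at least one.

    Assumed convention: higher delta_g_ox is better, lower log_p is better.
    """
    no_worse = a[0] >= b[0] and a[1] <= b[1]
    strictly_better = a[0] > b[0] or a[1] < b[1]
    return no_worse and strictly_better


def pareto_set(points: Sequence[Point]) -> List[Point]:
    """Keep every point that no other point dominates (O(n^2) scan)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Toy values, not data from the paper.
candidates = [(1.0, 2.0), (2.0, 3.0), (1.5, 1.0), (0.5, 0.5), (1.8, 3.5)]
front = pareto_set(candidates)
print(sorted(front))
```

In this toy set, (1.0, 2.0) and (1.8, 3.5) drop out because another candidate is at least as good in both objectives; the remaining three points represent the trade-off between the two properties.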

Conflict of interest statement

The authors declare no competing financial interest.

Figures

Figure 1
Approach for hierarchical assembly of a 2.8 M homoleptic complex design space for multiobjective RFB design. The sequential stages that lead to each combinatorial step are indicated in the flowchart at left, starting with selection and modification of rings, optionally fusing them to form 779 unique monodentate and bidentate ligands, functionalization of ligands with 897 unique functional groups, and finally complexation with four metals. At right, example skeleton structure components and their assembly or modification coinciding with each step are shown to form a bidentate oxygen-coordinating ligand (coordinating atoms shown with red or blue highlight). The functional group assembly step is indicated in the black inset with the resulting structure shown. Finally, a ball-and-stick structure is shown for an assembled complex.
Figure 2
Illustration of the active learning workflow used in this work to explore a 2.8 M-complex space. DFT simulations are performed on approximately 100 cluster medoids, which are used to iteratively train ML surrogate models. The surrogate models score all possible 2.8 M candidates using 2D expected improvement (EI), and the top scoring complexes are clustered to repeat the process. Inset: illustration of a Pareto set (blue points) and front (dashed line) for a 2D objective function. The distribution of property values at trial point ŷ(x) (yellow outlined circle) is shown, and the probability mass below the front is shown in yellow.
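The inset idea of probability mass on the improving side of the front can be sketched with a Monte Carlo estimate. This is an illustrative approximation, not the paper's expected improvement criterion: it estimates only the probability of improvement, and it assumes independent Gaussian predictive distributions for the two properties, a "higher delta_g_ox, lower log_p is better" convention, and hypothetical function names.

```python
import random
from typing import Sequence, Tuple

Point = Tuple[float, float]  # hypothetical (delta_g_ox, log_p) pair


def is_dominated(sample: Point, front: Sequence[Point]) -> bool:
    """True if some front point is at least as good in both objectives.

    Assumed convention: higher delta_g_ox is better, lower log_p is better.
    """
    return any(f[0] >= sample[0] and f[1] <= sample[1] for f in front)


def prob_improvement(
    mean: Point,
    std: Point,
    front: Sequence[Point],
    n_samples: int = 20000,
    seed: int = 0,
) -> float:
    """Estimate the probability that a candidate improves on the front.

    Draws from independent Gaussians (an assumption) and counts the
    fraction of draws that no current front point dominates.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        draw = (rng.gauss(mean[0], std[0]), rng.gauss(mean[1], std[1]))
        if not is_dominated(draw, front):
            hits += 1
    return hits / n_samples


# Toy front and candidate, not values from the paper.
toy_front = [(2.0, 3.0), (1.5, 1.0), (0.5, 0.5)]
p_on_front = prob_improvement((1.5, 1.0), (0.1, 0.1), toy_front)
print(round(p_on_front, 2))
```

A candidate whose predicted mean sits exactly on a front point improves unless both draws land on the worse side of that point, so the estimate comes out near three quarters; a larger predictive uncertainty spreads the mass and changes this value, which is why uncertainty quantification matters for the scoring step.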
Figure 3
(Top, left) Generalization performance of multitask ANN (green bars), single task ANN (blue bars), and GP models (gray bars) for ΔGox(sol) (left) and logP (right). Train and test logP errors are also shown in an expanded inset. (Bottom, left) Distribution of properties between “hot start” (in red) and clustered representative (generation 1, in orange) data set. (Right) Principal component analysis in RAC-155 of the full design space (binned histogram, colored according to inset color bar), the original “hot start” data (red circles), and representative clusters (generation 1, orange circles).
Figure 4
ANN-predicted logP and ΔGox(sol) values for the 2.8 M-complex design space. The ANN has been trained on representative clusters and “hot start” data. The generation 1 data (blue diamonds) forming the Pareto front (blue line) are shown in both panes. The best (i.e., lowest) logP Fe complex (A) and highest ΔGox(sol) Mn complex (B) are labeled at left and shown at top. The points are colored by the probability of improving on the front (P[I], left) or expected improvement (E[I], right). On the basis of the approximation of the front used in the E[I], potential complexes equidistant between points on the front score highest. The 100 selected cluster medoids (yellow triangles) from the 10k top E[I]-scoring complexes selected for subsequent DFT calculations are indicated at right, with two examples, a Co complex (C) and Mn complex (D), shown at top right. All representative complexes are shown in both 3D sticks and with each ligand shown as a skeleton structure.
Figure 5
Mean absolute errors (MAEs) for ΔGox(sol) (top, in eV) and logP (bottom) predictions with a multitask ANN. Each bar is colored by the generation at which it is trained, as indicated in the top inset (here, generation 0 corresponds to “hot start” data only). Lookahead MAEs are reported on data sets (1–5, as indicated on axis) generated in each relevant subsequent generation. The MAEs on a separately collected random test set representative of the full design space (random, as indicated on axis) for all multitask ANNs are also reported.
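The lookahead evaluation described in this caption can be sketched as follows: a model trained at one generation is scored only on data collected in later generations. The helper names, the dictionary layout, and the toy stand-in models are assumptions for illustration and do not reflect the authors' implementation.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical types: a model is any function mapping a feature to a prediction.
Model = Callable[[float], float]
Dataset = Tuple[List[float], List[float]]  # (features, true values)


def mean_absolute_error(y_true: List[float], y_pred: List[float]) -> float:
    """Average of |true - predicted| over all points."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def lookahead_maes(
    models: Dict[int, Model], datasets: Dict[int, Dataset]
) -> Dict[Tuple[int, int], float]:
    """Score each model only on data from generations after its own.

    Returns {(model_generation, data_generation): MAE}.
    """
    results = {}
    for model_gen, predict in models.items():
        for data_gen, (features, targets) in datasets.items():
            if data_gen > model_gen:  # unseen, later-generation data only
                preds = [predict(x) for x in features]
                results[(model_gen, data_gen)] = mean_absolute_error(targets, preds)
    return results


# Toy stand-ins, not the paper's models or data.
toy_models = {0: lambda x: 0.0, 1: lambda x: x}
toy_data = {1: ([1.0, 2.0], [1.0, 2.0]), 2: ([3.0], [2.5])}
table = lookahead_maes(toy_models, toy_data)
print(table)
```

In the toy table, the generation 1 model has a lower error on generation 2 data than the generation 0 model does, which is the kind of trend the lookahead bars are meant to expose.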
Figure 6
(Left) ΔGox(sol) and logP values for complexes simulated during five generations of the design algorithm, colored by generation and with unique symbols for each metal center (as indicated in inset legend). The range of values sampled in each generation is indicated by a translucent convex hull, and the final Pareto front is indicated by a red line. Three complexes along this front are labeled and shown at top in both 3D sticks and with each ligand shown as a skeleton structure: the highest ΔGox(sol) Mn complex (A), the best trade-off Mn complex (B), and the lowest logP Fe complex (C). (Right) Distribution of ΔGox(sol) (top) and logP (bottom) values for each generation (colors and symbols as in left pane) alongside a random sample (gray symbols). The mean value for each generation is indicated with a blue horizontal line.
Figure 7
Composition of the eight complexes in the final Pareto set. Each complex consists of one metal center with bidentate ligands assembled from one six-membered and one five-membered ring with the metal-coordinating atom corresponding to the left-most oxygen atom. In all cases, a functional group (indicated in rounded rectangles) is attached symmetrically to both heterocycles. Each complex is represented by a unique path from left (metal center) to right (functional group), and the path is colored by whether the complex has a relatively improved logP (low logP, yellow) or ΔGox(sol) (high ΔGox(sol), blue), as indicated in the inset color bar.
