Phoenics: A Bayesian Optimizer for Chemistry

Florian Häse¹, Loïc M Roch¹, Christoph Kreisbeck¹, Alán Aspuru-Guzik¹,²,³,⁴

Affiliations

¹ Department of Chemistry and Chemical Biology, Harvard University, Cambridge, Massachusetts 02138, United States.
² Department of Chemistry and Department of Computer Science, University of Toronto, Toronto, Ontario M5S 3H6, Canada.
³ Vector Institute for Artificial Intelligence, Toronto, Ontario M5S 1M1, Canada.
⁴ Canadian Institute for Advanced Research (CIFAR) Senior Fellow, Toronto, Ontario M5S 1M1, Canada.

PMID: 30276246
PMCID: PMC6161047
DOI: 10.1021/acscentsci.8b00307

Florian Häse et al. ACS Cent Sci. 2018 Sep 26;4(9):1134-1145. doi: 10.1021/acscentsci.8b00307. Epub 2018 Aug 24.

Abstract

We report Phoenics, a probabilistic global optimization algorithm that identifies the set of conditions of an experimental or computational procedure which satisfies desired targets. Phoenics combines ideas from Bayesian optimization with concepts from Bayesian kernel density estimation. As such, Phoenics can tackle typical optimization problems in chemistry for which objective evaluations are limited, due to either budgeted resources or time-consuming evaluations of the conditions, such as experiments or long-running computations. Phoenics proposes new conditions based on all previous observations, thus avoiding redundant evaluations when locating the optimal conditions. It enables an efficient parallel search based on intuitive sampling strategies that implicitly bias toward exploration or exploitation of the search space. Our benchmarks indicate that Phoenics is less sensitive to the response surface than established optimization algorithms. We showcase the applicability of Phoenics on the Oregonator, a complex case study describing a nonlinear chemical reaction network. Despite the large search space, Phoenics quickly identifies the conditions that yield the desired target dynamic behavior. Overall, we recommend Phoenics for rapid optimization of unknown expensive-to-evaluate objective functions, such as experimentation or long-lasting computations.

Conflict of interest statement

The authors declare no competing financial interest.

Figures

**Figure 1**
Illustration of the workflow of Phoenics. (A) Unknown, possibly high-dimensional, objective function of an experimental procedure or computation. The objective function has been evaluated at eight different conditions (green), which comprise the set of observations in this illustration. (B) The observed conditions are processed by a Bayesian neural network yielding a probabilistic model for estimating parameter kernel densities. Note that the probabilistic approach allows for a higher flexibility of our surrogate model compared to standard kernel density estimation. (C) The surrogate model is constructed by weighting the estimated parameter kernel densities with their associated observed objective values. (D) The surrogate can be globally reshaped using a single sampling parameter λ to favor exploration (red) or exploitation (blue) of the parameter space.
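The workflow in panels C and D can be illustrated with a minimal one-dimensional sketch. It is not the authors' implementation: it assumes fixed-width Gaussian kernels in place of the Bayesian neural network density estimate, a uniform reference density on the domain, a minimization objective, and a random candidate search for proposals. The function names, the `width` parameter, and the exact way `lam` (standing in for λ) enters the ratio are all assumptions made for illustration.

```python
import math
import random

def gaussian_kernel(x, center, width):
    """Normalized 1D Gaussian density centered on an observed condition."""
    z = (x - center) / width
    return math.exp(-0.5 * z * z) / (width * math.sqrt(2.0 * math.pi))

def surrogate(x, observations, lam, width=0.1, domain=(0.0, 1.0)):
    """Kernel densities weighted by observed objective values, reshaped by lam.

    observations: list of (condition, objective value) pairs.
    Assumption: fixed-width kernels replace the Bayesian neural network.
    """
    p_uniform = 1.0 / (domain[1] - domain[0])
    densities = [gaussian_kernel(x, xk, width) for xk, _ in observations]
    weighted = sum(fk * pk for (_, fk), pk in zip(observations, densities))
    return (weighted + lam * p_uniform) / (sum(densities) + p_uniform)

def propose(observations, lam, n_candidates=1000, seed=0, domain=(0.0, 1.0)):
    """Return the random candidate that minimizes the surrogate."""
    rng = random.Random(seed)
    candidates = [rng.uniform(*domain) for _ in range(n_candidates)]
    return min(candidates,
               key=lambda x: surrogate(x, observations, lam, domain=domain))

# Hypothetical observations: (condition, objective value) pairs.
observations = [(0.2, 1.0), (0.5, 0.3), (0.8, 0.9)]

# One proposal per lam value gives a batch that can be evaluated in parallel.
batch = [propose(observations, lam) for lam in (-1.0, 0.0, 1.0)]
```

In this sketch the surrogate tends to `lam` far from all observations. With objective values of order one, a negative `lam` therefore pulls the minimum toward unexplored regions, while a positive `lam` keeps it near the best observed condition.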

**Figure 2**
Number of objective function evaluations required to reach objective function values lower than the average lowest achieved values of random searches with 10⁴ evaluations for Phoenics (λ ∈ {−1, 0, 1}), RFs, and GPs. Results are reported for the Ackley (A), Dejong (B), Schwefel (C), and dAckley (D) objective functions. Details on the benchmark functions are provided in the Supporting Information, Sec. S.1.
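The metric in this figure can be sketched as follows: a random search baseline sets a threshold, and each optimizer is scored by the number of evaluations it needs to fall below it. The code uses the standard textbook form of the Ackley function; the search bounds, the helper names, and the single-run baseline are assumptions, and the definitions actually used in the benchmarks are those of the Supporting Information.

```python
import math
import random

def ackley(x):
    """Standard Ackley function; its global minimum is 0 at the origin."""
    d = len(x)
    mean_sq = sum(xi * xi for xi in x) / d
    mean_cos = sum(math.cos(2.0 * math.pi * xi) for xi in x) / d
    return (-20.0 * math.exp(-0.2 * math.sqrt(mean_sq))
            - math.exp(mean_cos) + 20.0 + math.e)

def random_search(objective, dim, n_evals, bounds=(-5.0, 5.0), seed=0):
    """Lowest objective value found by uniform random sampling."""
    rng = random.Random(seed)
    best = float("inf")
    for _ in range(n_evals):
        x = [rng.uniform(*bounds) for _ in range(dim)]
        best = min(best, objective(x))
    return best

def evaluations_to_reach(trace, threshold):
    """1-based index of the first evaluation below threshold, or None."""
    for i, value in enumerate(trace, start=1):
        if value < threshold:
            return i
    return None

# Baseline threshold from one random search run (the figure averages several).
threshold = random_search(ackley, dim=2, n_evals=10_000)
```

An optimizer's recorded objective values, in evaluation order, would then be passed to `evaluations_to_reach` together with `threshold`.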

**Figure 3**
Average minimum objective function values for the Ackley function achieved in 20 independent runs of the three optimization algorithms studied in this work: our optimizer (Phoenics), spearmint (GP), and SMAC (RF). For each run a different number of proposed samples p was evaluated in parallel. Minimum achieved objective function values are reported with respect to the total number of objective function evaluations and the number of evaluated batches. The dashed blue lines denote the minimum error achieved after 10⁴ random search evaluations for reference.

**Figure 4**
Progress of sample optimization runs of the three studied optimization algorithms on the two-dimensional Michalewicz function. Phoenics proposed a total of three samples per batch, which were then evaluated in parallel. Each sample was suggested based on a particular value of the exploration parameter λ ∈ {−1, 0, 1}. Left panels illustrate the parameter points proposed at each optimization iteration while right panels depict the achieved objective function values. Depicted points are more transparent at the beginning of the optimization and more opaque toward the end. Starting points for the optimization runs are drawn as black squares.

**Figure 5**
Average deviations taken over 20 independent runs between the lowest encountered objective function value and the global minimum achieved after 200 objective function evaluations for different parameter set dimensions. Results are reported for Ackley (A), Dejong (B), Schwefel (C), and dAckley (D). Uncertainty bands illustrate bootstrapped estimates of the deviation of the means with one and two standard deviations.
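Uncertainty bands of this kind can be approximated with a simple bootstrap over the independent runs. The sketch below is a generic resampling estimate, not the authors' exact procedure; the function name, the number of resamples, and the example values are hypothetical.

```python
import random
import statistics

def bootstrap_mean_std(values, n_resamples=2000, seed=0):
    """Bootstrap estimate of the mean and of the standard deviation of the mean.

    Each resample draws len(values) items with replacement and records its mean.
    """
    rng = random.Random(seed)
    n = len(values)
    means = [statistics.fmean(rng.choices(values, k=n))
             for _ in range(n_resamples)]
    return statistics.fmean(means), statistics.stdev(means)

# Hypothetical deviations from the global minimum for a handful of runs.
deviations = [0.8, 1.1, 0.9, 1.4, 0.7, 1.0, 1.2, 0.95]
center, spread = bootstrap_mean_std(deviations)

# Bands at one and two standard deviations of the mean.
bands = [(center - k * spread, center + k * spread) for k in (1, 2)]
```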

**Scheme 1. Subreactions of the Belousov–Zhabotinsky Reaction**

**Figure 6**
Average losses achieved by the five optimization algorithms employed in this study when searching for reaction parameters of the reduced Oregonator model. Correct periodicities of the concentration traces are achieved for losses lower than 500. Uncertainty bands illustrate bootstrapped deviations of the mean for one standard deviation.

**Figure 7**
Time traces of dimensionless concentrations of compounds in the Oregonator model. Target traces are depicted with solid, transparent lines while predicted traces are shown in dashed, opaque lines. Traces were simulated for a total of 12 dimensionless time units, but are only shown for the first six time units for clarity.
