Bioinformatics. 2021 Jul 12;37(Suppl_1):i451-i459.

doi: 10.1093/bioinformatics/btab291.

Asynchronous parallel Bayesian optimization for AI-driven cloud laboratories

Trevor S Frisby¹, Zhiyun Gong¹, Christopher James Langmead¹

PMID: 34252975
PMCID: PMC8275326
DOI: 10.1093/bioinformatics/btab291

Trevor S Frisby et al. Bioinformatics. 2021.

Affiliation

¹ Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA.

Abstract

Motivation: The recent emergence of cloud laboratories, collections of automated wet-lab instruments that are accessed remotely, presents new opportunities to apply Artificial Intelligence and Machine Learning in scientific research. Among these is the challenge of automating the process of optimizing experimental protocols to maximize data quality.

Results: We introduce a new deterministic algorithm, called PaRallel OptimizaTiOn for ClOud Laboratories (PROTOCOL), that improves experimental protocols via asynchronous, parallel Bayesian optimization. The algorithm achieves exponential convergence with respect to simple regret. We demonstrate PROTOCOL in both simulated and real-world cloud labs. In the simulated lab, it outperforms alternative approaches to Bayesian optimization in terms of its ability to find optimal configurations, and the number of experiments required to find the optimum. In the real-world lab, the algorithm makes progress toward the optimal setting.
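
For readers unfamiliar with the term, simple regret has a standard definition in the bandit and optimization literature. The symbols below ($f$, $x^*$, $x_t$, $r_n$, $C$, $c$) are notation assumed here, not taken from the abstract, and the exponential bound is only one common way to state such a guarantee, not necessarily the paper's exact theorem:

$$
r_n = f(x^*) - \max_{1 \le t \le n} f(x_t), \qquad r_n \le C\, e^{-c\, n} \quad \text{for some constants } C, c > 0 .
$$

Here $f$ is the objective being maximized (for example, a data-quality score), $x^*$ is a maximizer of $f$, and $x_t$ is the $t$-th experimental configuration evaluated. Simple regret therefore measures how far the best configuration found so far is from the optimum.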

Data availability and implementation: PROTOCOL is available both as a stand-alone Python library and as part of an R Shiny application at https://github.com/clangmead/PROTOCOL. Data are available at the same repository.

Supplementary information: Supplementary data are available at Bioinformatics online.

Figures

**Fig. 1.**
Top row. Shown are hierarchical trees produced by PROTOCOL at three different time points while optimizing a 1D sinusoidal function (see text for explanation). The nodes are fixed along the horizontal axis according to the center coordinate of the interval they represent. The function optimizer, $x \approx 0.868$, is indicated by the star along the horizontal axis. Bottom row. A visualization of the frontier calculated by PROTOCOL in relation to the hierarchical tree. The enumerated red nodes on the left indicate intervals whose center coordinates are used to calculate the frontier. The central diagram shows the frontier, where intervals 1, 2 and 4 lie on the frontier but intervals 3 and 5 do not. Note that the depth of the tree is inversely proportional to the size of the interval. The red nodes on the right denote those intervals that lie on the frontier, and are those whose center coordinates will be requested for evaluation.
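
The frontier idea in this caption can be illustrated with a small sketch. The selection rule used here is an assumption borrowed from simultaneous optimistic optimization: take the best leaf at each depth, then keep a candidate only if it beats every candidate at a shallower depth (that is, every larger interval). The caption does not confirm that PROTOCOL uses exactly this rule. The `Interval` class, the function names, and all numeric values are hypothetical, chosen so that intervals 1, 2 and 4 are selected while 3 and 5 are not.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    label: int     # hypothetical identifier, mirroring the numbered nodes
    depth: int     # deeper nodes represent smaller intervals
    lo: float
    hi: float
    value: float   # objective measured at the interval's center (made up)

    @property
    def center(self) -> float:
        return 0.5 * (self.lo + self.hi)


def best_per_depth(leaves):
    """Return the highest-valued leaf at each depth, shallowest first."""
    best = {}
    for leaf in leaves:
        current = best.get(leaf.depth)
        if current is None or leaf.value > current.value:
            best[leaf.depth] = leaf
    return [best[d] for d in sorted(best)]


def frontier(leaves):
    """Assumed rule: keep a per-depth candidate only if its value exceeds
    the value of every candidate at a shallower depth."""
    selected = []
    running_max = float("-inf")
    for candidate in best_per_depth(leaves):
        if candidate.value > running_max:
            selected.append(candidate)
            running_max = candidate.value
    return selected


# A hypothetical binary partition of [0, 1] with made-up objective values.
leaves = [
    Interval(1, 1, 0.0, 0.5, 0.2),
    Interval(2, 2, 0.5, 0.75, 0.5),
    Interval(3, 3, 0.75, 0.875, 0.4),
    Interval(4, 4, 0.875, 0.9375, 0.7),
    Interval(5, 5, 0.9375, 0.96875, 0.6),
    Interval(6, 5, 0.96875, 1.0, 0.3),
]

for interval in frontier(leaves):
    print(interval.label, interval.center)
```

In a parallel setting, the centers of the selected intervals are the points that would be sent out for evaluation together, which is why the frontier can contain several intervals of different sizes at once.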

**Fig. 2.**
Top row. The ground truth peak height of observed MALDI-ToF experimental configurations is shown as a function of the number of total evaluations. The error bars in the non-PROTOCOL curves denote a mean ± 1 SEM calculated over 100 trials initialized with different randomly chosen training sets of size 4 (which is equal to the allowed level of parallelization). Bottom row. Again with the peak height endpoint, these panels show the number of evaluations each algorithm requested before identifying the optimal configuration. For the non-PROTOCOL algorithms, only the subset of the 100 trials that actually identified the optimal configuration is used. Error bars denote ± 1 SEM over this subset of trials.
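
As a reminder of how error bars of this kind are typically computed, the sketch below returns the mean and the standard error of the mean (SEM) for a list of per-trial results. It assumes the usual definition, sample standard deviation divided by the square root of the number of trials. The function name and the input values are hypothetical and are not data from the paper.

```python
import math
import statistics


def mean_and_sem(values):
    """Return (mean, SEM), with SEM = sample standard deviation / sqrt(n)."""
    n = len(values)
    if n < 2:
        raise ValueError("SEM needs at least two observations")
    mean = statistics.fmean(values)
    sem = statistics.stdev(values) / math.sqrt(n)  # stdev uses n - 1
    return mean, sem


# Hypothetical per-trial results, for illustration only.
trial_results = [1.0, 2.0, 3.0, 4.0]
mean, sem = mean_and_sem(trial_results)
print(f"{mean:.3f} ± {sem:.3f}")
```

When only a subset of trials reaches the optimum, as in the bottom row, `n` is the size of that subset, so the error bars widen as fewer trials succeed.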

**Fig. 3.**
CT-polymer conjugate ground-truth peak heights for MALDI-ToF parameterizations selected by PROTOCOL and the GP-UCB algorithm. Two cases are shown for the GP-UCB algorithm: one where the algorithm identified the configuration that led to the maximum peak height, and one where it did not. For each, the initial evaluation points are indicated by an ‘x’. Whereas the initial point evaluated by PROTOCOL is a consequence of the algorithm (the central point of the input space), GP-UCB depends on an initial training set. The ability of GP-UCB to identify the optimal configuration is influenced by this initial set.

**Fig. 4.**
Chromatograms corresponding to the first experimental configuration chosen by PROTOCOL (left), as well as the experimental configurations that yielded the greatest resolution among those chosen by PROTOCOL (middle) and LHS (right).

**Fig. 5.**
Left. The Shiny app start page, where the user can initialize an optimization problem by defining the parameters to optimize over. Right. The Shiny app data upload page, where the user can upload previously evaluated data and select the optimization algorithm to use.
