Bioinformatics. 2021 Jul 12;37(Suppl_1):i451-i459.
doi: 10.1093/bioinformatics/btab291.

Asynchronous parallel Bayesian optimization for AI-driven cloud laboratories

Trevor S Frisby et al. Bioinformatics. 2021.

Abstract

Motivation: The recent emergence of cloud laboratories, collections of automated wet-lab instruments that are accessed remotely, presents new opportunities to apply Artificial Intelligence and Machine Learning in scientific research. Among these is the challenge of automating the optimization of experimental protocols to maximize data quality.

Results: We introduce a new deterministic algorithm, called PaRallel OptimizaTiOn for ClOud Laboratories (PROTOCOL), that improves experimental protocols via asynchronous, parallel Bayesian optimization. The algorithm achieves exponential convergence with respect to simple regret. We demonstrate PROTOCOL in both simulated and real-world cloud labs. In the simulated lab, it outperforms alternative approaches to Bayesian optimization both in its ability to find optimal configurations and in the number of experiments required to find the optimum. In the real-world lab, the algorithm makes progress toward the optimal setting.

Data availability and implementation: PROTOCOL is available both as a stand-alone Python library and as part of an R Shiny application at https://github.com/clangmead/PROTOCOL. Data are available at the same repository.

Supplementary information: Supplementary data are available at Bioinformatics online.
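
The Results paragraph refers to simple regret without defining it. As a reminder, one standard definition is given below; the symbols f (the objective), x* (a maximizer), x_t (the t-th evaluated configuration) and the constants C and c are notation assumed here, not taken from the abstract. The bound on the second line is only one common reading of "exponential convergence"; the precise statement and its conditions are in the full paper.

```latex
% Simple regret after n evaluations: the gap between the true optimum
% and the best value observed so far.
r_n \;=\; f(x^{*}) \;-\; \max_{1 \le t \le n} f(x_t)

% One common form of an exponential convergence guarantee
% (C, c > 0 are unspecified constants, assumed for illustration):
r_n \;\le\; C\, e^{-c\, n}
```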


Figures

Fig. 1.
Top row. Shown are hierarchical trees produced by PROTOCOL at three different time points while optimizing a 1D sinusoidal function (see text for explanation). The nodes are fixed along the horizontal axis according to the center coordinate of the interval they represent. The function optimizer, x ≈ 0.868, is indicated by the star along the horizontal axis. Bottom row. A visualization of the frontier calculated by PROTOCOL in relation to the hierarchical tree. The enumerated red nodes on the left indicate intervals whose center coordinates are used to calculate the frontier. The central diagram shows the frontier, where intervals 1, 2 and 4 lie on the frontier but intervals 3 and 5 do not. Note that the depth of the tree is inversely proportional to the size of the interval. The red nodes on the right denote the intervals that lie on the frontier; their center coordinates will be requested for evaluation.
Fig. 2.
Top row. The ground truth peak height of observed MALDI-ToF experimental configurations is shown as a function of the total number of evaluations. The error bars in the non-PROTOCOL curves denote the mean ± 1 SEM calculated over 100 trials initialized with different randomly chosen training sets of size 4 (which is equal to the allowed level of parallelization). Bottom row. Again using peak height as the endpoint, these panels show the number of evaluations each algorithm requested before identifying the optimal configuration. For the non-PROTOCOL algorithms, only the subset of the 100 trials that actually identified the optimal configuration is used. Error bars denote ± 1 SEM over this subset of trials.
Fig. 3.
CT-polymer conjugate ground-truth peak heights for MALDI-ToF parameterizations selected by PROTOCOL and the GP-UCB algorithm. Two cases are shown for the GP-UCB algorithm: one where the algorithm identified the configuration that led to the maximum peak height, and one where it did not. For each, the initial evaluation points are indicated by an ‘x’. Whereas the initial point evaluated by PROTOCOL is a consequence of the algorithm (the central point of the input space), GP-UCB depends on an initial training set. The ability of GP-UCB to identify the optimal configuration is influenced by this initial set.
Fig. 4.
Chromatograms corresponding to the first experimental configuration chosen by PROTOCOL (left) as well as the experimental configuration that yielded the greatest resolution chosen by PROTOCOL (middle) and LHS (right).
Fig. 5.
Left. The Shiny app start page, where the user can initialize an optimization problem by defining the parameters to optimize over. Right. The Shiny app data upload page, where the user can upload previously evaluated data and select the optimization algorithm to use.
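
The frontier described in the Fig. 1 caption can be illustrated with a small sketch. The captions do not state the exact frontier rule, so the code below substitutes a plain non-dominated (Pareto) filter over interval size and center value: a leaf stays on the frontier unless some other leaf is at least as large and at least as good, and strictly better in one of the two. This rule, the `Leaf` class, the `dominates` and `frontier` helpers, and all numbers are assumptions made for illustration; they are not PROTOCOL's implementation and are not read off the figure.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Leaf:
    """A leaf interval of the hierarchical tree (hypothetical structure)."""
    name: str
    center: float  # center coordinate of the interval
    depth: int     # deeper leaves cover smaller intervals
    value: float   # objective measured at the center


def dominates(b: Leaf, a: Leaf) -> bool:
    """True if b is at least as large and as good as a, and strictly better in one."""
    no_worse = b.depth <= a.depth and b.value >= a.value
    strictly_better = b.depth < a.depth or b.value > a.value
    return no_worse and strictly_better


def frontier(leaves: List[Leaf]) -> List[Leaf]:
    """Keep the leaves that no other leaf dominates (assumed frontier rule)."""
    return [a for a in leaves if not any(dominates(b, a) for b in leaves)]


# Toy leaves with made-up values, for illustration only.
leaves = [
    Leaf("A", center=0.17, depth=1, value=0.2),
    Leaf("B", center=0.39, depth=2, value=0.5),
    Leaf("C", center=0.50, depth=2, value=0.3),  # same size as B, lower value
    Leaf("D", center=0.57, depth=3, value=0.9),
    Leaf("E", center=0.61, depth=3, value=0.6),  # same size as D, lower value
]

selected = frontier(leaves)
print([leaf.name for leaf in selected])  # ['A', 'B', 'D']
```

In this toy case the largest interval (A) is kept even though its value is low, which is how a size-aware frontier balances exploring large unexplored regions against refining small promising ones. In a parallel setting, each selected leaf could be dispatched as a separate experiment.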

