Nat Biotechnol. 2020 Mar;38(3):288-292.
doi: 10.1038/s41587-019-0360-3. Epub 2020 Feb 5.

Butler enables rapid cloud-based analysis of thousands of human genomes

Sergei Yakneen et al. Nat Biotechnol. 2020 Mar.

An erratum to this article has been published.

Abstract

We present Butler, a computational tool that facilitates large-scale genomic analyses on public and academic clouds. Butler includes innovative anomaly detection and self-healing functions that improve the efficiency of data processing and analysis by 43% compared with current approaches. Butler enabled processing of a 725-terabyte cancer genome dataset from the Pan-Cancer Analysis of Whole Genomes (PCAWG) project in a time-efficient and uniform manner.


Conflict of interest statement

G.G. receives research funds from IBM and Pharmacyclics and is an inventor on patent applications related to MuTect, ABSOLUTE, MutSig, MSMuTect, MSMutSig and POLYSOLVER.

Figures

Fig. 1. Butler framework architecture.
a, The framework consists of several interconnected components, each running on a separate virtual machine (VM). See Methods and Supplementary Note 1 for details. b, Metrics flow from all VMs into a time series database. The self-healing agent detects anomalies and takes appropriate action. See Supplementary Note 1 for details. Solid arrows indicate information flow; dashed arrows indicate metrics flow; dashed-and-dotted arrows indicate configuration instructions.
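Panel b describes a monitoring loop: metrics from every VM flow into a time series database, and a self-healing agent detects anomalies and acts on them. The following is a minimal Python sketch of such a loop under stated assumptions. The `MetricSeries` type, the metric names `cpu_idle_pct` and `disk_used_pct`, the remediation functions, and the rolling z-score rule are all hypothetical stand-ins chosen for illustration. They are not Butler's actual detection or remediation logic, which is described in Supplementary Note 1.

```python
from dataclasses import dataclass, field
from statistics import mean, pstdev
from typing import Callable, Dict, List


@dataclass
class MetricSeries:
    """Time-ordered samples of one metric from one VM (hypothetical schema)."""
    vm: str
    name: str
    values: List[float] = field(default_factory=list)


def is_anomalous(series: MetricSeries, window: int = 5, z: float = 3.0) -> bool:
    """Hypothetical rule: flag the latest sample if it lies more than z
    standard deviations from the mean of the preceding `window` samples."""
    if len(series.values) < window + 1:
        return False  # not enough history to judge
    history = series.values[-(window + 1):-1]
    latest = series.values[-1]
    sd = pstdev(history)
    if sd == 0:
        # Perfectly flat history: any change counts as anomalous.
        return latest != history[0]
    return abs(latest - mean(history)) > z * sd


# Hypothetical remediation actions; a real agent would call cloud/VM APIs.
def restart_worker(vm: str) -> str:
    return f"restart {vm}"


def clean_scratch_disk(vm: str) -> str:
    return f"clean-disk {vm}"


# Hypothetical mapping from metric name to the action taken on an anomaly.
ACTIONS: Dict[str, Callable[[str], str]] = {
    "cpu_idle_pct": restart_worker,
    "disk_used_pct": clean_scratch_disk,
}


def self_heal(all_series: List[MetricSeries]) -> List[str]:
    """Scan every series and return the remediation actions that fire."""
    taken = []
    for s in all_series:
        action = ACTIONS.get(s.name)
        if action is not None and is_anomalous(s):
            taken.append(action(s.vm))
    return taken


if __name__ == "__main__":
    series = [
        MetricSeries("worker-1", "cpu_idle_pct", [10, 12, 11, 9, 10, 95]),
        MetricSeries("worker-2", "cpu_idle_pct", [10, 12, 11, 9, 10, 11]),
    ]
    print(self_heal(series))  # ['restart worker-1']
```

Keeping detection (`is_anomalous`) separate from the action table makes it easy to swap in a different anomaly rule without touching the remediation side, which mirrors the split between metrics collection and the self-healing agent in panel b.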
Fig. 2. Butler performance comparison.
a,b, Ratio of actual to target progress rates for core PCAWG pipelines (a) vs. Butler pipelines (b). See Methods for details. c, Mean actual/target progress rate ratio across pipelines for core PCAWG (mean 0.49) vs. Butler (mean 0.7) pipelines; each pipeline was run once over all PCAWG samples available to us. d,e, Progress rate uniformity of core PCAWG pipelines (d) vs. Butler (e). See Methods for details. In all panels, samples are arranged by completion date. Runtime includes time spent on failed attempts. The comparison between Butler and the core pipelines was made possible by the PCAWG project; a similar comparison between Butler and other frameworks is currently impractical at this scale because of the high cost and complexity involved.
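The caption compares actual and target progress rates but leaves their definitions to Methods. As a hedged illustration, the sketch below assumes that a progress rate means samples completed per day and that panel c reports an unweighted mean over pipelines. Both are assumptions, and `progress_ratio` and `mean_ratio` are hypothetical helpers. It also shows that the 43% efficiency gain quoted in the abstract is consistent with the ratio of the two reported means, 0.7 / 0.49 ≈ 1.43.

```python
from statistics import mean
from typing import Dict


def progress_ratio(samples_done: int, days_elapsed: float, target_per_day: float) -> float:
    """Actual progress rate divided by target rate.

    Assumption: progress rate = samples completed per day; the paper's
    exact definition is given in its Methods and is not reproduced here.
    """
    if days_elapsed <= 0 or target_per_day <= 0:
        raise ValueError("days_elapsed and target_per_day must be positive")
    return (samples_done / days_elapsed) / target_per_day


def mean_ratio(ratios: Dict[str, float]) -> float:
    """Unweighted mean of per-pipeline ratios (assumed aggregation for panel c)."""
    return mean(ratios.values())


# Hypothetical pipeline: 70 samples in 10 days against a target of 10/day -> 0.7.
example = progress_ratio(70, 10, 10)

# Relative improvement implied by the two reported means (0.49 vs 0.7).
core_mean, butler_mean = 0.49, 0.7
improvement = butler_mean / core_mean - 1
print(f"{improvement:.0%}")  # 43%
```

Dividing by a target rate normalizes pipelines with very different per-sample costs onto one scale, which is what allows a single mean to summarize each framework in panel c.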

