BMC Bioinformatics. 2014 Jan 29;15:30.
doi: 10.1186/1471-2105-15-30.

Launching genomics into the cloud: deployment of Mercury, a next generation sequence analysis pipeline

Jeffrey G Reid et al. BMC Bioinformatics.

Abstract

Background: Massively parallel DNA sequencing generates staggering amounts of data. Decreasing cost, increasing throughput, and improved annotation have expanded the diversity of genomics applications in research and clinical practice. This expanding scale creates analytical challenges: accommodating peak compute demand, coordinating secure access for multiple analysts, and sharing validated tools and results.

Results: To address these challenges, we have developed the Mercury analysis pipeline and deployed it on local hardware and in the Amazon Web Services cloud via the DNAnexus platform. Mercury is an automated, flexible, and extensible analysis workflow that provides accurate and reproducible genomic results at scales ranging from individuals to large cohorts.

Conclusions: By implementing Mercury on the DNAnexus platform and taking advantage of cloud computing, we have demonstrated a powerful combination of a robust, fully validated software pipeline and a scalable computational resource that, to date, we have applied to more than 10,000 whole genome and whole exome samples.

Figures

Figure 1
Mercury Data Flow. 1) Raw data from the sequencing instrument is passed to vendor primary analysis software to generate sequence reads and base call confidence values (qualities). 2) Reads and qualities are passed to a mapping tool (BWA) for comparison to a reference genome to determine the placement of reads on the reference (producing a BAM file). 3) Individual sequence event BAMs are merged to make a single sample-level BAM file that is then processed in preparation for variant calling. 4) Atlas-SNP and Atlas-indel are used to identify variants and produce variant files (VCF). 5) Annotation adds biological and functional information to the variant lists and formats them for delivery.
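The five steps in this caption form a linear chain in which each stage consumes what the previous one produces. The sketch below is only an illustration of that ordering, not Mercury's actual implementation: the `Stage` class, the stage names, and the artifact labels are hypothetical and chosen to mirror the caption, and no real tool is invoked.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One step of the data flow (hypothetical model, not Mercury code)."""
    name: str
    consumes: str
    produces: str


# Stage and artifact names are illustrative labels derived from the caption.
MERCURY_FLOW = [
    Stage("primary_analysis", "instrument_raw", "reads_and_qualities"),
    Stage("mapping_bwa", "reads_and_qualities", "sequence_event_bam"),
    Stage("merge_and_process", "sequence_event_bam", "sample_bam"),
    Stage("variant_calling_atlas", "sample_bam", "vcf"),
    Stage("annotation", "vcf", "annotated_variants"),
]


def check_chain(stages):
    """Return True when every stage consumes what its predecessor produces."""
    return all(a.produces == b.consumes for a, b in zip(stages, stages[1:]))


def trace(stages, start):
    """Follow an artifact through the stages and list each intermediate form."""
    artifact = start
    path = [artifact]
    for stage in stages:
        if stage.consumes != artifact:
            raise ValueError(f"{stage.name} cannot consume {artifact!r}")
        artifact = stage.produces
        path.append(artifact)
    return path


if __name__ == "__main__":
    print(check_chain(MERCURY_FLOW))
    print(" -> ".join(trace(MERCURY_FLOW, "instrument_raw")))
```

Modeling the flow as data rather than as hard-coded calls makes the ordering easy to check, which is one plausible way to reason about a pipeline described as flexible and extensible.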
Figure 2
Workflow monitoring in DNAnexus. The GUI for applet monitoring displays progress as a Gantt chart. The left panel lists the various steps, including the parallelization steps, with each row corresponding to a compute instance. A particular step can be clicked to view the exact inputs, outputs, or execution logs for that step. Here we show a snapshot of the webpage displaying the progress of execution for the NA12878 exome analysis.
