JMIR Med Inform. 2019 Nov 11;7(4):e14667.
doi: 10.2196/14667.

Developing a Reproducible Microbiome Data Analysis Pipeline Using the Amazon Web Services Cloud for a Cancer Research Group: Proof-of-Concept Study

Jinbing Bai et al. JMIR Med Inform.

Abstract

Background: Cloud computing for microbiome data sets can significantly increase working efficiencies and expedite the translation of research findings into clinical practice. The Amazon Web Services (AWS) cloud provides an invaluable option for microbiome data storage, computation, and analysis.

Objective: The goals of this study were to develop a microbiome data analysis pipeline using the AWS cloud and to conduct a proof-of-concept test of microbiome data storage, processing, and analysis.

Methods: A multidisciplinary team was formed to develop and test a reproducible microbiome data analysis pipeline with multiple AWS cloud services that could be used for storage, computation, and data analysis. The microbiome data analysis pipeline developed in AWS was tested by using two data sets: 19 vaginal microbiome samples and 50 gut microbiome samples.

Results: Using AWS features, we developed a microbiome data analysis pipeline that included Amazon Simple Storage Service (S3) for microbiome sequence storage, Linux Elastic Compute Cloud (EC2) instances (ie, servers) for data computation and analysis, and security keys to create and manage encryption for the pipeline. Bioinformatics and statistical tools (ie, Quantitative Insights Into Microbial Ecology 2 [QIIME 2] and RStudio) were installed on the Linux EC2 instances to run microbiome statistical analyses. The pipeline was run through command-line interfaces in the Linux or Mac operating system. Using this new pipeline, we processed and analyzed 50 gut microbiome samples within 4 hours at low cost (a c4.4xlarge EC2 instance costs US $0.80 per hour). Gut microbiome findings on diversity, taxonomy, and abundance were easily shared within our research team.
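The cost figure above can be turned into a rough upper bound on compute cost. The following is a minimal sketch, not the authors' calculation: the function name and parameters are hypothetical, and it assumes a single instance billed in proportion to run time at the quoted hourly rate. It leaves out storage, EBS volumes, and data transfer, which the abstract does not price.

```python
def estimate_run_cost(hours: float, hourly_rate_usd: float, instances: int = 1) -> float:
    """Return an estimated compute cost in US dollars, rounded to cents.

    Hypothetical helper for illustration: assumes cost scales linearly
    with run time and instance count, and ignores non-compute charges.
    """
    if hours < 0 or hourly_rate_usd < 0 or instances < 1:
        raise ValueError("hours and rate must be non-negative; instances must be >= 1")
    return round(hours * hourly_rate_usd * instances, 2)


# Figures taken from the abstract: a c4.4xlarge instance at $0.80 per hour,
# with the 50 gut microbiome samples finished "within 4 hours".
HOURLY_RATE_USD = 0.80
MAX_RUN_HOURS = 4  # upper bound; the actual run time may have been shorter

upper_bound = estimate_run_cost(MAX_RUN_HOURS, HOURLY_RATE_USD)
print(f"Upper-bound compute cost: ${upper_bound:.2f}")  # prints: Upper-bound compute cost: $3.20
```

Under these assumptions, the compute portion of the 50-sample run would cost at most about $3.20.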

Conclusions: Building a microbiome data analysis pipeline with the AWS cloud is feasible. This pipeline is highly reliable, computationally powerful, and cost-effective. Our AWS-based pipeline provides an efficient tool for microbiome data analysis.

Keywords: Amazon Web Services; cloud computation; microbiome; pipeline; sequence analysis.

Conflict of interest statement

Conflicts of Interest: None declared.

Figures

Figure 1. Design process of the microbiome analysis pipeline.
Figure 2. QIIME 2 workflow. QIIME: Quantitative Insights Into Microbial Ecology; OTU: operational taxonomic unit; PCoA: principal coordinates analysis; ANCOM: analysis of composition of microbiomes.
Figure 3. The microbiome data analysis pipeline using AWS. AWS: Amazon Web Services; S3: Simple Storage Service; VPC: virtual private cloud; QIIME: Quantitative Insights Into Microbial Ecology; EBS: Elastic Block Store; EC2: Elastic Compute Cloud.