ViraPipe: scalable parallel pipeline for viral metagenome analysis from next generation sequencing reads
- PMID: 29106455
- DOI: 10.1093/bioinformatics/btx702
Abstract
Motivation: Next Generation Sequencing (NGS) technology enables identification of microbial genomes from massive numbers of human microbiome samples more rapidly and cheaply than ever before. However, traditional sequential genome analysis algorithms, tools, and platforms are inefficient for large-scale metagenomic studies on ever-growing sample data volumes. There is an urgent need for scalable analysis pipelines that harness the full power of parallel computation in computing clusters and cloud computing environments. We propose ViraPipe, a scalable metagenome analysis pipeline that can analyze thousands of human microbiomes in parallel in tolerable time. The pipeline is tuned for analyzing viral metagenomes, but the software is applicable to other metagenomic analyses as well. ViraPipe integrates the parallel BWA-MEM read aligner, the MEGAHIT de novo assembler, and the BLAST and HMMER3 sequence search tools. We show the scalability of ViraPipe by running experiments on mining virus-related genomes from NGS datasets in a distributed Spark computing cluster.
Results: ViraPipe analyzes 768 human samples in 210 minutes on a Spark computing cluster comprising 23 nodes and 1288 cores in total. Executed on 23 nodes, ViraPipe achieved an 11x speedup over the sequential analysis pipeline executed on a single node. The whole process includes parallel decompression, read interleaving, BWA-MEM read alignment, filtering and normalization of non-human reads, de novo contig assembly, and sequence searching with the BLAST and HMMER3 tools.
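To put the reported figures in context, the short Python sketch below derives a few quantities from them: parallel efficiency (speedup divided by node count), cores per node, and sample throughput. It is only an illustrative calculation, not part of ViraPipe. The function name `parallel_efficiency` and the variable names are hypothetical, and the implied single-node runtime rests on the assumption that the 11x speedup was measured on the same 768-sample workload, which the abstract does not state explicitly.

```python
def parallel_efficiency(speedup: float, nodes: int) -> float:
    """Return the fraction of ideal linear speedup that was achieved."""
    if nodes <= 0:
        raise ValueError("nodes must be positive")
    return speedup / nodes


# Figures taken from the Results paragraph of the abstract.
SAMPLES = 768
WALL_MINUTES = 210
NODES = 23
CORES = 1288
SPEEDUP = 11.0

efficiency = parallel_efficiency(SPEEDUP, NODES)
cores_per_node = CORES // NODES
samples_per_minute = SAMPLES / WALL_MINUTES

# ASSUMPTION: the 11x speedup refers to the same 768-sample workload,
# so the single-node runtime would be roughly 11 times the cluster runtime.
implied_sequential_hours = SPEEDUP * WALL_MINUTES / 60

print(f"parallel efficiency:      {efficiency:.1%}")
print(f"cores per node:           {cores_per_node}")
print(f"throughput:               {samples_per_minute:.2f} samples/min")
print(f"implied sequential time:  {implied_sequential_hours:.1f} h (under the stated assumption)")
```

An efficiency of roughly 48% means that each added node contributes a little under half of an ideal node's worth of speedup, which is plausible for a pipeline that mixes well-parallelized stages with stages that are harder to distribute.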
Contact: ilari.maarala@aalto.fi.
Availability and implementation: https://github.com/NGSeq/ViraPipe.