PeerJ. 2018 Jan 12;6:e4227.
doi: 10.7717/peerj.4227. eCollection 2018.

FastViromeExplorer: a pipeline for virus and phage identification and abundance profiling in metagenomics data

Saima Sultana Tithi et al., PeerJ.

Abstract

With the increasing availability of metagenomic data generated by next-generation sequencing, there is an urgent need for fast and accurate tools to identify viruses in host-associated and environmental samples. In this paper, we present FastViromeExplorer, a stand-alone pipeline that detects viruses and phages in large metagenomic datasets and quantifies their abundance by performing rapid searches of virus and phage sequence databases. We used both simulated data and real data from human microbiome and ocean environmental samples to validate FastViromeExplorer as a reliable tool for quickly and accurately identifying viruses and their abundances in large datasets.
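To make "detection and abundance quantification" concrete, the sketch below shows one plausible post-processing step: reading per-reference estimated read counts and keeping references above a cutoff. It assumes the counts arrive in kallisto's `abundance.tsv` layout (columns `target_id`, `length`, `eff_length`, `est_counts`, `tpm`), since Figure 1 refers to Kallisto indexing. The function name `detect_viruses`, the `min_reads` cutoff, and the example virus IDs are hypothetical; FastViromeExplorer's actual filtering criteria are not described in this abstract.

```python
import csv
import io


def detect_viruses(abundance_tsv: str, min_reads: float = 10.0):
    """Return (target_id, est_counts) pairs whose estimated read count meets
    a hypothetical min_reads cutoff, sorted from most to least abundant.

    Assumes kallisto-style abundance.tsv text (tab-separated, with header).
    """
    reader = csv.DictReader(io.StringIO(abundance_tsv), delimiter="\t")
    hits = [
        (row["target_id"], float(row["est_counts"]))
        for row in reader
        if float(row["est_counts"]) >= min_reads
    ]
    # Most abundant viruses/phages first
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Made-up example input; IDs and counts are illustrative only.
example = (
    "target_id\tlength\teff_length\test_counts\ttpm\n"
    "virusA\t5000\t4800\t250\t1200.5\n"
    "virusB\t40000\t39800\t3\t1.7\n"
    "phageC\t45000\t44800\t900\t450.2\n"
)
print(detect_viruses(example))
# [('phageC', 900.0), ('virusA', 250.0)]
```

A read-count cutoff is only the simplest possible filter; a real pipeline would likely also consider genome coverage to reject references hit by only a few spurious reads.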

Keywords: Abundance quantification; Metagenomics; Phage; Reference-based virus detection; Viral metagenomics; Virus.


Conflict of interest statement

The authors declare there are no competing interests.

Figures

Figure 1. Kallisto’s indexing time for five reference databases: NCBI RefSeq eukaryotic viruses (99 MB), NCBI RefSeq phages (148 MB), all NCBI RefSeq viruses and phages (247 MB), 62,921 mVCs (992 MB), and 125,842 mVCs (2 GB).
Figure 2. Comparison of running times of FastViromeExplorer, ViromeScan, and Blastn on seven data sets containing 1, 3, 5, 10, 20, 30, and 40 million reads, (A) against a reference database of 8,957 NCBI RefSeq viruses and (B) against a reference database of 125,842 mVCs.
Figure 3. F1 scores of FastViromeExplorer, ViromeScan, and Blastn using NCBI eukaryotic viruses as the reference database, on four simulated data sets of 1 million reads each with mutation frequencies of 3%, 5%, 7%, and 10%, respectively.
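The F1 score in Figure 3 is the harmonic mean of precision and recall. A minimal sketch follows, assuming identification is scored by comparing the set of reported virus IDs with the set of viruses known to be in the simulated data; the caption does not state exactly how true and false positives were counted, and the IDs below are hypothetical.

```python
def f1_score(predicted: set[str], truth: set[str]) -> float:
    """F1 of a predicted set of virus IDs against the known (simulated) set."""
    tp = len(predicted & truth)          # viruses correctly reported
    if tp == 0:
        return 0.0                       # no overlap: precision = recall = 0
    precision = tp / len(predicted)      # TP / (TP + FP)
    recall = tp / len(truth)             # TP / (TP + FN)
    return 2 * precision * recall / (precision + recall)


truth = {"v1", "v2", "v3", "v4"}         # viruses spiked into the simulation
predicted = {"v1", "v2", "v3", "x9"}     # one miss (v4), one false hit (x9)
print(f1_score(predicted, truth))  # 0.75
```

Here precision and recall are both 3/4, so F1 is also 0.75; a tool that reports many spurious hits is penalized through precision even if its recall is perfect.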
Figure 4. Number of viruses in the ViromeScan results before applying any filter, after applying criterion 1, after applying criteria 1 and 2, and after applying all three criteria.
Figure 5. Relative abundance of host bacteria at the order level in the FMT samples, from FastViromeExplorer results using the 125,842 mVCs as the reference, with abundance normalized by the total abundance of viruses in the sample.
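The normalization described for Figure 5 can be sketched as follows: sum the abundance of viruses assigned to each host order, then divide by the total abundance of all viruses in the sample. This assumes a precomputed virus-to-host-order mapping; the function name, mVC IDs, abundances, and host assignments are hypothetical and are not results from the paper.

```python
from collections import defaultdict


def host_order_abundance(virus_abundance: dict[str, float],
                         host_order: dict[str, str]) -> dict[str, float]:
    """Relative abundance per host order, normalized by total virus abundance."""
    total = sum(virus_abundance.values())
    if total == 0:
        return {}
    by_order: dict[str, float] = defaultdict(float)
    for virus, abundance in virus_abundance.items():
        order = host_order.get(virus)
        # Viruses without a predicted host still count toward the denominator.
        if order is not None:
            by_order[order] += abundance
    return {order: value / total for order, value in by_order.items()}


# Made-up abundances and host assignments for illustration only.
abundance = {"mVC_1": 60.0, "mVC_2": 20.0, "mVC_3": 15.0, "mVC_4": 5.0}
hosts = {"mVC_1": "Bacteroidales", "mVC_2": "Clostridiales", "mVC_3": "Bacteroidales"}
print(host_order_abundance(abundance, hosts))
# {'Bacteroidales': 0.75, 'Clostridiales': 0.2}
```

Because the denominator is the total virus abundance rather than only host-assigned viruses, the per-order values can sum to less than 1 (0.95 here), with the remainder belonging to viruses of unknown host.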

