Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
Review
. 2018 Apr 23:9:749.
doi: 10.3389/fmicb.2018.00749. eCollection 2018.

Overview of Virus Metagenomic Classification Methods and Their Biological Applications

Affiliations
Review

Overview of Virus Metagenomic Classification Methods and Their Biological Applications

Sam Nooij et al. Front Microbiol. .

Abstract

Metagenomics poses opportunities for clinical and public health virology applications by offering a way to assess complete taxonomic composition of a clinical sample in an unbiased way. However, the techniques required are complicated and analysis standards have yet to develop. This, together with the wealth of different tools and workflows that have been proposed, poses a barrier for new users. We evaluated 49 published computational classification workflows for virus metagenomics in a literature review. To this end, we described the methods of existing workflows by breaking them up into five general steps and assessed their ease-of-use and validation experiments. Performance scores of previous benchmarks were summarized and correlations between methods and performance were investigated. We indicate the potential suitability of the different workflows for (1) time-constrained diagnostics, (2) surveillance and outbreak source tracing, (3) detection of remote homologies (discovery), and (4) biodiversity studies. We provide two decision trees for virologists to help select a workflow for medical or biodiversity studies, as well as directions for future developments in clinical viral metagenomics.

Keywords: decision tree; pipeline; software; standardization; use case; viral metagenomics.

PubMed Disclaimer

Figures

Figure 1
Figure 1
Generic pipeline scheme and breakdown of tools. (A) The process of classifying raw sequencing reads in 5 generic steps. (B) The steps that workflows use (in gray). UPfMCS: “Unknown Pathogens from Mixed Clinical Samples”; MEGAN CE: MEGAN Community Edition.
Figure 2
Figure 2
Different benchmark scores of virus classification workflows. Twenty-seven different workflows (Left) have been subjected to benchmarks, by the developers (Top) or by independent groups (Bottom), measuring sensitivity (Left column), specificity (Middle column) and precision (Right column) in different numbers of tests. Numbers between brackets (n = a, b, c) indicate number of sensitivity, specificity, and precision tests, respectively.
Figure 3
Figure 3
Correlations between performance scores and analysis steps. Sensitivity, specificity and precision scores (in columns) for workflows that incorporated different analysis steps (in rows). Numbers at the bottom indicate number of benchmarks performed.
Figure 4
Figure 4
Correlation between performance and search algorithm and runtime. Sensitivity, specificity and precision scores (in columns) for workflows that incorporated different search algorithms, using either nucleotide sequences, amino acid sequences or both, and workflows with different runtimes (rows). Numbers at the bottom indicate number of benchmarks performed.
Figure 5
Figure 5
Decision tree for selecting a virus metagenomics classification workflow for medical applications. Workflows are suitable for medical purposes when they can detect pathogenic viruses by classifying sequences to a genus level or further (e.g., species, genotype), or when they detect integration sites. Forty workflows matched these criteria. Workflows can be applied to surveillance or outbreak tracing studies when very specific classification are made, i.e., genotypes, strains or lineages. A 1-day analysis corresponds to being able to analyse a sample within 5 h. Detection of novel variants is made possible by sensitive search methods, amino acid alignment or composition search, and a broad reference database of potential hits. Numbers indicate the number of workflows available on the corresponding branch of the tree.
Figure 6
Figure 6
Decision tree for selecting a virus metagenomics classification workflow for biodiversity studies. Workflows for the characterisation of biodiversity of viruses have to classify a range of different viruses, i.e., have multiple reference taxa in the database. Forty-three workflows fitted this requirement. Novel variants can potentially be detected by using more sensitive search methods, amino acid alignment and composition search, and using diverse reference sequences. Finally, workflows are grouped by the taxonomic groups they can classify. Numbers indicate the number of workflows available on the corresponding branch of the tree.

References

    1. Altschul S. F., Gish W., Miller W., Myers E. W., Lipman D. J. (1990). Basic local alignment search tool. J. Mol. Biol. 215, 403–410. 10.1016/S0022-2836(05)80360-2 - DOI - PubMed
    1. Alves J. M., de Oliveira A. L., Sandberg T. O., Moreno-Gallego J. L., de Toledo M. A., de Moura E. M., et al. . (2016). GenSeed-HMM: a tool for progressive assembly using profile HMMs as seeds and its application in alpavirinae viral discovery from metagenomic data. Front. Microbiol. 7:269. 10.3389/fmicb.2016.00269 - DOI - PMC - PubMed
    1. Ames S. K., Hysom D. A., Gardner S. N., Lloyd G. S., Gokhale M. B., Allen J. E. (2013). Scalable metagenomic taxonomy classification using a reference genome database. Bioinformatics 29, 2253–2260. 10.1093/bioinformatics/btt389 - DOI - PMC - PubMed
    1. Bankevich A., Nurk S., Antipov D., Gurevich A. A., Dvorkin M., Kulikov A. S., et al. . (2012). SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. J. Comput. Biol. 19, 455–477. 10.1089/cmb.2012.0021 - DOI - PMC - PubMed
    1. Bazinet A. L., Cummings M. P. (2012). A comparative evaluation of sequence classification programs. BMC Bioinformatics 13:92. 10.1186/1471-2105-13-92 - DOI - PMC - PubMed

LinkOut - more resources