Review
Front Microbiol. 2020 Aug 18;11:1925.
doi: 10.3389/fmicb.2020.01925. eCollection 2020.

Computational Methods for Strain-Level Microbial Detection in Colony and Metagenome Sequencing Data


Christine Anyansi et al. Front Microbiol.

Abstract

Metagenomic sequencing is a powerful tool for examining the diversity and complexity of microbial communities. Most widely used tools for taxonomic profiling of metagenomic sequence data allow for a species-level overview of the composition. However, individual strains within a species can differ greatly in key genotypic and phenotypic characteristics, such as drug resistance, virulence and growth rate. Therefore, the ability to resolve microbial communities down to the level of individual strains within a species is critical to interpreting metagenomic data for clinical and environmental applications, where identifying a particular strain, or tracking a particular strain across a set of samples, can aid in clinical diagnosis and treatment, or in characterizing as yet unstudied strains across novel environmental locations. Recently published approaches have begun to tackle the problem of resolving strains within a particular species in metagenomic samples. In this review, we present an overview of these new algorithms and their uses, including methods based on assembly reconstruction and methods operating with or without a reference database. While existing metagenomic analysis methods show reasonable performance at the species and higher taxonomic levels, identifying closely related strains within a species presents a bigger challenge, due to the diversity of databases, genetic relatedness, and goals when conducting these analyses. Selection of which metagenomic tool to employ for a specific application should be performed on a case-by-case basis, as these tools have strengths and weaknesses that affect their performance on specific tasks. A comprehensive benchmark across different use case scenarios is vital to validate the performance of these tools on microbial samples. Because strain-level metagenomic analysis is still in its infancy, development of more fine-grained, high-resolution algorithms will continue to be in demand for the future.

Keywords: bioinformatics; metagenomics; methods review; microbial detection; strain-level classification; whole genome sequencing.


Figures

FIGURE 1
Assembly of multiple distinct strains from a read set. The blue areas in the sample reads represent regions where the strains have identical sequence. Variant locations in the reads are denoted as red or dark gray stripes. Red variants originate from one haplotype, whereas dark gray variants originate from the other. The goal of an assembly-based method is to resolve distinct strains based on the coverage and distribution of the read data, drawing on methods previously developed for resolving haplotypes.
FIGURE 2
Alignment-based approaches. Reads of a sequencing dataset (where different colors denote genetically distinct strains) are aligned to a reference database of full genomes or taxonomic markers (in this case genes). Strain abundances can be estimated by the relative number of reads aligning to each reference genome.
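The abundance estimate described in the Figure 2 caption can be illustrated with a minimal Python sketch. It is not taken from any of the reviewed tools: the function name `relative_abundance`, the strain labels, and the input format (one reference name per uniquely aligned read) are assumptions made for illustration. Real tools also have to deal with reads that align to several references and with differences in genome length, which this sketch ignores.

```python
from collections import Counter


def relative_abundance(assignments):
    """Return the fraction of aligned reads assigned to each reference.

    `assignments` is a hypothetical input: an iterable holding one
    reference name per uniquely aligned read.
    """
    counts = Counter(assignments)
    total = sum(counts.values())
    if total == 0:
        # No aligned reads, so no abundances can be estimated.
        return {}
    return {ref: n / total for ref, n in counts.items()}


# Toy example: four reads, three aligned to strain_A and one to strain_B.
reads_to_refs = ["strain_A", "strain_A", "strain_B", "strain_A"]
abundances = relative_abundance(reads_to_refs)
```

Dividing by the total number of aligned reads gives proportions that sum to 1 across the references observed in the sample.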
FIGURE 3
Tree-based method overview. (A) Example database of genomes with SNPs present as markers. (B) Representation of the genome database, where 1 denotes a SNP and 0 the absence of a SNP. (C) SNP tree constructed based on SNPs from the database. (D) SNPs present in new reads can be matched against the tree to infer the likely reference genome of origin by identifying sequences of successfully matching nodes (a path).
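The idea in panels (B) to (D) can be sketched in Python as follows. This is a simplified illustration and not the algorithm of any specific tool: the tree here is a plain prefix tree over the 0/1 SNP profiles, and the names `build_snp_tree`, `match_profile`, and the toy genomes are hypothetical. SNP sites not covered by any read are given as `None`, in which case both branches are followed.

```python
def build_snp_tree(database):
    """Build a prefix tree from {genome_name: tuple of 0/1 per SNP site}."""
    root = {}
    for genome, profile in database.items():
        node = root
        for allele in profile:
            # Integer keys 0/1 are branches; the string key "genomes" marks a leaf.
            node = node.setdefault(allele, {})
        node.setdefault("genomes", []).append(genome)
    return root


def match_profile(node, observed, depth=0):
    """Return the genomes whose path in the tree is consistent with `observed`.

    `observed` holds 1 (SNP seen), 0 (SNP absent), or None (site not covered).
    """
    if depth == len(observed):
        return list(node.get("genomes", []))
    allele = observed[depth]
    branches = [allele] if allele is not None else [0, 1]
    hits = []
    for branch in branches:
        if branch in node:
            hits.extend(match_profile(node[branch], observed, depth + 1))
    return hits


# Toy database with four SNP sites (assumed values, for illustration only).
database = {
    "genome_1": (1, 0, 1, 0),
    "genome_2": (1, 1, 0, 0),
    "genome_3": (0, 0, 1, 1),
}
tree = build_snp_tree(database)
unique_hit = match_profile(tree, (1, None, 1, 0))
ambiguous_hits = match_profile(tree, (1, None, None, 0))
```

With the second site uncovered, the first query still resolves to a single genome, while leaving two sites uncovered keeps two candidate genomes. This shows why the number of informative SNPs covered by reads limits strain resolution.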
FIGURE 4
Flow chart of tool selection depending on scenario. Guide chart showing which tools can be used in which use case. Presence of a tool under one use case doesn’t necessarily exclude it from being applicable to another use case.

