Bioinformatics. 2015 Jan 15;31(2):170-7.
doi: 10.1093/bioinformatics/btu641. Epub 2014 Sep 29.

Sigma: strain-level inference of genomes from metagenomic analysis for biosurveillance

Tae-Hyuk Ahn et al. Bioinformatics. 2015.

Abstract

Motivation: Metagenomic sequencing of clinical samples provides a promising technique for direct pathogen detection and characterization in biosurveillance. Taxonomic analysis at the strain level can be used to resolve serotypes of a pathogen in biosurveillance. Sigma was developed for strain-level identification and quantification of pathogens using their reference genomes based on metagenomic analysis.

Results: Sigma provides not only accurate strain-level inferences, but also three unique capabilities: (i) Sigma quantifies the statistical uncertainty of its inferences, which includes hypothesis testing of identified genomes and confidence interval estimation of their relative abundances; (ii) Sigma enables strain variant calling by assigning metagenomic reads to their most likely reference genomes; and (iii) Sigma supports parallel computing for fast analysis of large datasets. The algorithm performance was evaluated using simulated mock communities and fecal samples with spike-in pathogen strains.

Availability and implementation: Sigma was implemented in C++, with source code and binaries freely available at http://sigma.omicsbio.org.

Supplementary information: Supplementary data are available at Bioinformatics online.

Figures

Fig. 1.
Conceptual overview of the Sigma algorithm. The inputs are metagenomic reads and user-defined reference genomes (top panel). The alignment of reads to genomes is used to define a probabilistic model of metagenomic sequencing (middle panel). Genomes are detected with hypothesis testing, quantified with confidence interval estimation, and scanned for sequence variations (bottom panel).
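
The pipeline in this caption (read-to-genome alignments feeding a probabilistic model, followed by hypothesis testing) can be illustrated with a small sketch. This is not Sigma's implementation. It assumes a simple mixture model in which each read comes from one genome, a uniform per-base error rate that turns mismatch counts into read likelihoods, expectation-maximization (EM) for the relative abundances, and a likelihood ratio test whose p-value uses a chi-square distribution with one degree of freedom as a rough approximation. All function names, parameters, and the toy data are hypothetical.

```python
import math

def read_likelihood(mismatches, read_len=100, error_rate=0.01):
    """P(read | genome) under an assumed uniform per-base error rate."""
    return (error_rate ** mismatches) * ((1.0 - error_rate) ** (read_len - mismatches))

def em_abundances(q, excluded=None, n_iter=200):
    """Estimate mixture weights by EM; q[i][j] = P(read i | genome j).

    If `excluded` is given, that genome's weight is fixed at zero
    (the null model used by the likelihood ratio test below).
    """
    n_genomes = len(q[0])
    active = [j for j in range(n_genomes) if j != excluded]
    w = [0.0] * n_genomes
    for j in active:
        w[j] = 1.0 / len(active)
    for _ in range(n_iter):
        totals = [0.0] * n_genomes
        for row in q:
            # E-step: share each read among genomes in proportion to w_j * q_ij
            denom = sum(w[j] * row[j] for j in active)
            for j in active:
                totals[j] += w[j] * row[j] / denom
        # M-step: a genome's weight is its average share of the reads
        w = [t / len(q) for t in totals]
    return w

def log_likelihood(q, w):
    """Log-likelihood of all reads under mixture weights w."""
    return sum(math.log(sum(wj * qj for wj, qj in zip(w, row))) for row in q)

def lrt_p_value(q, genome):
    """Approximate p-value for 'this genome is present' via a likelihood ratio test."""
    full = log_likelihood(q, em_abundances(q))
    null = log_likelihood(q, em_abundances(q, excluded=genome))
    stat = max(0.0, 2.0 * (full - null))
    # Survival function of a chi-square variable with 1 degree of freedom
    return math.erfc(math.sqrt(stat / 2.0))

# Toy data (made up): mismatch counts of each read against two strains.
# 30 reads match strain 0 perfectly but differ from strain 1 at two bases;
# 10 reads come from a shared region and match both strains perfectly.
mismatch_table = [(0, 2)] * 30 + [(0, 0)] * 10
q = [[read_likelihood(m) for m in row] for row in mismatch_table]

weights = em_abundances(q)
for strain in range(2):
    print(f"strain {strain}: abundance={weights[strain]:.4f}, "
          f"p-value={lrt_p_value(q, strain):.3g}")
```

In this toy case the reads shared by both strains do not support the second strain on their own, so the model assigns nearly all abundance to the first strain and only that strain passes the test. Confidence intervals for the abundances are not covered by this sketch.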
Fig. 2.
Identification of a Salmonella enterica strain at a serial dilution of relative abundances in a human fecal microbiota background. (a) Likelihood ratios of all aligned Salmonella enterica strains. Only the correct strain (highlighted in red outline) is identified with statistical significance (p-value < 0.01) down to the 0.001% dataset. (b) Estimated and expected relative abundances (RA) of the spike-in Salmonella enterica strain. Point estimates (red dots) were bracketed by 95% confidence intervals (blue error bars) with small relative standard deviations (RSD) down to 0.001% (0.027X coverage depth).
