Genome Res. 2013 Oct;23(10):1721-9.
doi: 10.1101/gr.150151.112. Epub 2013 Jul 10.

Pathoscope: species identification and strain attribution with unassembled sequencing data

Owen E Francis et al. Genome Res. 2013 Oct.

Abstract

Emerging next-generation sequencing technologies have revolutionized the collection of genomic data for applications in bioforensics, biosurveillance, and clinical settings. However, making the most of these data requires new methodology that can handle large volumes of genetic data in a computationally efficient manner. We present a statistical framework to analyze raw next-generation sequence reads from purified or mixed environmental or targeted infected tissue samples for rapid species identification and strain attribution against a robust database of known biological agents. Our method, Pathoscope, capitalizes on a Bayesian statistical framework that accommodates information on sequence quality and mapping quality and provides posterior probabilities of matches to a known database of target genomes. Importantly, our approach also incorporates the possibility that multiple species can be present in the sample and considers cases when the sample species/strain is not in the reference database. Furthermore, our approach can accurately discriminate between very closely related strains of the same species with very little coverage of the genome and without the need for multiple alignment steps, extensive homology searches, or genome assembly, all of which are time-consuming and labor-intensive steps. We demonstrate the utility of our approach on genomic data from purified and in silico "environmental" samples from known bacterial agents impacting human health for accuracy assessment and comparison with other approaches.
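
To make the idea of posterior match probabilities concrete, the following is a minimal sketch of read reassignment with a plain mixture model fitted by expectation-maximization. It is not the Pathoscope implementation: the function name `reassign_reads`, the input layout, and the toy likelihood values are assumptions made for illustration, and the sketch ignores sequence quality, mapping quality, and genomes missing from the database, all of which the abstract says the actual method accounts for.

```python
from typing import Dict, List, Tuple


def reassign_reads(
    read_likelihoods: List[Dict[str, float]], n_iter: int = 50
) -> Tuple[Dict[str, float], List[Dict[str, float]]]:
    """Hypothetical helper: EM for a simple mixture over candidate genomes.

    read_likelihoods[i][g] is an assumed likelihood of read i given genome g
    (for example, derived from an alignment score). Genomes a read did not
    align to are simply absent from that read's dictionary.
    Returns the estimated genome proportions and per-read posteriors.
    """
    genomes = sorted({g for read in read_likelihoods for g in read})
    if not genomes:
        raise ValueError("no alignments supplied")

    # Start from uniform genome proportions.
    pi = {g: 1.0 / len(genomes) for g in genomes}
    posteriors: List[Dict[str, float]] = []

    for _ in range(n_iter):
        # E-step: posterior probability that each read came from each genome.
        posteriors = []
        for read in read_likelihoods:
            weights = {g: pi[g] * lik for g, lik in read.items()}
            total = sum(weights.values())
            if total <= 0.0:
                raise ValueError("each read needs a positive likelihood")
            posteriors.append({g: w / total for g, w in weights.items()})

        # M-step: proportions become the average posterior mass per genome.
        counts = {g: 0.0 for g in genomes}
        for post in posteriors:
            for g, p in post.items():
                counts[g] += p
        pi = {g: counts[g] / len(posteriors) for g in genomes}

    return pi, posteriors


# Toy data (invented): six reads align only to strain_A, four reads align
# equally well to strain_A and strain_B, and no read is unique to strain_B.
reads = [{"strain_A": 1.0}] * 6 + [{"strain_A": 1.0, "strain_B": 1.0}] * 4
proportions, read_posteriors = reassign_reads(reads)
print(proportions)
```

In this toy case the ambiguous reads are progressively reassigned to strain_A, because only strain_A is supported by uniquely aligned reads, so its estimated proportion approaches 1. Counting raw alignments instead would leave a spurious share of reads attributed to strain_B.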

Figures

Figure 1.
Impact of closely related strains on the read alignment proportions. The genomes in the database were aligned to each other using an all-against-all BLASTN approach (Agren et al. 2012), and strains of the same species that were >98% similar by this metric were considered “closely related” strains. As the number of closely related strains increased, the naïve algorithm could no longer definitively identify the species of origin. Pathoscope, however, performed consistently well regardless of the number of closely related strains.

References

    1. Agren J, Sundstrom A, Hafstrom T, Segerman B 2012. Gegenees: Fragmented alignment of multiple genomes for determining phylogenomic distances and genetic signatures unique for specified target groups. PLoS ONE 7: e39107.
    2. Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ 1997. Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Res 25: 3389–3402.
    3. Bhaduri A, Qu K, Lee CS, Ungewickell A, Khavari PA 2012. Rapid identification of non-human sequences in high-throughput sequencing datasets. Bioinformatics 28: 1174–1175.
    4. Brady A, Salzberg SL 2009. Phymm and PhymmBL: Metagenomic phylogenetic classification with interpolated Markov models. Nat Methods 6: 673–676.
    5. Clement NL, Snell Q, Clement MJ, Hollenhorst PC, Purwar J, Graves BJ, Cairns BR, Johnson WE 2010. The GNUMAP algorithm: Unbiased probabilistic mapping of oligonucleotides from next-generation sequencing. Bioinformatics 26: 38–45.
