BMC Bioinformatics. 2015 Dec 29;16:416.
doi: 10.1186/s12859-015-0840-5.

Pathosphere.org: pathogen detection and characterization through a web-based, open source informatics platform

Andy Kilianski et al. BMC Bioinformatics.

Abstract

Background: The detection of pathogens in complex sample backgrounds has been revolutionized by wide access to next-generation sequencing (NGS) platforms. However, analytical methods to support NGS platforms are not as uniformly available. Pathosphere (found at Pathosphere.org) is a cloud-based, open-source community tool that allows for communication, collaboration and sharing of NGS analytical tools and data among scientists working in academia, industry and government. The architecture allows users to upload data and run available bioinformatics pipelines without the need for onsite processing hardware or technical support.

Results: The pathogen detection capabilities hosted on Pathosphere were tested by analyzing pathogen-containing samples sequenced by NGS, including spiked human samples as well as samples with human and zoonotic host backgrounds. Pathosphere analytical pipelines developed by Edgewood Chemical Biological Center (ECBC) identified spiked pathogens within a common sample analyzed by 454, Ion Torrent, and Illumina sequencing platforms. ECBC pipelines also correctly identified pathogens in human samples containing arenavirus in addition to animal samples containing flavivirus and coronavirus. These analytical methods were less effective at detecting sequences with limited homology to previous annotations within NCBI databases, such as parvovirus. Utilizing the pipeline-hosting adaptability of Pathosphere, the analytical suite was supplemented by analytical pipelines designed by the United States Army Medical Research Institute of Infectious Diseases and Walter Reed Army Institute of Research (USAMRIID-WRAIR). These pipelines were implemented and detected parvovirus sequence in the sample that the ECBC iterative analysis previously failed to identify.

Conclusions: By accurately detecting pathogens in a variety of samples, this work demonstrates the utility of Pathosphere and provides a platform for utilizing, modifying and creating pipelines for a variety of NGS technologies developed to detect pathogens in complex sample backgrounds. These results serve as an exhibition for the existing pipelines and web-based interface of Pathosphere as well as the plug-in adaptability that allows for integration of newer NGS analytical software as it becomes available.

Figures

Fig. 1
Pathosphere user interface. The web-based portion of Pathosphere contains message boards, forums, user communities to share data and results, a live-chat messenger, user and developer guides and FAQs, as well as custom interfaces for the pathogen detection pipelines utilized by the current Pathosphere users. This screenshot displays the user-defined parameters that are customizable for each pathogen detection run.
Fig. 2
Summary of the analytical capability of the bioinformatics pipeline. Data can currently be preprocessed by two tools, Columbia University’s Preprocessing Procedure (CUPP) or a taxonomy analysis based on NCBI taxonomy results. Reads retained after preprocessing are then assembled de novo. Nearest-neighbor identification and SNP profiling then occur by comparing the identified contigs to NCBI databases. A reference map is created, and the SNP profile from those mapping results provides a comprehensive comparison of the taxonomic near neighbors. Finally, all the unmapped reads are extracted and used as input to the next iteration.
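
The iterative structure described in the Fig. 2 caption (find a near neighbor, map reads to it, then pass the unmapped reads to the next round) can be illustrated with a toy sketch. This is not the ECBC pipeline or any Pathosphere code: the function names, the in-memory reference dictionary, and the use of exact substring matching as a stand-in for read mapping are all assumptions made for illustration, and the preprocessing, de novo assembly, and SNP profiling steps are omitted entirely.

```python
from collections import Counter


def map_reads(reads, reference):
    """Toy 'mapping' (assumption): a read maps if it is an exact substring."""
    mapped = [r for r in reads if r in reference]
    unmapped = [r for r in reads if r not in reference]
    return mapped, unmapped


def nearest_neighbor(reads, references):
    """Return the name of the reference most reads map to, or None."""
    hits = Counter()
    for name, seq in references.items():
        mapped, _ = map_reads(reads, seq)
        if mapped:
            hits[name] = len(mapped)
    if not hits:
        return None
    return hits.most_common(1)[0][0]


def iterative_detection(reads, references, max_iterations=5):
    """Repeat: pick a near neighbor, map reads, carry unmapped reads forward."""
    findings = []
    remaining = list(reads)
    for _ in range(max_iterations):
        if not remaining:
            break
        best = nearest_neighbor(remaining, references)
        if best is None:
            break  # nothing left resembles any known reference
        mapped, remaining = map_reads(remaining, references[best])
        findings.append((best, len(mapped)))
    return findings, remaining


# Hypothetical miniature data set, invented for this example.
references = {"virus_A": "ACGTACGTTTGA", "virus_B": "GGCCTTAAGGCC"}
reads = ["ACGTAC", "CGTTTG", "TTAAGG", "NNNNNN"]

findings, leftover = iterative_detection(reads, references)
print(findings)   # [('virus_A', 2), ('virus_B', 1)]
print(leftover)   # ['NNNNNN']
```

In this toy run the second organism is only reported once the reads belonging to the first have been removed. The final leftover read matches no reference at all, which loosely parallels the abstract's point that sequences with limited homology to database entries can go undetected.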
