Rapid identification of non-human sequences in high-throughput sequencing datasets

Aparna Bhaduri¹, Kun Qu, Carolyn S Lee, Alexander Ungewickell, Paul A Khavari

PMID: 22377895
PMCID: PMC3324519
DOI: 10.1093/bioinformatics/bts100

Aparna Bhaduri et al. Bioinformatics. 2012 Apr 15;28(8):1174-5. doi: 10.1093/bioinformatics/bts100. Epub 2012 Feb 28.

Affiliation

¹ Veterans Affairs Palo Alto Healthcare System, Palo Alto, CA 94304, USA. abhaduri@stanford.edu

Abstract

Rapid identification of non-human sequences (RINS) is an intersection-based pathogen detection workflow that uses a user-provided custom reference genome set to identify non-human sequences in deep sequencing datasets. RINS correctly identified the known virus in dataset SRR73726 in under 2 h, and it runs on any computer capable of running the prerequisite alignment and assembly programs. RINS accurately identifies sequencing reads from intact or mutated non-human genomes in a dataset and robustly generates contigs containing these non-human sequences (Supplementary Material).

Availability: RINS is available for free download at http://khavarilab.stanford.edu/resources.html.
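The intersection-based selection described in the abstract, with the thresholds from the Fig. 1 caption, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the RINS code: the function names are hypothetical, per-read percent identities are taken as given inputs (how RINS derives them from its 25-mer BLAT hits is not specified here), and dropping a trailing fragment shorter than 25 bases is an assumption.

```python
from typing import Dict, List, Set

# Thresholds taken from the Fig. 1 caption.
KMER_LEN = 25                 # non-overlapping 25-mers per read
NONHUMAN_MIN_IDENTITY = 80.0  # keep reads with >80% match to the non-human set
HUMAN_MAX_IDENTITY = 97.0     # drop reads with >97% match to human


def nonoverlapping_kmers(seq: str, k: int = KMER_LEN) -> List[str]:
    """Split a read into consecutive, non-overlapping k-mers.

    A trailing fragment shorter than k is dropped; this is an assumption,
    not something stated in the paper.
    """
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, k)]


def select_nonhuman_reads(nonhuman_identity: Dict[str, float],
                          human_identity: Dict[str, float]) -> Set[str]:
    """Intersect first, then remove human-like reads (hypothetical helper).

    Both arguments map a read ID to its best percent identity against the
    corresponding reference (hypothetical inputs, e.g. parsed from alignment
    output). Reads absent from human_identity are treated as having no
    human hit.
    """
    # Step 1, intersection: only reads that match the custom non-human
    # reference survive; everything else is ignored from here on.
    candidates = {rid for rid, ident in nonhuman_identity.items()
                  if ident > NONHUMAN_MIN_IDENTITY}
    # Step 2: only this candidate set is compared against human, and
    # near-identical human matches are discarded.
    return {rid for rid in candidates
            if human_identity.get(rid, 0.0) <= HUMAN_MAX_IDENTITY}


if __name__ == "__main__":
    nonhuman = {"r1": 95.0, "r2": 70.0, "r3": 99.0}
    human = {"r1": 50.0, "r3": 99.5}
    print(sorted(select_nonhuman_reads(nonhuman, human)))  # ['r1']
    print(len(nonoverlapping_kmers("A" * 60)))              # 2
```

The ordering reflects the intersection design: only reads that already match the non-human reference are checked against human, rather than subtracting human reads from the entire dataset first.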


Figures

**Fig. 1.**
RINS uses intersection (marked by asterisks), not subtraction, to identify non-human reads. The workflow intersects the reads in the dataset with a reference set of non-human genomes of interest, using BLAT to align non-overlapping 25-mers from each read. Reads with >80% homology to this reference are then aligned to the human genome, and those with >97% homology to human are removed from the read set. The remaining reads are complexity filtered at an LZW compression ratio of 0.50, and the mate pairs of sufficiently complex reads are identified. This read set is then assembled into pathogen sequence contigs.
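The caption gives the LZW threshold but not how the ratio is computed. The sketch below assumes the ratio is compressed size divided by original size, with one byte per base and every LZW code stored as a fixed 16-bit value, and treats reads below 0.50 as low complexity; both the size model and the direction of the cutoff are assumptions. Mate recovery uses a hypothetical "/1" and "/2" read-naming convention, and none of the function names come from RINS.

```python
from typing import Dict, Set

MIN_LZW_RATIO = 0.50   # threshold from the Fig. 1 caption
BYTES_PER_CODE = 2     # assumption: each LZW code stored as a 16-bit value


def lzw_code_count(seq: str) -> int:
    """Count the codes emitted by textbook LZW compression of seq.

    Only the count matters here, so known phrases are kept in a set instead
    of being mapped to integer codes.
    """
    if not seq:
        return 0
    phrases: Set[str] = set(seq)  # seed with every single symbol in the read
    w = ""
    codes = 0
    for c in seq:
        wc = w + c
        if wc in phrases:
            w = wc                # keep extending the current phrase
        else:
            codes += 1            # emit the code for w
            phrases.add(wc)       # learn the new phrase
            w = c
    return codes + 1              # flush the final phrase


def lzw_ratio(seq: str) -> float:
    """Compressed size over original size, assuming one byte per base."""
    return lzw_code_count(seq) * BYTES_PER_CODE / len(seq)


def is_complex(seq: str, threshold: float = MIN_LZW_RATIO) -> bool:
    """Assumed direction: repetitive reads compress well and fall below."""
    return len(seq) > 0 and lzw_ratio(seq) >= threshold


def mate_id(read_id: str) -> str:
    """Hypothetical '/1' <-> '/2' suffix convention for paired reads."""
    if read_id.endswith("/1"):
        return read_id[:-2] + "/2"
    if read_id.endswith("/2"):
        return read_id[:-2] + "/1"
    return read_id


def complex_reads_with_mates(candidates: Dict[str, str],
                             all_reads: Dict[str, str]) -> Dict[str, str]:
    """Keep complex candidate reads, then pull their mates from all_reads."""
    kept = {rid: seq for rid, seq in candidates.items() if is_complex(seq)}
    for rid in list(kept):
        mate = mate_id(rid)
        if mate != rid and mate in all_reads:
            kept[mate] = all_reads[mate]
    return kept


if __name__ == "__main__":
    print(lzw_code_count("A" * 100), lzw_ratio("A" * 100))  # 14 0.28
    print(is_complex("A" * 100))                            # False
```

Under these assumptions a 100-base homopolymer compresses to 14 codes (ratio 0.28) and is dropped, while a read with few repeated substrings tends to stay well above 0.50. Mates are taken from the full read set because a read's partner need not have matched the non-human reference itself.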


References

1. Benson D.A., et al. GenBank. Nucl. Acids Res. 2008;36:D25–D30.
2. Feng H., et al. Clonal integration of a polyomavirus in human Merkel cell carcinoma. Science. 2008;319:1096–1100.
3. Grabherr M.G., et al. Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat. Biotechnol. 2011;29:644–652.
4. Kent W.J. BLAT–the BLAST-like alignment tool. Genome Res. 2002;12:656–664.
5. Kostic A.D., et al. PathSeq: software to identify or discover microbes by deep sequencing of human tissue. Nat. Biotechnol. 2011;29:393–396.
