Bioinformatics. 2012 Apr 15;28(8):1174-5.
doi: 10.1093/bioinformatics/bts100. Epub 2012 Feb 28.

Rapid identification of non-human sequences in high-throughput sequencing datasets

Aparna Bhaduri et al. Bioinformatics. 2012.

Abstract

Rapid identification of non-human sequences (RINS) is an intersection-based pathogen detection workflow that utilizes a user-provided custom reference genome set for identification of non-human sequences in deep sequencing datasets. In <2 h, RINS correctly identified the known virus in the dataset SRR73726 and is compatible with any computer capable of running the prerequisite alignment and assembly programs. RINS accurately identifies sequencing reads from intact or mutated non-human genomes in a dataset and robustly generates contigs with these non-human sequences (Supplementary Material).
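The intersection idea can be illustrated with toy data. The following is a minimal sketch, not the RINS implementation: the function name, the per-read identity dictionaries, and the read IDs are all hypothetical, and the 80% and 97% cut-offs are the ones quoted in the Fig. 1 legend. In practice the identity values would come from an aligner run against the custom reference and the human genome.

```python
def select_candidates(ref_identity, human_identity, ref_min=0.80, human_max=0.97):
    """Keep reads that match the custom reference, then drop near-perfect human matches.

    ref_identity / human_identity map read IDs to a fractional identity (0.0-1.0).
    Both inputs are hypothetical stand-ins for real alignment results.
    """
    candidates = set()
    for read_id, identity in ref_identity.items():
        if identity <= ref_min:
            continue  # not similar enough to any genome in the custom reference
        if human_identity.get(read_id, 0.0) > human_max:
            continue  # essentially human, so removed from the read set
        candidates.add(read_id)
    return candidates


# Toy example: only reads that hit the custom reference are ever considered,
# so r4 (human only) never enters the candidate set.
ref_identity = {"r1": 0.96, "r2": 0.85, "r3": 0.40}
human_identity = {"r2": 0.99, "r3": 0.99, "r4": 1.00}
print(sorted(select_candidates(ref_identity, human_identity)))
```

Because selection starts from reads that hit the user-provided reference, the human alignment only has to be run on that smaller subset, not on the whole dataset.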

Availability: RINS is available for free download at http://khavarilab.stanford.edu/resources.html.

Figures

Fig. 1.
RINS uses intersection (marked by asterisks), not subtraction, to identify non-human reads. The workflow intersects the reads in the dataset with a reference of non-human genomes of interest, using Blat to align non-overlapping 25-mers from each read. Reads with >80% homology to the reference are then aligned to the human genome, and those with >97% homology to human are removed from the read set. The remaining reads are complexity filtered with an LZW compression ratio of 0.50, and mate pairs of sufficiently complex reads are identified. This read set is then assembled into pathogen sequence contigs.
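Two of the steps in this legend, splitting reads into non-overlapping 25-mers and LZW-based complexity filtering, can be sketched as follows. This is an illustrative sketch under stated assumptions, not the RINS code: the function names are hypothetical, trailing bases shorter than k are assumed to be discarded, and the compression ratio is assumed to mean the number of LZW output codes divided by read length. The legend does not define the ratio or say which side of 0.50 is kept, so no cut-off is hard-coded here.

```python
def nonoverlapping_kmers(read, k=25):
    """Split a read into consecutive, non-overlapping k-mers.

    Assumption: a trailing fragment shorter than k is dropped.
    """
    return [read[i:i + k] for i in range(0, len(read) - k + 1, k)]


def lzw_code_count(seq):
    """Count the codes a textbook LZW encoder would emit for seq."""
    # Seed the dictionary with every symbol that occurs in the sequence.
    dictionary = {ch: i for i, ch in enumerate(sorted(set(seq)))}
    codes = 0
    current = ""
    for ch in seq:
        candidate = current + ch
        if candidate in dictionary:
            current = candidate  # keep extending the current phrase
        else:
            codes += 1  # emit the code for the current phrase
            dictionary[candidate] = len(dictionary)
            current = ch
    if current:
        codes += 1  # emit the final phrase
    return codes


def lzw_ratio(seq):
    """Assumed definition: LZW output codes per input base (lower = more repetitive)."""
    return lzw_code_count(seq) / len(seq) if seq else 0.0


print(len(nonoverlapping_kmers("ACGT" * 15)))  # 60 bases -> 2 full 25-mers
print(lzw_ratio("A" * 10))                     # homopolymer compresses well: 0.4
print(lzw_ratio("ACGT"))                       # no repeats to exploit: 1.0
```

Under this definition, repetitive reads such as homopolymer runs score low and diverse reads score high. A real filter would compare the ratio against a threshold chosen to match the tool's own definition.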
