Efficient large-scale sequence comparison by locality-sensitive hashing

J Buhler¹

PMID: 11331236
DOI: 10.1093/bioinformatics/17.5.419

Comparative Study

J Buhler. Bioinformatics. 2001 May;17(5):419-28. doi: 10.1093/bioinformatics/17.5.419.

Affiliation

¹ Department of Computer Science and Engineering, University of Washington, Seattle, WA 98195-2350, USA. jbuhler@cs.washington.edu

Abstract

Motivation: Comparison of multimegabase genomic DNA sequences is a popular technique for finding and annotating conserved genome features. Performing such comparisons entails finding many short local alignments between sequences up to tens of megabases in length. To process such long sequences efficiently, existing algorithms find alignments by expanding around short runs of matching bases with no substitutions or other differences. Unfortunately, exact matches that are short enough to occur often in significant alignments also occur frequently by chance in the background sequence. Thus, these algorithms must trade off between efficiency and sensitivity to features without long exact matches.
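The efficiency/sensitivity tradeoff described above can be made concrete with two quantities. The sketch below assumes the simplest background model: two independent sequences with every base drawn uniformly from {A, C, G, T}. Real genomes violate this through repeats and skewed base composition. It also assumes that each aligned position of a real similarity matches independently with probability equal to the alignment's identity. Both function names are hypothetical. `expected_chance_seed_hits` gives the expected number of exact k-mer matches arising purely by chance, (n - k + 1)(m - k + 1)/4^k. `prob_contains_exact_run` gives the probability that an ungapped alignment contains at least one run of k consecutive matches, which an exact-seed method needs in order to find it at all.

```python
def expected_chance_seed_hits(n: int, m: int, k: int) -> float:
    """Expected number of exact k-mer matches between two independent
    uniform random DNA sequences of lengths n and m (each base 1/4).
    Assumes the uniform i.i.d. background model stated above."""
    if k > n or k > m:
        return 0.0
    # Each of the (n-k+1)(m-k+1) window pairs matches with probability 4^-k.
    return (n - k + 1) * (m - k + 1) * 0.25 ** k


def prob_contains_exact_run(length: int, identity: float, k: int) -> float:
    """Probability that an ungapped alignment of `length` positions, each
    matching independently with probability `identity`, contains at least
    one run of k >= 1 consecutive matches (i.e. an exact seed of length k)."""
    # state[j] = probability that no run of k has occurred yet and the
    # current trailing run of matches has length j (0 <= j < k).
    state = [0.0] * k
    state[0] = 1.0
    for _ in range(length):
        new_state = [0.0] * k
        for j, pr in enumerate(state):
            if pr == 0.0:
                continue
            # A mismatch resets the trailing run to length 0.
            new_state[0] += pr * (1.0 - identity)
            # A match extends the run; reaching length k means a seed exists,
            # so that probability mass leaves the "no seed yet" states.
            if j + 1 < k:
                new_state[j + 1] += pr * identity
        state = new_state
    return 1.0 - sum(state)


if __name__ == "__main__":
    n = m = 10_000_000  # two hypothetical 10 Mb sequences
    for k in (11, 16, 20):
        hits = expected_chance_seed_hits(n, m, k)
        sens = prob_contains_exact_run(100, 0.70, k)
        print(f"k={k}: ~{hits:.3g} chance hits, "
              f"P(seed in 100 bp at 70% identity)={sens:.3f}")
```

Each extra base of seed length cuts the chance-hit term by a factor of 4. For a low-identity alignment, however, it also sharply lowers the probability that any exact seed of that length is present. This is the tension the Motivation describes.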

Results: We introduce a new algorithm, LSH-ALL-PAIRS, to find ungapped local alignments in genomic sequence with up to a specified fraction of substitutions. The length and substitution rate of these alignments can be chosen so that they appear frequently in significant similarities yet still remain rare in the background sequence. The algorithm finds ungapped alignments efficiently using a randomized search technique, locality-sensitive hashing. We have found LSH-ALL-PAIRS to be both efficient and sensitive for finding local similarities with as little as 63% identity in mammalian genomic sequences up to tens of megabases in length.
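The abstract names locality-sensitive hashing but does not give its details. The block below is a minimal sketch of the standard random-position ("bit-sampling") LSH family for Hamming distance applied to fixed-length, substitution-only windows. It is an illustration under stated assumptions, not the paper's LSH-ALL-PAIRS implementation. The window length `d`, the number of sampled offsets `k`, `iterations`, `max_mismatches`, and all function names are hypothetical. In each round, the sketch samples k window offsets, buckets windows by their bases at those offsets, and then verifies colliding pairs exactly. `collision_probability` computes how likely a true pair is to be found. This depends on sampling k distinct offsets, so the per-round collision probability is C(d - mismatches, k) / C(d, k).

```python
import math
import random
from collections import defaultdict


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def lsh_similar_windows(seq1: str, seq2: str, d: int, k: int,
                        iterations: int, max_mismatches: int,
                        seed: int = 0) -> set:
    """Return pairs (i, j) such that seq1[i:i+d] and seq2[j:j+d] differ in
    at most max_mismatches positions, found by random-position LSH.

    Each round samples k distinct offsets out of the d window offsets.
    Windows that agree at every sampled offset share a bucket and become
    candidates; candidates are then checked exactly, so every returned pair
    is a true hit, but a true pair is missed if it never collides."""
    rng = random.Random(seed)
    found = set()
    for _ in range(iterations):
        offsets = sorted(rng.sample(range(d), k))
        # Bucket every window of seq1 by its bases at the sampled offsets.
        buckets = defaultdict(list)
        for i in range(len(seq1) - d + 1):
            buckets["".join(seq1[i + p] for p in offsets)].append(i)
        # Look up every window of seq2 and verify the colliding candidates.
        for j in range(len(seq2) - d + 1):
            key = "".join(seq2[j + p] for p in offsets)
            for i in buckets.get(key, ()):
                if (i, j) not in found and \
                        hamming(seq1[i:i + d], seq2[j:j + d]) <= max_mismatches:
                    found.add((i, j))
    return found


def collision_probability(d: int, mismatches: int, k: int,
                          iterations: int) -> float:
    """Probability that two length-d windows differing at `mismatches`
    positions collide in at least one of `iterations` rounds, when each
    round samples k distinct offsets uniformly at random."""
    per_round = math.comb(d - mismatches, k) / math.comb(d, k)
    return 1.0 - (1.0 - per_round) ** iterations


if __name__ == "__main__":
    gen = random.Random(1)

    def random_dna(length: int) -> str:
        return "".join(gen.choice("ACGT") for _ in range(length))

    s1 = random_dna(3000)
    # Copy s1[1000:1060] into s2 with 12 substitutions (80% identity).
    planted = list(s1[1000:1060])
    for p in gen.sample(range(60), 12):
        planted[p] = "ACGT"[("ACGT".index(planted[p]) + 1) % 4]
    s2 = random_dna(1000) + "".join(planted) + random_dna(1000)
    hits = lsh_similar_windows(s1, s2, d=60, k=8, iterations=20,
                               max_mismatches=15)
    print((1000, 1000) in hits, collision_probability(60, 12, 8, 20))
```

In this sketch, raising `k` makes chance collisions rarer: about 4^-k per window pair per round under a uniform background. It also lowers `collision_probability` for a true similarity, and adding rounds recovers that sensitivity at linear extra cost. Choosing `d` and `max_mismatches` sets the identity threshold. This is one way to realize the abstract's point that alignment length and substitution rate can be tuned to be common in real similarities but rare in background sequence.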
