Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 1989;51(1):5-37.
doi: 10.1007/BF02458834.

Approximate matching of regular expressions

Approximate matching of regular expressions

E W Myers et al. Bull Math Biol. 1989.

Abstract

Given a sequence A and regular expression R, the approximate regular expression matching problem is to find a sequence matching R whose optimal alignment with A is the highest scoring of all such sequences. This paper develops an algorithm to solve the problem in time O(MN), where M and N are the lengths of A and R. Thus, the time requirement is asymptotically no worse than for the simpler problem of aligning two fixed sequences. Our method is superior to an earlier algorithm by Wagner and Seiferas in several ways. First, it treats real-valued costs, in addition to integer costs, with no loss of asymptotic efficiency. Second, it requires only O(N) space to deliver just the score of the best alignment. Finally, its structure permits implementation techniques that make it extremely fast in practice. We extend the method to accommodate gap penalties, as required for typical applications in molecular biology, and further refine it to search for sub-strings of A that strongly align with a sequence in R, as required for typical data base searches. We also show how to deliver an optimal alignment between A and R in only O(N + log M) space using O(MN log M) time. Finally, an O(MN(M + N) + N2log N) time algorithm is presented for alignment scoring schemes where the cost of a gap is an arbitrary increasing function of its length.

PubMed Disclaimer

References

    1. Nucleic Acids Res. 1984 Jan 11;12(1 Pt 1):263-80 - PubMed
    1. J Mol Biol. 1982 Dec 15;162(3):705-8 - PubMed
    1. Biochemistry. 1986 Jan 14;25(1):266-75 - PubMed
    1. Comput Appl Biosci. 1988 Mar;4(1):11-7 - PubMed
    1. Proc Natl Acad Sci U S A. 1983 Mar;80(5):1382-6 - PubMed

Publication types

LinkOut - more resources