BMC Bioinformatics. 2002 Jul 31;3:20.
doi: 10.1186/1471-2105-3-20.

Kangaroo--a pattern-matching program for biological sequences

Doron Betel et al. BMC Bioinformatics.

Abstract

Background: Biologists often want to run a simple database search to identify proteins or genes that contain a well-defined sequence pattern. Many databases do not provide straightforward, readily available query tools for simple searches such as identifying transcription factor binding sites, protein motifs, or repetitive DNA sequences. Yet in many cases a simple pattern-matching search can reveal a wealth of information. In this paper we present a regular expression pattern-matching tool that we used to identify short repetitive DNA sequences in human coding regions, in order to find potential mutation sites in mismatch repair-deficient cells.
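
As an illustration of the kind of repeat query described above, the sketch below uses Python's standard `re` module (it is not taken from Kangaroo's source) to find short tandem repeats with a backreference. The function name `find_tandem_repeats` and the thresholds (a unit of 1 to 3 bases, at least 4 consecutive copies) are assumptions chosen for illustration only.

```python
import re
from typing import List, Tuple


def find_tandem_repeats(
    seq: str, max_unit: int = 3, min_copies: int = 4
) -> List[Tuple[int, str, int]]:
    """Return (0-based start, repeat unit, copy number) for each short tandem repeat.

    A repeat is a unit of 1..max_unit bases occurring at least min_copies times in a row.
    """
    # Lazy {1,n}? tries the shortest unit first; \1{k,} demands k further copies of it.
    pattern = re.compile(r"([ACGT]{1,%d}?)\1{%d,}" % (max_unit, min_copies - 1))
    hits = []
    for m in pattern.finditer(seq.upper()):
        unit = m.group(1)
        # The match is made only of whole copies, so integer division is exact.
        hits.append((m.start(), unit, len(m.group(0)) // len(unit)))
    return hits


print(find_tandem_repeats("GGTCACACACACATTTTTTTGC"))
# [(3, 'CA', 5), (13, 'T', 7)]
```

Because the unit is matched lazily, a run such as `TTTTTTT` is reported as seven copies of `T` rather than as a dinucleotide repeat.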

Results: Kangaroo is a web-based regular expression pattern-matching program that can search for patterns in DNA, protein, or coding region sequences from ten different organisms. It supports a wide range of queries and places no restriction on the length or complexity of the query expression. The program is accessible on the web at http://bioinfo.mshri.on.ca/kangaroo/ and the source code is freely distributed at http://sourceforge.net/projects/slritools/.
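
The core of a search like this can be sketched in a few lines. This is a hypothetical illustration, not Kangaroo's implementation: it assumes the sequences are already loaded into a dict keyed by identifier, passes the user's expression straight to Python's `re` engine, and adds an optional (assumed) mode that wraps the pattern in a zero-width lookahead so overlapping hits are also reported.

```python
import re
from typing import Dict, Iterator, Tuple


def search_sequences(
    pattern: str, sequences: Dict[str, str], overlapping: bool = False
) -> Iterator[Tuple[str, int, str]]:
    """Yield (sequence id, 0-based start, matched text) for every hit of `pattern`.

    With overlapping=True the pattern is wrapped in a zero-width lookahead, so
    hits that share residues (e.g. "ATA" twice in "ATATA") are all reported.
    """
    if overlapping:
        regex = re.compile(f"(?=({pattern}))", re.IGNORECASE)
    else:
        regex = re.compile(pattern, re.IGNORECASE)
    for seq_id, seq in sequences.items():
        for m in regex.finditer(seq):
            # In overlapping mode the hit text is captured by group 1 inside the lookahead.
            text = m.group(1) if overlapping else m.group(0)
            yield seq_id, m.start(), text


# Hypothetical toy "database" of two DNA sequences.
db = {"seq1": "ATATAT", "seq2": "ggatacc"}
print(list(search_sequences("ATA", db)))
# [('seq1', 0, 'ATA'), ('seq2', 2, 'ata')]
print(list(search_sequences("ATA", db, overlapping=True)))
# [('seq1', 0, 'ATA'), ('seq1', 2, 'ATA'), ('seq2', 2, 'ata')]
```

Delegating to a general regex engine is one way to avoid restricting query length or complexity; the trade-off is that very complex expressions can become slow on large databases.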

Conclusion: A simple, low-level pattern-matching application can be a useful tool in many research settings. For example, Kangaroo was used to identify potential genetic targets in a human colorectal cancer variant characterized by a high frequency of mutations in coding regions that contain mononucleotide repeats.
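
A minimal sketch of the mononucleotide-repeat screen mentioned above, assuming a coding sequence (CDS) that starts at the first base of its start codon. The minimum run length of 8 and the name `coding_mononucleotide_runs` are illustrative assumptions, not parameters reported for Kangaroo; the codon number simply locates each run within the reading frame.

```python
import re
from typing import List, Tuple


def coding_mononucleotide_runs(cds: str, min_len: int = 8) -> List[Tuple[int, int, str, int]]:
    """Return (0-based start, run length, base, 1-based codon number) for each run
    of at least min_len identical bases in a coding sequence.

    Codon numbers assume the CDS begins at the first base of the start codon.
    """
    # ([ACGT]) captures one base; \1{k,} requires at least k more of the same base.
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_len - 1))
    runs = []
    for m in pattern.finditer(cds.upper()):
        runs.append((m.start(), len(m.group(0)), m.group(1), m.start() // 3 + 1))
    return runs


# Hypothetical CDS: ATG GCC, then a run of nine A's, then a TGA stop codon.
print(coding_mononucleotide_runs("ATGGCC" + "A" * 9 + "TGA"))
# [(6, 9, 'A', 3)]
```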

Figures

Figure 1
Kangaroo screenshot. The Kangaroo query web interface and results page for a search for potential PDZ-binding sites. The top panel shows the Kangaroo user interface, where the user can choose to match the expression against protein, DNA or coding region sequences from 10 different organisms. The bottom panel shows sample search results; each hit links to the full GenBank flatfile record. Note the use of regular expression rules and the wildcard character "X" to specify the wide range of potential PDZ-binding motifs.
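
The figure's use of "X" as a wildcard can be mimicked by translating it into a standard regex before searching. The helper `kangaroo_to_regex` below is hypothetical; Kangaroo's exact query syntax is not described here, so the sketch assumes "X" outside square brackets means "any residue" and everything else is ordinary regex syntax. The query uses the commonly cited class I PDZ-binding consensus, S/T-X-hydrophobic at the C terminus, with the hydrophobic position narrowed to V, I or L for illustration; the protein sequences are made up.

```python
import re
from typing import Dict, List, Tuple


def kangaroo_to_regex(query: str) -> str:
    """Translate a query that uses 'X' as an any-residue wildcard into a Python regex.

    Hypothetical helper: 'X' outside square brackets becomes '.', everything else
    (character classes, quantifiers, anchors) is passed through unchanged.
    Escaped brackets are not handled.
    """
    out = []
    in_class = False
    for ch in query:
        if ch == "[":
            in_class = True
        elif ch == "]":
            in_class = False
        out.append("." if ch == "X" and not in_class else ch)
    return "".join(out)


def find_pdz_ligands(proteins: Dict[str, str], query: str) -> List[Tuple[str, str]]:
    """Return (protein id, matched peptide) for each protein whose sequence matches."""
    regex = re.compile(kangaroo_to_regex(query))
    hits = []
    for pid, seq in proteins.items():
        m = regex.search(seq.upper())
        if m:
            hits.append((pid, m.group(0)))
    return hits


# Class I consensus S/T-X-hydrophobic anchored at the C terminus ($).
query = "[ST]X[VIL]$"
toy = {"protA": "MKLSTDSETRV", "protB": "MKLSTDSETRA"}  # made-up sequences
print(kangaroo_to_regex(query))      # [ST].[VIL]$
print(find_pdz_ligands(toy, query))  # [('protA', 'TRV')]
```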
