Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2002 Oct 21:3:28.
doi: 10.1186/1471-2105-3-28.

Efficient Boolean implementation of universal sequence maps (bUSM)

Affiliations

Efficient Boolean implementation of universal sequence maps (bUSM)

John Schwacke et al. BMC Bioinformatics. .

Abstract

Background: Recently, Almeida and Vinga offered a new approach for the representation of arbitrary discrete sequences, referred to as Universal Sequence Maps (USM), and discussed its applicability to genomic sequence analysis. Their work generalizes and extends Chaos Game Representation (CGR) of DNA for arbitrary discrete sequences.

Results: We have considered issues associated with the practical implementation of USMs and offer a variation on the algorithm that: 1) eliminates the overestimation of similar segment lengths, 2) permits the identification of arbitrarily long similar segments in the context of finite word length coordinate representations, 3) uses more computationally efficient operations, and 4) provides a simple conversion for recovering the USM coordinates. Computational performance comparisons and examples are provided.

Conclusions: We have shown that the desirable properties of the USM encoding of nucleotide sequences can be retained in a practical implementation of the algorithm. In addition, the proposed implementation enables determination of local sequence identity at increased speed.

PubMed Disclaimer

Figures

Figure 1
Figure 1
Comparison of encodings for the original sequence, an embedded representation and USM coordinates. Sample encodings for a nucleotide sequence illustrating the equivalence between an embedded encoding and a finite word length, block floating point representation of a standard USM encoding. The values indicated as ii,j represent the initial values of the USM coordinates and the subscripted A, G, T, and C indicate the 0th and 1st bits of the 2-bit USM representation of the associated coordinates.
Figure 2
Figure 2
Comparison of standard and Boolean USM similar segment length measurements. Pixel images of bi-directional distance determination for standard (A) and Boolean (B) USM implementations. Brighter pixels indicate longer similar segments.
Figure 3
Figure 3
Comparison of standard and Boolean USM length measurements for sample nucleotide sequences. Pixel images of bi-directional distance determination for standard (A) and Boolean (B) USM implementations. The sequences are 100 nucleotide segments from the human insulin receptor (INSR) and a chicken tyrosine kinase (CTK-1). Brighter pixels correspond to longer similar segments. The dominant segment is an exact match that is 17 nucleotides long.

References

    1. Jeffrey HJ. Chaos game representation of gene structure. Nucleic Acids Res. 1990;18:2163–2170. - PMC - PubMed
    1. Almeida JS, Carrico JA, Maretzek A, Noble PA, Fletcher M. Analysis of genomic sequences by Chaos Game Representation. Bioinformatics. 2001;17:429–437. doi: 10.1093/bioinformatics/17.5.429. - DOI - PubMed
    1. Tino P. Spatial representation of symbolic sequences through iterative function systems. IEEE Transactions on Systems, Man, and Cybernetics – Part A: Systems and Humans. 1999;29:386–393. doi: 10.1109/3468.769757. - DOI
    1. Almeida JS, Vinga S. Universal sequence map (USM) of arbitrary discrete sequences. BMC Bioinformatics. 2002;3:6. doi: 10.1186/1471-2105-3-6. - DOI - PMC - PubMed
    1. IEEE. 754–1985 IEEE Standard for Binary Floating-Point Arithmetic 1985. Book 754–1985 IEEE Standard for Binary Floating-Point Arithmetic 1985 (Editor ed.^eds.). City. 1985.

Publication types

MeSH terms

LinkOut - more resources