Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
Review
. 2014 May;15(3):369-75.
doi: 10.1093/bib/bbt072. Epub 2013 Oct 25.

Sequence analysis by iterated maps, a review

Affiliations
Review

Sequence analysis by iterated maps, a review

Jonas S Almeida. Brief Bioinform. 2014 May.

Abstract

Among alignment-free methods, Iterated Maps (IMs) are on a particular extreme: they are also scale free (order free). The use of IMs for sequence analysis is also distinct from other alignment-free methodologies in being rooted in statistical mechanics instead of computational linguistics. Both of these roots go back over two decades to the use of fractal geometry in the characterization of phase-space representations. The time series analysis origin of the field is betrayed by the title of the manuscript that started this alignment-free subdomain in 1990, 'Chaos Game Representation'. The clash between the analysis of sequences as continuous series and the better established use of Markovian approaches to discrete series was almost immediate, with a defining critique published in same journal 2 years later. The rest of that decade would go by before the scale-free nature of the IM space was uncovered. The ensuing decade saw this scalability generalized for non-genomic alphabets as well as an interest in its use for graphic representation of biological sequences. Finally, in the past couple of years, in step with the emergence of BigData and MapReduce as a new computational paradigm, there is a surprising third act in the IM story. Multiple reports have described gains in computational efficiency of multiple orders of magnitude over more conventional sequence analysis methodologies. The stage appears to be now set for a recasting of IMs with a central role in processing nextgen sequencing results.

Keywords: alignment-free; big data; chaos game; iterated maps; mapreduce; sequence analysis.

PubMed Disclaimer

Figures

Figure 1:
Figure 1:
Sierpinski’s triangle (left) and CGR square (right), both generated by running the IM described in Equation 1 with a set of 20 000 random sets of, respectively, three symbols and four symbols - the edges (circles) of the corresponding figures. The square layout (on the right) is the one proposed in [1] to process genomic sequences, with four possible nucleotides, ACGT, one per edge of the CGR square. Any deviation from the uniform random distribution (any structure in the sequence) will produce a structure in the IM projection that is amenable to alignment-free and scale-free analysis. See Figure 1 of [7] for a graphic illustration of the process of generating CGR coordinates, applied to a gene sequence. The code used to generate this figure is available at http://bit.ly/imfig1.

Similar articles

Cited by

References

    1. Jeffrey HJ. Chaos game representation of gene structure. Nucleic Acids Res. 1990;18:2163–70. - PMC - PubMed
    1. Mandelbrot BB. The Fractal Geometry of Nature. New York: Macmillan; 1983.
    1. Mandelbrot B. How long is the coast of britain? Statistical self-similarity and fractional dimension. Science. 1967;156:636–38. - PubMed
    1. Bak P, Tang C, Wiesenfeld K. Self-organized criticality. Phys Rev A. 1988;38:364–74. - PubMed
    1. Kauffman S. The Origins of Order: Self Organization and Selection in Evolution. New York: Oxford University Press; 1993.

Publication types