Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
Comparative Study
. 2009 Jan 30;10 Suppl 1(Suppl 1):S13.
doi: 10.1186/1471-2105-10-S1-S13.

DNA motif alignment by evolving a population of Markov chains

Affiliations
Comparative Study

DNA motif alignment by evolving a population of Markov chains

Chengpeng Bi. BMC Bioinformatics. .

Abstract

Background: Deciphering cis-regulatory elements or de novo motif-finding in genomes still remains elusive although much algorithmic effort has been expended. The Markov chain Monte Carlo (MCMC) method such as Gibbs motif samplers has been widely employed to solve the de novo motif-finding problem through sequence local alignment. Nonetheless, the MCMC-based motif samplers still suffer from local maxima like EM. Therefore, as a prerequisite for finding good local alignments, these motif algorithms are often independently run a multitude of times, but without information exchange between different chains. Hence it would be worth a new algorithm design enabling such information exchange.

Results: This paper presents a novel motif-finding algorithm by evolving a population of Markov chains with information exchange (PMC), each of which is initialized as a random alignment and run by the Metropolis-Hastings sampler (MHS). It is progressively updated through a series of local alignments stochastically sampled. Explicitly, the PMC motif algorithm performs stochastic sampling as specified by a population-based proposal distribution rather than individual ones, and adaptively evolves the population as a whole towards a global maximum. The alignment information exchange is accomplished by taking advantage of the pooled motif site distributions. A distinct method for running multiple independent Markov chains (IMC) without information exchange, or dubbed as the IMC motif algorithm, is also devised to compare with its PMC counterpart.

Conclusion: Experimental studies demonstrate that the performance could be improved if pooled information were used to run a population of motif samplers. The new PMC algorithm was able to improve the convergence and outperformed other popular algorithms tested using simulated and biological motif sequences.

PubMed Disclaimer

Figures

Figure 1
Figure 1
Trajectories of MC chains. The trajectories were plotted using the CRP binding data. Both IMC and PMC run the same MHS algorithm r times, but with different strategies as described in the text. (A) IMC with r = 5, (B) PMC, r = 5, (C) PMC, r = 10. Each arrow points to the alignment with the highest likelihood Hmax.
Figure 2
Figure 2
Performance comparison. The CRP binding motifs were planted in simulated background sequences with different lengths. Both IMC and PMC showed the same trend, that is, their performances go down as the sequence becomes longer.

Similar articles

References

    1. Birney E, Stamatoyannopoulos JA, Dutta A, Guigo R, Gingeras TR, Margulies EH, Weng Z, et al. Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Science. 2007;447:799–816. - PMC - PubMed
    1. MacIsaac KD, Fraenkel E. Practical strategies for discovering regulatory DNA sequence motifs. PLoS Computat Biol. 2006;2:e36. doi: 10.1371/journal.pcbi.0020026. - DOI - PMC - PubMed
    1. Ji H, Wong WW. Computational biology: Towards deciphering gene regulatory information in mammalian genomes. Biometrics. 2006;62:645–663. doi: 10.1111/j.1541-0420.2006.00625.x. - DOI - PubMed
    1. Wang L, Jiang T. On the complexity of multiple sequence alignment. J Comput Biol. 1994;1:337–348. - PubMed
    1. Lawrence CE, Reilly AA. An expectation maximization algorithm for the identification and characterization of common sites in unaligned biopolymer sequences. Proteins: Structure, Function and Genetics. 1990;7:41–51. doi: 10.1002/prot.340070105. - DOI - PubMed

Publication types

LinkOut - more resources