Fast multiple alignment of ungapped DNA sequences using information theory and a relaxation method
- PMID: 19953199
- PMCID: PMC2785095
- DOI: 10.1016/S0166-218X(96)00068-6
Abstract
An information-theory-based multiple alignment ("Malign") method was used to align the DNA binding sequences of the OxyR and Fis proteins, whose sequence conservation is so spread out that it is difficult to identify the sites. In the algorithm described here, the information content of the sequences is used as a unique global criterion for the quality of the alignment. The algorithm uses look-up tables to avoid recalculating computationally expensive functions such as the logarithm. Because there are no arbitrary constants and because the results are reported in absolute units (bits), the best alignment can be chosen without ambiguity. Starting from randomly selected alignments, a hill-climbing algorithm can track through the immense space of $n^s$ combinations, where s is the number of sequences and n is the number of positions possible for each sequence. The algorithm is fast enough that, instead of producing a single alignment, one can afford to use many start points and to classify the solutions. Good convergence is indicated by the presence of a single well-populated solution class having higher information content than the other classes. The existence of several distinct classes for the Fis protein indicates that those binding sites have self-similar features.
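The abstract names the ingredients of the method without giving formulas. The Python sketch below is one plausible reading, not the published Malign program. It assumes a fixed site width and scores an alignment as the sum over columns of 2 - H(l) bits (the standard per-position information of a DNA site, here without any small-sample correction). It precomputes T[c] = c*log2(c) for the integer counts 0..s so the search loop never calls a logarithm, and it treats "relaxation" as moving one sequence at a time to its best offset until no move raises the total. All function names (`make_xlogx_table`, `alignment_information`, `relax`, `malign_sketch`, and so on) are hypothetical, and "solution classes" are simplified to identical final offset tuples.

```python
import math
import random
from collections import Counter

# Illustrative sketch only; names are hypothetical, not the Malign source.
INDEX = {b: i for i, b in enumerate("ACGT")}


def make_xlogx_table(s):
    """Look-up table T[c] = c*log2(c) for c = 0..s (T[0] = 0),
    so the search loop never calls log2."""
    return [0.0] + [c * math.log2(c) for c in range(1, s + 1)]


def alignment_information(seqs, offsets, width, table):
    """Sum over columns of 2 - H(l) bits, no small-sample correction.
    Since H(l) = log2(s) - (1/s) * sum_b c_b*log2(c_b),
    2 - H(l) = 2 + (sum_b T[c_b] - T[s]) / s."""
    s = len(seqs)
    total = 0.0
    for j in range(width):
        counts = [0, 0, 0, 0]
        for q, o in zip(seqs, offsets):
            counts[INDEX[q[o + j]]] += 1
        total += 2.0 + (sum(table[c] for c in counts) - table[s]) / s
    return total


def placement_gain(counts, q, o, width, table):
    """Change in sum_j sum_b T[c_b] if sequence q is added at offset o."""
    gain = 0.0
    for j in range(width):
        c = counts[j][INDEX[q[o + j]]]
        gain += table[c + 1] - table[c]
    return gain


def relax(seqs, width, rng, table):
    """Hill-climb from one random start: move each sequence to the offset
    that most increases information, until no move helps.
    Assumes every sequence is at least `width` long and uses only ACGT."""
    offsets = [rng.randrange(len(q) - width + 1) for q in seqs]
    counts = [[0] * 4 for _ in range(width)]
    for q, o in zip(seqs, offsets):
        for j in range(width):
            counts[j][INDEX[q[o + j]]] += 1
    improved = True
    while improved:
        improved = False
        for i, q in enumerate(seqs):
            for j in range(width):  # take sequence i out of the counts
                counts[j][INDEX[q[offsets[i] + j]]] -= 1
            best_o = offsets[i]
            best_gain = placement_gain(counts, q, best_o, width, table)
            for o in range(len(q) - width + 1):
                g = placement_gain(counts, q, o, width, table)
                if g > best_gain + 1e-9:  # strict gain, so the loop terminates
                    best_o, best_gain = o, g
            if best_o != offsets[i]:
                offsets[i] = best_o
                improved = True
            for j in range(width):  # put sequence i back at its offset
                counts[j][INDEX[q[offsets[i] + j]]] += 1
    return tuple(offsets), alignment_information(seqs, offsets, width, table)


def malign_sketch(seqs, width, starts=50, seed=0):
    """Run many random starts and group identical final alignments into
    classes: ((offsets, bits), count), sorted by bits, highest first."""
    table = make_xlogx_table(len(seqs))
    rng = random.Random(seed)
    classes = Counter(relax(seqs, width, rng, table) for _ in range(starts))
    return sorted(classes.items(), key=lambda kv: kv[0][1], reverse=True)


# Toy input: each sequence contains TTGACA somewhere (made-up data).
seqs = ["GCTTGACAGCTA", "TTGACAATCGGC", "ACGCATTGACAG", "CAGTCTTGACAT"]
for (offsets, bits), count in malign_sketch(seqs, width=6)[:3]:
    print(offsets, round(bits, 2), count)
```

With 4 sequences of 12 bases and width 6, each sequence has n = 7 possible positions, so this toy space holds 7^4 = 2401 alignments and could be enumerated directly; with 20 sequences of 100 positions each it would be 100^20 = 10^40, which is why many cheap hill-climbing runs are used instead. In the abstract's terms, a single well-populated top class is the sign of good convergence.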
Similar articles
- Accurate multiple alignment of distantly related genome sequences using filtered spaced word matches as anchor points. Bioinformatics. 2019 Jan 15;35(2):211-218. doi: 10.1093/bioinformatics/bty592. PMID: 29992260. Free PMC article.
- Using CLUSTAL for multiple sequence alignments. Methods Enzymol. 1996;266:383-402. doi: 10.1016/s0076-6879(96)66024-8. PMID: 8743695.
- SATe-II: very fast and accurate simultaneous estimation of multiple sequence alignments and phylogenetic trees. Syst Biol. 2012 Jan;61(1):90-106. doi: 10.1093/sysbio/syr095. Epub 2011 Dec 1. PMID: 22139466.
- From analysis of protein structural alignments toward a novel approach to align protein sequences. Proteins. 2004 Feb 15;54(3):569-82. doi: 10.1002/prot.10503. PMID: 14748004.
- Folic acid supplementation and malaria susceptibility and severity among people taking antifolate antimalarial drugs in endemic areas. Cochrane Database Syst Rev. 2022 Feb 1;2(2022):CD014217. doi: 10.1002/14651858.CD014217. PMID: 36321557. Free PMC article.
Cited by
- Information theory tests critical predictions of plant defense theory for specialized metabolism. Sci Adv. 2020 Jun 10;6(24):eaaz0381. doi: 10.1126/sciadv.aaz0381. eCollection 2020 Jun. PMID: 32577508. Free PMC article.
- Discovery of novel tumor suppressor p53 response elements using information theory. Nucleic Acids Res. 2008 Jun;36(11):3828-33. doi: 10.1093/nar/gkn189. Epub 2008 May 21. PMID: 18495754. Free PMC article.
- Molecular flip-flops formed by overlapping Fis sites. Nucleic Acids Res. 2003 Nov 15;31(22):6663-73. doi: 10.1093/nar/gkg877. PMID: 14602927. Free PMC article.
- Consensus sequence Zen. Appl Bioinformatics. 2002;1(3):111-9. PMID: 15130839. Free PMC article. Review.
- Bipartite pattern discovery by entropy minimization-based multiple local alignment. Nucleic Acids Res. 2004 Sep 23;32(17):4979-91. doi: 10.1093/nar/gkh825. Print 2004. PMID: 15388800. Free PMC article.