Over- and underrepresentation of short DNA words in herpesvirus genomes
- PMID: 8891954
- PMCID: PMC4076300
- DOI: 10.1089/cmb.1996.3.345
Abstract
Previous biological studies have recognized that the relative abundance and rarity of DNA words have implications for the regulation, repair, and evolutionary mechanisms of a genome. In this paper, we review several measures of the abundance and rarity of DNA words that have appeared in the recent literature, including z-scores, representation ratios, and cross-ratios, and examine the concordance among them using the human cytomegalovirus genome sequence. We then rank all words of length k = 2, ..., 5 in seven herpesvirus genomes according to their abundance, as measured by one of the z-scores based on a stationary Markov model of order k − 2. Using a simple metric on the ranks of the 2-words of the seven herpesvirus sequences, we construct an evolutionary tree. Several 3-words are consistently over- or underrepresented in all seven herpesviruses. Furthermore, clusters of some of the most over- and underrepresented 4- and 5-words in the genomes are identified with functional sites such as the origins of replication and regulatory signals of individual viruses.
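The ranking step above scores each k-word against a stationary Markov model of order k − 2. The Python sketch below illustrates that idea under stated assumptions. It uses the common maximum-likelihood expected count E(w) = N(prefix) N(suffix) / N(middle), where prefix and suffix are w with its last or first letter removed and middle is w with both removed. For the variance it assumes an adjusted-residual form from contingency-table analysis. The paper reviews several z-scores, so this is not claimed to be the exact statistic it uses, and the function names `markov_z_scores` and `rank_words` are hypothetical.

```python
from collections import Counter
from itertools import product
from math import sqrt

ALPHABET = "ACGT"


def count_words(seq: str, k: int) -> Counter:
    """Count overlapping k-words in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def markov_z_scores(seq: str, k: int) -> dict:
    """Return {word: (observed, expected, z)} for every k-word, k >= 2.

    Expected counts come from a Markov model of order k - 2 fitted to seq:
        E(w) = N(w[:-1]) * N(w[1:]) / N(w[1:-1]).
    The z-score uses an adjusted-residual variance (an assumption, not
    necessarily the statistic used in the paper):
        Var = E * (1 - N(w[:-1]) / N(mid)) * (1 - N(w[1:]) / N(mid)).
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    n_k = count_words(seq, k)
    n_k1 = count_words(seq, k - 1)
    # For k = 2 the middle word is empty; use the sequence length as its count.
    n_mid = count_words(seq, k - 2) if k > 2 else Counter({"": len(seq)})
    result = {}
    for letters in product(ALPHABET, repeat=k):
        w = "".join(letters)
        left, right, mid = n_k1[w[:-1]], n_k1[w[1:]], n_mid[w[1:-1]]
        obs = n_k[w]
        if mid == 0:
            # Middle word never occurs: no information, score as neutral.
            result[w] = (obs, 0.0, 0.0)
            continue
        exp = left * right / mid
        var = exp * (1 - left / mid) * (1 - right / mid)
        z = (obs - exp) / sqrt(var) if var > 0 else 0.0
        result[w] = (obs, exp, z)
    return result


def rank_words(scores: dict) -> list:
    """Words sorted from most over- to most underrepresented by z-score."""
    return sorted(scores, key=lambda w: scores[w][2], reverse=True)
```

For example, in the period-4 sequence `"ACGT" * 50` the 2-word AC is observed 50 times against an expected 12.5, so it ranks near the top, while AA, which never occurs, receives a negative z-score.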
Comment in
- An efficient statistic to detect over- and under-represented words in DNA sequences. J Comput Biol. 1997 Summer;4(2):189-92. doi: 10.1089/cmb.1997.4.189. PMID: 9228617.
Similar articles
- Nonrandom clusters of palindromes in herpesvirus genomes. J Comput Biol. 2005 Apr;12(3):331-54. doi: 10.1089/cmb.2005.12.331. PMID: 15857246. Review.
- Scoring schemes of palindrome clusters for more sensitive prediction of replication origins in herpesviruses. Nucleic Acids Res. 2005 Sep 1;33(15):e134. doi: 10.1093/nar/gni135. PMID: 16141192.
- Short nucleotide sequences in herpesviral genomes identical to the human DNA. J Theor Biol. 2015 May 7;372:12-21. doi: 10.1016/j.jtbi.2015.02.019. Epub 2015 Feb 26. PMID: 25728788.
- Genome-wide analysis of G-quadruplexes in herpesvirus genomes. BMC Genomics. 2016 Nov 21;17(1):949. doi: 10.1186/s12864-016-3282-1. PMID: 27871228.
- Interactions between the transcription and replication machineries regulate the RNA and DNA synthesis in the herpesviruses. Virus Genes. 2019 Jun;55(3):274-279. doi: 10.1007/s11262-019-01643-5. Epub 2019 Feb 14. PMID: 30767118. Review.
Cited by
- APOBEC3 has not left an evolutionary footprint on the HIV-1 genome. J Virol. 2011 Sep;85(17):9139-46. doi: 10.1128/JVI.00658-11. Epub 2011 Jun 22. PMID: 21697498.
- Evolutionary implications of microbial genome tetranucleotide frequency biases. Genome Res. 2003 Feb;13(2):145-58. doi: 10.1101/gr.335003. PMID: 12566393.
- Nonrandom clusters of palindromes in herpesvirus genomes. J Comput Biol. 2005 Apr;12(3):331-54. doi: 10.1089/cmb.2005.12.331. PMID: 15857246. Review.
- Mining protein loops using a structural alphabet and statistical exceptionality. BMC Bioinformatics. 2010 Feb 4;11:75. doi: 10.1186/1471-2105-11-75. PMID: 20132552.
- Exact distribution of a pattern in a set of random sequences generated by a Markov source: applications to biological data. Algorithms Mol Biol. 2010 Jan 26;5:15. doi: 10.1186/1748-7188-5-15. PMID: 20205909.