The breakdown of the word symmetry in the human genome
- PMID: 23831271
- DOI: 10.1016/j.jtbi.2013.06.032
Abstract
Previous studies have suggested that Chargaff's second rule may hold for relatively long words (above 10 nucleotides), but this has not been conclusively shown. In particular, the following questions remain open: Is the phenomenon of symmetry statistically significant? If so, what is the word length above which significance is lost? Can deviations in symmetry due to the finite size of the data be identified? This work addresses these questions by studying word symmetries in the human genome, chromosomes and transcriptome. To rule out finite-length effects, the results are compared with those obtained from random control sequences built to satisfy Chargaff's second parity rule. We use several techniques to evaluate the phenomenon of symmetry, including Pearson's correlation coefficient, total variational distance, a novel word symmetry distance, as well as traditional and equivalence statistical tests. We conclude that word symmetries are statistically significant in the human genome for word lengths up to 6 nucleotides. For longer words, we present evidence that the phenomenon may not be as prevalent as previously thought.
Keywords: Equivalence testing; Oligonucleotide composition; Single strand symmetry; Word symmetry distance.
© 2013 Elsevier Ltd. All rights reserved.
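As an illustration of two of the measures named in the abstract, the following Python sketch compares the frequency of each word of length k in a single strand with the frequency of its reverse complement, then reports Pearson's correlation coefficient and the total variation distance between the two frequency vectors. It is a minimal sketch under stated assumptions, not the authors' pipeline: the function names (`reverse_complement`, `word_frequencies`, `symmetry_measures`) are hypothetical, words containing symbols other than A, C, G, T are simply skipped, and the paper's word symmetry distance, control sequences and equivalence tests are not reproduced because the abstract does not define them.

```python
from collections import Counter
from math import sqrt

# Translation table for complementing a DNA word (assumes upper-case ACGT).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(word):
    """Return the reverse complement of a DNA word."""
    return word.translate(COMPLEMENT)[::-1]


def word_frequencies(seq, k):
    """Relative frequencies of overlapping words of length k in seq.

    Words containing anything other than A, C, G, T are skipped
    (an assumption of this sketch, e.g. to ignore runs of N).
    """
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    counts = {w: c for w, c in counts.items() if set(w) <= set("ACGT")}
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def symmetry_measures(seq, k):
    """Return (Pearson r, total variation distance) between the
    frequencies of words and of their reverse complements."""
    freq = word_frequencies(seq, k)
    # Include every observed word and its reverse complement.
    words = sorted(set(freq) | {reverse_complement(w) for w in freq})
    x = [freq.get(w, 0.0) for w in words]
    y = [freq.get(reverse_complement(w), 0.0) for w in words]

    n = len(words)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    # Pearson's r is undefined when either vector is constant.
    r = cov / sqrt(var_x * var_y) if var_x > 0 and var_y > 0 else float("nan")

    # Total variation distance: half the L1 distance between the vectors.
    tvd = 0.5 * sum(abs(a - b) for a, b in zip(x, y))
    return r, tvd


if __name__ == "__main__":
    strand = "ACGGTTACA"
    # A strand followed by its own reverse complement is exactly symmetric.
    symmetric = strand + reverse_complement(strand)
    print(symmetry_measures(symmetric, 2))
    print(symmetry_measures("AAAAAAAA", 1))
```

A sequence that equals its own reverse complement gives a total variation distance of 0, whereas a strand made only of A gives the maximum distance of 1. On real data, any such value would still have to be judged against control sequences of the same length, as the abstract emphasises.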
Comment in
- Persistence and breakdown of strand symmetry in the human genome. J Theor Biol. 2015 Apr 7;370:202-4. doi: 10.1016/j.jtbi.2014.12.014. Epub 2015 Jan 6. PMID: 25576243
