Base compositional structure of genomes
- PMID: 1505943
- DOI: 10.1016/0888-7543(92)90019-o
Abstract
We model the base compositional structure of the human and Escherichia coli genomes. Three particular properties are first quantified: (1) There is a significant tendency for any region of either genome to have a strand-symmetric base composition. (2) The variation in base composition from region to region, within each genome, is very much larger than expected from common homogeneous stochastic models. (3) A given local base composition tends to persist over a scale of at least kilobases (E. coli) or tens of kilobases (human). Multidomain stochastic models from the literature are reviewed and sharpened. In particular, quantitative measurements of the third property lead us to suggest a significant shift in the style of domain models, in which the variation of A+T content with position is modeled by a random walk with frequent small steps rather than with large quantum jumps. As an application, we suggest a way to reduce the amount of computation in the assembly of large sequences from sequences of randomly chosen fragments.
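The three properties and the random-walk style of domain model can be illustrated with a small sketch. This is not the authors' model or code: the function names, window size, step size, and the bounds on A+T content are all hypothetical choices made for illustration. The sketch measures strand asymmetry (how far A differs from T and C from G on one strand), computes A+T content in non-overlapping windows, and generates a toy sequence whose local A+T content drifts by small steps from window to window.

```python
import random
from collections import Counter


def base_counts(seq):
    """Count A, C, G, T in a sequence, ignoring any other symbols."""
    counts = Counter(seq.upper())
    return {base: counts.get(base, 0) for base in "ACGT"}


def strand_asymmetry(seq):
    """Return (|A-T|/(A+T), |C-G|/(C+G)) for a single strand.

    Values near 0 indicate a strand-symmetric composition (property 1).
    """
    n = base_counts(seq)
    at = n["A"] + n["T"]
    cg = n["C"] + n["G"]
    at_skew = abs(n["A"] - n["T"]) / at if at else 0.0
    cg_skew = abs(n["C"] - n["G"]) / cg if cg else 0.0
    return at_skew, cg_skew


def windowed_at_content(seq, window):
    """A+T fraction in consecutive non-overlapping windows (properties 2 and 3)."""
    seq = seq.upper()
    fractions = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        fractions.append((chunk.count("A") + chunk.count("T")) / window)
    return fractions


def random_walk_at_profile(n_windows, start=0.5, step=0.01, lo=0.2, hi=0.8, seed=0):
    """Toy random walk: A+T content moves by +/- step per window, clamped to [lo, hi].

    All parameter values are illustrative assumptions, not fitted to any genome.
    """
    rng = random.Random(seed)
    p = start
    profile = []
    for _ in range(n_windows):
        profile.append(p)
        p = min(hi, max(lo, p + rng.choice((-step, step))))
    return profile


def simulate_sequence(profile, window, seed=0):
    """Draw `window` bases per profile entry; A vs T and C vs G are equally likely,
    which builds strand symmetry into the toy model in expectation."""
    rng = random.Random(seed)
    bases = []
    for p_at in profile:
        for _ in range(window):
            bases.append(rng.choice("AT") if rng.random() < p_at else rng.choice("CG"))
    return "".join(bases)


if __name__ == "__main__":
    profile = random_walk_at_profile(n_windows=200)
    seq = simulate_sequence(profile, window=1000)
    print(strand_asymmetry(seq))
    print(windowed_at_content(seq, 1000)[:5])
```

Comparing the spread of `windowed_at_content` for such a sequence against one drawn from a single fixed A+T probability gives a rough feel for why a homogeneous model underestimates region-to-region variation.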