Stochastic models for heterogeneous DNA sequences
- PMID: 2706403
- DOI: 10.1007/BF02458837
Abstract
The composition of naturally occurring DNA sequences is often strikingly heterogeneous. In this paper, the DNA sequence is viewed as a stochastic process with local compositional properties determined by the states of a hidden Markov chain. The model used is a discrete-state, discrete-outcome version of a general model for non-stationary time series proposed by Kitagawa (1987). A smoothing algorithm is described which can be used to reconstruct the hidden process and produce graphic displays of the compositional structure of a sequence. The problem of parameter estimation is approached using likelihood methods and an EM algorithm for approximating the maximum likelihood estimate is derived. The methods are applied to sequences from yeast mitochondrial DNA, human and mouse mitochondrial DNAs, a human X chromosomal fragment and the complete genome of bacteriophage lambda.
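The smoothing step described in the abstract can be illustrated with a minimal sketch. It uses the standard scaled forward-backward recursion for a discrete hidden Markov chain, which is not necessarily the exact algorithm derived in the paper. The two-state setup (an AT-rich and a GC-rich state), all parameter values, and the names `smooth`, `init`, `trans`, and `emit` are assumptions made for illustration. The parameters are treated as known here; the EM estimation step is not shown.

```python
ALPHABET = "ACGT"

def smooth(seq, init, trans, emit):
    """Posterior P(state at t = j | whole sequence) by scaled forward-backward."""
    n, k = len(seq), len(init)
    obs = [ALPHABET.index(c) for c in seq]

    # Forward pass: fwd[t][j] = P(state_t = j | obs[0..t]); scale[t] is the normaliser.
    fwd, scale, prev = [], [], None
    for t, o in enumerate(obs):
        if t == 0:
            row = [init[j] * emit[j][o] for j in range(k)]
        else:
            row = [sum(prev[i] * trans[i][j] for i in range(k)) * emit[j][o]
                   for j in range(k)]
        c = sum(row)
        row = [x / c for x in row]
        fwd.append(row)
        scale.append(c)
        prev = row

    # Backward pass, divided by the same scale factors to avoid underflow.
    bwd = [[1.0] * k for _ in range(n)]
    for t in range(n - 2, -1, -1):
        o = obs[t + 1]
        bwd[t] = [sum(trans[i][j] * emit[j][o] * bwd[t + 1][j] for j in range(k))
                  / scale[t + 1] for i in range(k)]

    # Combine and renormalise defensively against rounding error.
    post = []
    for t in range(n):
        row = [fwd[t][j] * bwd[t][j] for j in range(k)]
        s = sum(row)
        post.append([x / s for x in row])
    return post

# Hypothetical parameters: state 0 is AT-rich, state 1 is GC-rich.
init = [0.5, 0.5]
trans = [[0.9, 0.1], [0.1, 0.9]]
emit = [[0.4, 0.1, 0.1, 0.4],   # state 0: P(A), P(C), P(G), P(T)
        [0.1, 0.4, 0.4, 0.1]]   # state 1
posterior = smooth("ATATATATGCGCGCGC", init, trans, emit)
```

Plotting `posterior[t][1]` against position `t` would give a display of the kind the abstract mentions: a curve showing where along the sequence the GC-rich state is most probable.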
Similar articles
- Drifting Markov models with polynomial drift and applications to DNA sequences. Stat Appl Genet Mol Biol. 2008;7(1):Article6. doi: 10.2202/1544-6115.1326. Epub 2008 Feb 21. PMID: 18312211
- Bayesian restoration of a hidden Markov chain with applications to DNA sequencing. J Comput Biol. 1999 Summer;6(2):261-77. doi: 10.1089/cmb.1999.6.261. PMID: 10421527
- Estimation and reliability of molecular sequence alignments. Biometrics. 1995 Mar;51(1):100-13. PMID: 7766767
- Integrating database homology in a probabilistic gene structure model. Pac Symp Biocomput. 1997:232-44. PMID: 9390295
- Statistical alignment with a sequence evolution model allowing rate heterogeneity along the sequence. IEEE/ACM Trans Comput Biol Bioinform. 2009 Apr-Jun;6(2):281-95. doi: 10.1109/TCBB.2007.70246. PMID: 19407352
Cited by
- A compositional segmentation of the human mitochondrial genome is related to heterogeneities in the guanine mutation rate. Nucleic Acids Res. 2003 Oct 15;31(20):6043-52. doi: 10.1093/nar/gkg784. PMID: 14530452 Free PMC article.
- Apollo: a sequence annotation editor. Genome Biol. 2002;3(12):RESEARCH0082. doi: 10.1186/gb-2002-3-12-research0082. Epub 2002 Dec 23. PMID: 12537571 Free PMC article. Review.
- Detection of transposable elements by their compositional bias. BMC Bioinformatics. 2004 Jul 13;5:94. doi: 10.1186/1471-2105-5-94. PMID: 15251040 Free PMC article.
- A hidden Markov model that finds genes in E. coli DNA. Nucleic Acids Res. 1994 Nov 11;22(22):4768-78. doi: 10.1093/nar/22.22.4768. PMID: 7984429 Free PMC article.
- Interpreting genomic data via entropic dissection. Nucleic Acids Res. 2013 Jan 7;41(1):e23. doi: 10.1093/nar/gks917. Epub 2012 Oct 3. PMID: 23036836 Free PMC article.