Stat Appl Genet Mol Biol. 2009:8:Article 6.
doi: 10.2202/1544-6115.1391. Epub 2009 Jan 28.

Composite likelihood modeling of neighboring site correlations of DNA sequence substitution rates

Ling Deng et al. Stat Appl Genet Mol Biol. 2009.

Abstract

Sequence data from a series of homologous DNA segments from related organisms are typically polymorphic at many sites, and these polymorphisms are the result of evolutionary processes. Such data may be used to estimate the substitution rates as well as the variability of these rates. Careful characterization of the distribution of this variation is essential for accurate estimation of evolutionary distances and for phylogeny reconstruction among these sequences. Many researchers have recognized the importance of the variability of substitution rates, which most have modeled using a discrete gamma distribution. Some have extended these methods to explicitly account for the correlation of substitution rates among sites using hidden Markov models; others have proposed context-dependent substitution rate schemes. We accommodate these correlations using a composite likelihood method based on a bivariate gamma distribution, which is more flexible than hidden Markov models in terms of correlation structure and more computationally tractable than the context-dependent schemes. We show that the estimates have good theoretical properties. We also use simulations to compare the maximum composite likelihood estimates to those obtained from maximum likelihood based on the independence assumption. We use data from the mitochondrial DNA of ten primates to obtain maximum composite likelihood estimates of the mean substitution rate, overdispersion, and correlation parameters, and use these estimates in a parametric phylogenetic bootstrap to assess the impact of serial correlation on the estimates of substitution rates and branch lengths.
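
For orientation, a pairwise composite likelihood over neighboring sites generally takes the form sketched below. This is the generic construction and not necessarily the authors' exact formulation. The notation is assumed here for illustration only: $x_i$ is the data at site $i$, $n$ is the number of sites, $\theta$ collects the mean substitution rate, overdispersion, and correlation parameters, and $f$ is a bivariate marginal density for a pair of sites.

```latex
% Generic pairwise composite log-likelihood over adjacent sites (illustrative notation)
\ell_C(\theta) = \sum_{i=1}^{n-1} \log f(x_i, x_{i+1}; \theta)
```

Under the independence assumption, the corresponding log-likelihood would instead sum univariate terms, $\sum_{i=1}^{n} \log f(x_i; \theta)$, which carries no information about correlation between sites.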

Figures

Figure 1
Phylogenetic relationship among 10 primates based on ND5 mitochondrial DNA sequences (codon position 2). In the text, we refer to Clade 1 as the subtree consisting of human, chimpanzee, and pygmy chimpanzee, and Clade 2 as the subtree consisting of gibbon, Barbary ape, hamadryas baboon, lemur, and western tarsier.
Figure 2
Number of substitutions vs. site number for codon position 2 of ND5 in 10 primates.
Figure 3
Correlation vs. the number of sites apart for codon position 2 of ND5 in 10 primates.
