Composite likelihood estimation of demographic parameters
- PMID: 19909534
- PMCID: PMC2783031
- DOI: 10.1186/1471-2156-10-72
Abstract
Background: Most existing likelihood-based methods for fitting historical demographic models to DNA sequence polymorphism data do not scale feasibly up to the level of whole-genome data sets. Computational economies can be achieved by incorporating two forms of pseudo-likelihood: composite and approximate likelihood methods. Composite likelihood enables scaling up to large data sets because it takes the product of marginal likelihoods as an estimator of the likelihood of the complete data set. This approach is especially useful when a large number of genomic regions constitutes the data set. Additionally, approximate likelihood methods can reduce the dimensionality of the data by summarizing the information in the original data by either a sufficient statistic or a set of statistics. Both composite and approximate likelihood methods hold promise for analyzing large data sets or for use in situations where the underlying demographic model is complex and has many parameters. This paper considers a simple demographic model of allopatric divergence between two populations, in which one of the populations is hypothesized to have experienced a founder event, or population bottleneck. A large resequencing data set from human populations is summarized by the joint frequency spectrum, which is a matrix of the genome-wide counts of derived base frequencies in two populations. A Bayesian Metropolis-coupled Markov chain Monte Carlo (MCMCMC) method for parameter estimation is developed that uses both composite and approximate likelihood methods and is applied to the three different pairwise combinations of the human population resequencing data. The accuracy of the method is also tested on data sets sampled from a simulated population model with known parameters.
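The two ideas described above, summarizing the data by a joint frequency spectrum and scoring it with a composite likelihood, can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the toy data, and the multinomial form of the likelihood are assumptions made for illustration, and the abstract does not specify how the expected spectrum is obtained from the demographic model. Here the expected cell probabilities are simply passed in, and the constant multinomial coefficient is omitted because it does not depend on the parameters.

```python
import numpy as np


def joint_frequency_spectrum(derived_a, derived_b, n_a, n_b):
    """Count sites by (derived count in population A, derived count in population B).

    derived_a, derived_b: per-site derived allele counts (hypothetical input format).
    n_a, n_b: number of sampled chromosomes in each population.
    """
    jfs = np.zeros((n_a + 1, n_b + 1), dtype=int)
    # np.add.at accumulates correctly when the same cell is indexed repeatedly.
    np.add.at(jfs, (np.asarray(derived_a), np.asarray(derived_b)), 1)
    return jfs


def composite_log_likelihood(observed_jfs, expected_probs):
    """Sum of per-site log-probabilities, treating all sites as independent.

    expected_probs: cell probabilities under some demographic model
    (assumed to be supplied by the caller; must sum to 1).
    """
    observed = np.asarray(observed_jfs, dtype=float)
    probs = np.asarray(expected_probs, dtype=float)
    mask = observed > 0  # empty cells contribute nothing to the sum
    return float(np.sum(observed[mask] * np.log(probs[mask])))


# Toy data: six sites, two chromosomes sampled per population.
derived_a = [0, 1, 1, 2, 0, 1]
derived_b = [1, 1, 0, 2, 1, 1]
jfs = joint_frequency_spectrum(derived_a, derived_b, n_a=2, n_b=2)

# Placeholder "model": every cell equally likely (an assumption, not a demographic model).
uniform = np.full(jfs.shape, 1.0 / jfs.size)
cll = composite_log_likelihood(jfs, uniform)
print(jfs)
print(cll)
```

Because the log-likelihood is a sum over sites (or, in the paper's framing, over genomic regions), its cost grows linearly with the amount of data, which is what makes the approach attractive at whole-genome scale. The price is the independence assumption among the terms being multiplied.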
Results: The Bayesian MCMCMC method also estimates the ratio of effective population size for the X chromosome versus that of the autosomes. The method is shown to estimate, with reasonable accuracy, demographic parameters from three simulated data sets that vary in the magnitude of a founder event and a skew in the effective population size of the X chromosome relative to the autosomes. The behavior of the Markov chain is also examined and shown to converge to its stationary distribution, while also showing high levels of parameter mixing. The analysis of three pairwise comparisons of sub-Saharan African human populations with non-African human populations does not provide unequivocal support for a strong non-African founder event from these nuclear data. The estimates do, however, suggest a skew in the ratio of X chromosome to autosome effective population size that is greater than one. However, in all three cases, the 95% highest posterior density interval for this ratio does include three-fourths, the value expected under an equal breeding sex ratio.
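The three-fourths benchmark mentioned above can be checked with a short calculation. The sketch below assumes the classical formulas for effective population size with N_m breeding males and N_f breeding females (discrete generations, random mating), namely 4 N_m N_f / (N_m + N_f) for the autosomes and 9 N_m N_f / (4 N_m + 2 N_f) for the X chromosome. These formulas and the function names are not taken from the abstract; they are standard population-genetic results used here only for illustration.

```python
def autosome_ne(n_m, n_f):
    """Autosomal effective size for n_m breeding males and n_f breeding females."""
    return 4.0 * n_m * n_f / (n_m + n_f)


def x_chromosome_ne(n_m, n_f):
    """X-linked effective size under the same assumptions."""
    return 9.0 * n_m * n_f / (4.0 * n_m + 2.0 * n_f)


def x_to_autosome_ratio(n_m, n_f):
    """Ratio of X-linked to autosomal effective population size."""
    return x_chromosome_ne(n_m, n_f) / autosome_ne(n_m, n_f)


# Equal breeding sex ratio: the ratio is 3/4.
equal = x_to_autosome_ratio(1000, 1000)

# Strongly female-biased breeding population (hypothetical numbers):
# the ratio rises above one.
female_biased = x_to_autosome_ratio(100, 1000)

print(equal, female_biased)
```

Under these assumed formulas the ratio simplifies to 9(N_m + N_f) / (8(2 N_m + N_f)), so it exceeds one only when breeding females outnumber breeding males by more than seven to one, and it cannot exceed 9/8.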
Conclusion: The implementation of composite and approximate likelihood methods in a framework that includes MCMCMC demographic parameter estimation shows great promise for being flexible and computationally efficient enough to scale up to the level of whole-genome polymorphism and divergence analysis. Further work must be done to characterize the effects of the assumption of linkage equilibrium among genomic regions that is crucial to the validity of applying the composite likelihood method.