J Am Stat Assoc. 2024;119(547):2242-2255.
doi: 10.1080/01621459.2023.2252570. Epub 2023 Oct 3.

Exact Decoding of a Sequentially Markov Coalescent Model in Genetics

Caleb Ki et al. J Am Stat Assoc. 2024.

Abstract

In statistical genetics, the sequentially Markov coalescent (SMC) is an important family of models for approximating the distribution of genetic variation data under complex evolutionary models. Methods based on SMC are widely used in genetics and evolutionary biology, with significant applications to genotype phasing and imputation, recombination rate estimation, and inferring population history. SMC allows for likelihood-based inference using hidden Markov models (HMMs), where the latent variable represents a genealogy. Because genealogies are continuous, while HMMs are discrete, SMC requires discretizing the space of trees in a way that is awkward and creates bias. In this work, we propose a method that circumvents this requirement, enabling SMC-based inference to be performed in the natural setting of a continuous state space. We derive fast, exact procedures for frequentist and Bayesian inference using SMC. Compared to existing methods, ours requires minimal user intervention or parameter tuning and no numerical optimization or expectation-maximization (EM), and it is faster and more accurate.

Keywords: changepoint; coalescent; hidden Markov model; population genetics.
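
The discretization step that the abstract describes as a source of bias can be illustrated with a small sketch. This is not the authors' method or any existing tool's scheme. It assumes a pairwise coalescence time that is Exp(1)-distributed (coalescent units, constant population size), and the equal-probability binning and the helper names `quantile_breakpoints`, `bin_index`, and `bin_mean` are hypothetical choices made only for illustration.

```python
import math

def quantile_breakpoints(num_bins):
    """Breakpoints splitting Exp(1) into num_bins equal-probability intervals.

    Assumption for illustration: quantile-based bins. Real SMC tools may use
    other spacings (for example, log-spaced intervals).
    """
    finite = [math.log(num_bins / (num_bins - k)) for k in range(num_bins)]
    return finite + [math.inf]

def bin_index(t, breakpoints):
    """Index i of the interval [breakpoints[i], breakpoints[i+1]) containing t."""
    for i in range(len(breakpoints) - 1):
        if breakpoints[i] <= t < breakpoints[i + 1]:
            return i
    raise ValueError("t must be non-negative")

def bin_mean(lo, hi):
    """Conditional mean of an Exp(1) variable given lo <= T < hi."""
    if math.isinf(hi):
        return lo + 1.0  # memoryless property of the exponential
    num = (lo + 1.0) * math.exp(-lo) - (hi + 1.0) * math.exp(-hi)
    den = math.exp(-lo) - math.exp(-hi)
    return num / den

def discretize(t, breakpoints):
    """Replace a continuous time t by the mean of the hidden state (bin) it falls in."""
    i = bin_index(t, breakpoints)
    return bin_mean(breakpoints[i], breakpoints[i + 1])

if __name__ == "__main__":
    bps = quantile_breakpoints(4)
    for t in (0.1, 0.5, 1.0, 3.0):
        # The gap between t and its representative is the discretization error.
        print(t, bin_index(t, bps), round(discretize(t, bps), 4))
```

Every continuous time in a bin is collapsed to a single representative value, so a discretized HMM cannot distinguish genealogies within a bin; using more bins reduces this error at the cost of a larger state space.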

Figures

Fig. 1. Comparison of XSMC, PSMC, SMCSMC, SMC++ on various simulated size histories.
Fig. 2. Result of fitting XSMC to 1000 Genomes data. For each superpopulation, 20 samples were chosen. Solid line denotes the median across all samples, and shaded bands denote the interquartile range.
