A Faster and More Accurate Algorithm for Calculating Population Genetics Statistics Requiring Sums of Stirling Numbers of the First Kind
- PMID: 32900901
- PMCID: PMC7642932
- DOI: 10.1534/g3.120.401575
Abstract
Ewens's sampling formula is a foundational theoretical result that connects probability and number theory with molecular genetics and molecular evolution; it was the analytical result required for testing the neutral theory of evolution, and has since been used, directly or indirectly, in a number of population genetics statistics. Ewens's sampling formula, in turn, is deeply connected to Stirling numbers of the first kind. Here, we explore the cumulative distribution function of these Stirling numbers, which enables a single direct estimate of the sum, using representations in terms of the incomplete beta function. This estimator enables an improved method for calculating an asymptotic estimate for one useful statistic, Fu's $F_s$. By reducing the calculation from a sum of terms involving Stirling numbers to a single estimate, we simultaneously improve accuracy and dramatically increase speed.
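For orientation, the term-by-term sum that the abstract describes replacing can be sketched as follows. This is a minimal illustration of the naive baseline, not the authors' estimator: it assumes the usual definition of Fu's $F_s$ as the logit of the Ewens tail probability of observing at least `k_obs` alleles in a sample of size `n`, and the function names are hypothetical. Exact rational arithmetic is used so that the small example stays free of rounding error; it does not scale to large samples.

```python
from fractions import Fraction
import math


def unsigned_stirling_first_row(n):
    """Return [c(n, 0), ..., c(n, n)], the unsigned Stirling numbers of the first kind."""
    row = [1]  # c(0, 0) = 1
    for m in range(1, n + 1):
        # Recurrence: c(m, k) = c(m-1, k-1) + (m-1) * c(m-1, k)
        new = [0] * (m + 1)
        for k in range(m + 1):
            left = row[k - 1] if k >= 1 else 0
            same = row[k] if k < m else 0
            new[k] = left + (m - 1) * same
        row = new
    return row


def ewens_tail_probability(n, k_obs, theta):
    """P(K >= k_obs) for the number of alleles K in a sample of size n."""
    theta = Fraction(theta)
    row = unsigned_stirling_first_row(n)
    # Normalizing constant: rising factorial theta * (theta + 1) * ... * (theta + n - 1)
    rising = Fraction(1)
    for i in range(n):
        rising *= theta + i
    tail = sum(row[k] * theta**k for k in range(k_obs, n + 1))
    return tail / rising


def fu_fs_naive(n, k_obs, theta):
    """Logit of the tail probability (assumed definition of Fu's F_s).

    Undefined when the tail probability is exactly 0 or 1 (e.g. k_obs <= 1).
    """
    p = ewens_tail_probability(n, k_obs, theta)
    return math.log(float(p / (1 - p)))
```

Every call sums up to `n` terms whose Stirling coefficients grow roughly like `n!`, which is the cost and overflow pressure that a single closed-form estimate of the sum avoids.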
Keywords: Asymptotic analysis; Cumulative distribution functions; Evolutionary inference; Numerical algorithms; Population genetics statistics; Stirling numbers of the first kind.
Copyright © 2020 Chen, Temme.