APPROXIMATE SAMPLING FORMULAS FOR GENERAL FINITE-ALLELES MODELS OF MUTATION
- PMID: 24634516
- PMCID: PMC3953561
- DOI: 10.1239/aap/1339878718
Abstract
Many applications in genetic analyses use sampling distributions, which describe the probability of observing a sample of DNA sequences randomly drawn from a population. In the one-locus case with special models of mutation such as the infinite-alleles model or the finite-alleles parent-independent mutation model, closed-form sampling distributions under the coalescent have been known for many decades. However, no exact formula is currently known for more general models of mutation that are of biological interest. In this paper, models with finitely many alleles are considered, and an urn construction related to the coalescent is used to derive approximate closed-form sampling formulas for an arbitrary irreducible recurrent mutation model or for a reversible recurrent mutation model, depending on whether the number of distinct observed allele types is at most three or four, respectively. It is demonstrated empirically that the formulas derived here are highly accurate when the per-base mutation rate is low, which holds for many biological organisms.
Keywords: Sampling probability; coalescent theory; martingale; urn models.
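As an illustration of what a closed-form sampling distribution looks like in the infinite-alleles case mentioned in the abstract, the following minimal sketch evaluates the classical Ewens sampling formula. It is not the approximate formula derived in the paper. The function name `ewens_probability`, the argument layout (`counts[j-1]` is the number of allele types observed exactly `j` times), and the use of `theta` for the scaled mutation rate are assumptions made for this example.

```python
from math import factorial, prod


def ewens_probability(counts, theta):
    """Ewens sampling formula for the infinite-alleles model.

    counts[j-1] is the number of distinct allele types that appear
    exactly j times in the sample (hypothetical layout for this sketch).
    theta is the scaled mutation rate and must be positive.
    """
    # Sample size implied by the allele-count configuration.
    n = sum(j * a for j, a in enumerate(counts, start=1))
    # Rising factorial theta * (theta + 1) * ... * (theta + n - 1).
    rising = prod(theta + i for i in range(n))
    # Product over j of (theta / j)^{a_j} / a_j!.
    weight = prod(
        (theta / j) ** a / factorial(a)
        for j, a in enumerate(counts, start=1)
    )
    return factorial(n) * weight / rising


# All allele-count configurations for a sample of size 4.
configs_n4 = [
    (4, 0, 0, 0),  # four singleton alleles
    (2, 1, 0, 0),  # two singletons and one allele seen twice
    (0, 2, 0, 0),  # two alleles, each seen twice
    (1, 0, 1, 0),  # one singleton and one allele seen three times
    (0, 0, 0, 1),  # a single allele seen four times
]
total = sum(ewens_probability(c, theta=0.5) for c in configs_n4)
print(total)
```

The probabilities over all configurations of a fixed sample size should sum to one, which gives a quick sanity check on the implementation.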