Adv Appl Probab. 2012 Jun;44(2):408-428.
doi: 10.1239/aap/1339878718.

APPROXIMATE SAMPLING FORMULAS FOR GENERAL FINITE-ALLELES MODELS OF MUTATION

Anand Bhaskar et al. Adv Appl Probab. 2012 Jun.

Abstract

Many applications in genetic analyses utilize sampling distributions, which describe the probability of observing a sample of DNA sequences randomly drawn from a population. In the one-locus case with special models of mutation such as the infinite-alleles model or the finite-alleles parent-independent mutation model, closed-form sampling distributions under the coalescent have been known for many decades. However, no exact formula is currently known for more general models of mutation that are of biological interest. In this paper, models with finitely-many alleles are considered, and an urn construction related to the coalescent is used to derive approximate closed-form sampling formulas for an arbitrary irreducible recurrent mutation model or for a reversible recurrent mutation model, depending on whether the number of distinct observed allele types is at most three or four, respectively. It is demonstrated empirically that the formulas derived here are highly accurate when the per-base mutation rate is low, which holds for many biological organisms.

Keywords: Sampling probability; coalescent theory; martingale; urn models.
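As an illustration of the kind of closed-form sampling distribution the abstract refers to for the infinite-alleles model, the sketch below evaluates the classical Ewens sampling formula. It is not the approximate formula derived in this paper, and the function names `rising_factorial` and `ewens_probability` are hypothetical names chosen for this example. Here `counts[j-1]` is the number of allele types observed exactly `j` times in the sample, and `theta` is the population-scaled mutation rate.

```python
from fractions import Fraction
from math import factorial


def rising_factorial(x, n):
    """Return x (x + 1) ... (x + n - 1) as an exact Fraction."""
    result = Fraction(1)
    for k in range(n):
        result *= x + k
    return result


def ewens_probability(counts, theta):
    """Ewens sampling formula for an unlabeled sample configuration.

    counts[j-1] = number of allele types seen exactly j times.
    theta       = population-scaled mutation rate (illustrative input).
    """
    theta = Fraction(theta)
    # Total sample size n = sum over j of j * a_j.
    n = sum(j * a for j, a in enumerate(counts, start=1))
    prob = Fraction(factorial(n)) / rising_factorial(theta, n)
    for j, a in enumerate(counts, start=1):
        prob *= theta ** a / (Fraction(j) ** a * factorial(a))
    return prob


if __name__ == "__main__":
    # Two sampled genes carrying the same allele: probability 1 / (1 + theta).
    print(ewens_probability([0, 1], Fraction(1, 100)))
```

Exact rational arithmetic is used so that the probabilities over all configurations of a given sample size can be checked to sum to one; for large samples a log-space floating-point version would be the more practical choice.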

Figures

Figure 1
Error plots as a function of the sample size n, for the transition matrix in (30) and mutation rate θ ∈ {10^{-3}, 5 × 10^{-3}, 10^{-2}}. (a) The expected relative error, AvgErr(n). (b) The worst-case relative error, WorstErr(n).
