DESCARTES' RULE OF SIGNS AND THE IDENTIFIABILITY OF POPULATION DEMOGRAPHIC MODELS FROM GENOMIC VARIATION DATA

Anand Bhaskar¹, Yun S Song¹

PMID: 28018011
PMCID: PMC5175586
DOI: 10.1214/14-AOS1264

Anand Bhaskar et al. Ann Stat. 2014;42(6):2469-2493. doi: 10.1214/14-AOS1264. Epub 2014 Oct 20.

Affiliation

¹ University of California, Berkeley.

Abstract

The sample frequency spectrum (SFS) is a widely-used summary statistic of genomic variation in a sample of homologous DNA sequences. It provides a highly efficient dimensional reduction of large-scale population genomic data and its mathematical dependence on the underlying population demography is well understood, thus enabling the development of efficient inference algorithms. However, it has been recently shown that very different population demographies can actually generate the same SFS for arbitrarily large sample sizes. Although in principle this nonidentifiability issue poses a thorny challenge to statistical inference, the population size functions involved in the counterexamples are arguably not so biologically realistic. Here, we revisit this problem and examine the identifiability of demographic models under the restriction that the population sizes are piecewise-defined where each piece belongs to some family of biologically-motivated functions. Under this assumption, we prove that the expected SFS of a sample uniquely determines the underlying demographic model, provided that the sample is sufficiently large. We obtain a general bound on the sample size sufficient for identifiability; the bound depends on the number of pieces in the demographic model and also on the type of population size function in each piece. In the cases of piecewise-constant, piecewise-exponential and piecewise-generalized-exponential models, which are often assumed in population genomic inferences, we provide explicit formulas for the bounds as simple functions of the number of pieces. Lastly, we obtain analogous results for the "folded" SFS, which is often used when there is ambiguity as to which allelic type is ancestral. Our results are proved using a generalization of Descartes' rule of signs for polynomials to the Laplace transform of piecewise continuous functions.

Keywords: Population genetics; coalescent theory; frequency spectrum; identifiability; population size.

MSC subject classifications: Primary 62B10; secondary 92D15.
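
The abstract builds on Descartes' rule of signs for polynomials: the number of positive real roots (counted with multiplicity) is at most the number of sign changes in the coefficient sequence, and the two differ by an even number. The following is a minimal sketch of the classical polynomial rule only, not of the paper's Laplace-transform generalization; the function name `count_sign_changes` and the example polynomial are illustrative assumptions.

```python
from typing import Sequence


def count_sign_changes(coeffs: Sequence[float]) -> int:
    """Count sign changes in a coefficient sequence, skipping zero entries."""
    signs = [c > 0 for c in coeffs if c != 0]
    return sum(1 for a, b in zip(signs, signs[1:]) if a != b)


# Illustrative polynomial: p(x) = x^3 - 7x + 6 = (x - 1)(x - 2)(x + 3),
# with coefficients listed from the highest degree down.
coeffs = [1, 0, -7, 6]
known_roots = [1, 2, -3]

changes = count_sign_changes(coeffs)
positive_roots = sum(1 for r in known_roots if r > 0)

# Descartes' rule: positive_roots <= changes, and the difference is even.
print(changes, positive_roots)
```

Here the bound is attained: the coefficient sequence has two sign changes and the polynomial has two positive roots.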

Figures

**Fig. 1**
A piecewise-exponential population size function η ∈ ℳ_K(ℱ_e), where K ≥ 5. Note that the y-axis is on a log scale. This piecewise-exponential function depicts the historical population size changes of a European population, as estimated from the SFS of a sample of 1351 (diploid) individuals of European ancestry [44].

**Fig. 2**
Illustration of the sign changes of a function. For the domain shown, σ(g) = 3 and the sign change points of g are denoted t₁, t₂, and t₃.
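
The sign-change count σ(g) described in this caption can be approximated numerically. The sketch below samples a function on a uniform grid and counts transitions between strictly positive and strictly negative values, skipping zeros. The helper name `sign_changes_on_grid`, the grid resolution, and the test function are assumptions for illustration; a grid that is too coarse can miss sign changes, so this is an approximation rather than the paper's formal definition.

```python
from typing import Callable, List, Tuple


def sign_changes_on_grid(
    g: Callable[[float], float], lo: float, hi: float, steps: int = 1000
) -> Tuple[int, List[float]]:
    """Approximate the number of sign changes of g on [lo, hi] and their locations."""
    count = 0
    points: List[float] = []
    prev_sign = 0
    prev_t = lo
    for k in range(steps + 1):
        t = lo + (hi - lo) * k / steps
        v = g(t)
        s = (v > 0) - (v < 0)  # -1, 0, or +1
        if s == 0:
            continue  # zeros do not count as a sign on their own
        if prev_sign != 0 and s != prev_sign:
            count += 1
            points.append((prev_t + t) / 2)  # midpoint of the bracketing grid points
        prev_sign = s
        prev_t = t
    return count, points


# Hypothetical example with three sign changes, at t = 1, 2, and 3.
def g(t: float) -> float:
    return (t - 1) * (t - 2) * (t - 3)


sigma, change_points = sign_changes_on_grid(g, 0.0, 4.0)
print(sigma, change_points)
```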

**Fig. 3**
The leading entries of the expected SFS ξ_n for a piecewise-exponential population size model inferred by Tennessen et al. [44]. This demographic model, shown (up to scaling) in Figure 1, was fitted using the observed SFS from a sample of 1351 (diploid) individuals of European ancestry [44]. The blue plot is the expected SFS for n = 19, which matches the sample size bound in Corollary 8 for identifying piecewise-exponential models with up to 5 pieces, while the green plot is the first 18 entries of the expected SFS for n = 2702 (1351 diploids). The red and purple plots are the expected SFS for n = 19 and n = 2702, respectively, for a constant population size function.
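
For the constant-population-size baseline mentioned in this caption, a standard coalescent result is that the i-th entry of the expected SFS is proportional to 1/i. The sketch below computes that baseline for n = 19 and also folds it, pairing derived-allele counts i and n − i as is done when the ancestral allele is unknown. Normalizing the spectrum to sum to one and the function names are assumptions of this sketch; it does not reproduce the piecewise-exponential curves in the figure.

```python
from fractions import Fraction
from typing import List


def constant_size_expected_sfs(n: int) -> List[Fraction]:
    """Normalized expected SFS for sample size n under a constant population size.

    Entry i (for i = 1, ..., n - 1) is proportional to 1/i.
    """
    weights = [Fraction(1, i) for i in range(1, n)]
    total = sum(weights)
    return [w / total for w in weights]


def fold(sfs: List[Fraction]) -> List[Fraction]:
    """Fold an SFS by merging entries i and n - i."""
    n = len(sfs) + 1
    folded = []
    for j in range(1, n // 2 + 1):
        k = n - j
        folded.append(sfs[j - 1] if j == k else sfs[j - 1] + sfs[k - 1])
    return folded


sfs_19 = constant_size_expected_sfs(19)  # 18 entries, as for n = 19 in the caption
folded_19 = fold(sfs_19)                 # 9 entries
print([float(x) for x in sfs_19[:3]])
```

Exact rational arithmetic is used so that the normalization can be checked without floating-point error.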

References

- 1000 Genomes Project Consortium. A map of human genome variation from population-scale sequencing. Nature. 2010;467:1061–1073.
- Boyko AR, Williamson SH, Indap AR, Degenhardt JD, Hernandez RD, Lohmueller KE, Adams MD, Schmidt S, Sninsky JJ, Sunyaev SR, et al. Assessing the evolutionary impact of amino acid mutations in the human genome. PLoS Genet. 2008;4:e1000083.
- Campbell CD, Ogburn EL, Lunetta KL, Lyon HN, Freedman ML, Groop LC, Altshuler D, Ardlie KG, Hirschhorn JN. Demonstrating stratification in a European American population. Nat Genet. 2005;37:868–872.
- Coventry A, Bull-Otterson LM, Liu X, Clark AG, Maxwell TJ, Crosby J, Hixson JE, Rea TJ, Muzny DM, Lewis LR, et al. Deep resequencing reveals excess rare recent variants consistent with explosive population growth. Nat Commun. 2010;1:131.
- Excoffier L, Dupanloup I, Huerta-Sánchez E, Sousa VC, Foll M. Robust demographic inference from genomic and SNP data. PLoS Genet. 2013;9:e1003905.

Grants and funding

R01 GM094402/GM/NIGMS NIH HHS/United States
