Nat Commun. 2025 Jul 1;16(1):6054.
doi: 10.1038/s41467-025-61378-w.

Accounting for population structure and data quality in demographic inference with linkage disequilibrium methods

Enrique Santiago et al. Nat Commun.

Abstract

Linkage disequilibrium methods for demographic inference usually rely on panmictic population models. However, the structure of natural populations is generally complex and the quality of the genotyping data is often suboptimal. We present two software tools that implement theoretical developments to estimate the effective population size (Ne): GONE2, for inferring recent changes in Ne when a genetic map is available, and currentNe2, which estimates contemporary Ne even in the absence of genetic maps. These tools operate on SNP data from a single sample of individuals, and provide insights into population structure, including the FST index, migration rate, and subpopulation number. GONE2 can also handle haploid data, genotyping errors, and low sequencing depth data. Results from simulations and laboratory populations of Drosophila melanogaster validated the tools in different demographic scenarios, and analyses were extended to populations of several species. These results highlight that ignoring population subdivision often leads to Ne underestimation.
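For orientation, the panmictic starting point that linkage disequilibrium methods commonly build on can be sketched with the classic approximation E[r²] ≈ 1/(1 + 4·Ne·c) + 1/n, where c is the recombination rate between two loci and n is the number of sampled haplotypes. This is a textbook relationship, not the model implemented in GONE2 or currentNe2; the function names and the form of the sampling correction are assumptions made for illustration.

```python
def expected_r2(ne, c, n_haplotypes):
    """Expected squared correlation between two loci in a panmictic population.

    Assumption: classic drift-recombination approximation plus a simple
    1/n sampling term. Not the formula used by GONE2 or currentNe2.
    """
    return 1.0 / (1.0 + 4.0 * ne * c) + 1.0 / n_haplotypes


def ne_from_r2(r2, c, n_haplotypes):
    """Invert the approximation above to recover Ne from an observed r2."""
    drift_r2 = r2 - 1.0 / n_haplotypes  # remove the sampling contribution
    if drift_r2 <= 0.0:
        raise ValueError("observed r2 does not exceed the sampling term")
    return (1.0 / drift_r2 - 1.0) / (4.0 * c)


# Hypothetical values: Ne = 1000, loci 1 cM apart, 200 sampled haplotypes.
r2 = expected_r2(1000, 0.01, 200)
print(ne_from_r2(r2, 0.01, 200))
```

Because this approximation assumes a single randomly mating population, any excess linkage disequilibrium caused by subdivision is read as stronger drift, which is consistent with the abstract's point that ignoring structure tends to push Ne estimates downward.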

Conflict of interest statement

Competing interests: The authors declare no competing interests.

Figures

Fig. 1. currentNe2 estimates in different simulated scenarios.
The top plot in each panel shows estimates of contemporary effective population size (Ne) for the entire metapopulation (blue dots), the sum of the effective sizes of the subpopulations (NT, red dots), and Ne assuming panmixia (green squares; green arrows indicate values below the scale limit). Note that the labels ‘N’ on the x-axes here refer generically to the three estimates. The three lower plots show estimates for migration rates (m), genetic differentiation indices (FST), and the number of subpopulations (s). All estimates were calculated using currentNe2 under the assumption of population subdivision (option ‘-x’), except for Ne under panmixia. Each estimate represents the geometric mean of ten simulated metapopulations, with 95% confidence intervals. True metapopulation values are indicated by grey dashed lines (Ne and NT lines often overlap, with Ne typically higher). (A) Effect of varying migration rates in a metapopulation with two subpopulations of 1000 individuals each. (B) Effect of increasing the number of equally sized subpopulations while maintaining a total metapopulation size of 2000 individuals (m = 0.001). (C) Effect of unequal sampling proportions between two subpopulations of 1000 individuals each (m = 0.001). (D) Effect of unequal subpopulation sizes in a metapopulation with two subpopulations totalling 2000 individuals (m = 0.001). All estimates are based on random samples of 100 individuals from the entire metapopulation, except in (C), where sampling proportions vary. Source data are provided as a Source Data file.
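The summary statistic named in this caption, a geometric mean of replicate estimates with a 95% confidence interval, can be sketched as follows. The caption does not say how the interval was computed; this sketch assumes a Student t interval on the log scale, and the function name, the default critical value (2.262, for ten replicates), and the example numbers are illustrative assumptions.

```python
import math
import statistics


def geometric_mean_ci(estimates, t_crit=2.262):
    """Geometric mean and an approximate 95% CI for positive estimates.

    Assumption: the interval is a t interval built on log-transformed
    values. t_crit = 2.262 is the two-sided 95% value for 9 degrees of
    freedom (ten replicates); pass another value for other sample sizes.
    """
    logs = [math.log(x) for x in estimates]
    mean_log = statistics.fmean(logs)
    se_log = statistics.stdev(logs) / math.sqrt(len(logs))
    half_width = t_crit * se_log
    return (
        math.exp(mean_log),
        math.exp(mean_log - half_width),
        math.exp(mean_log + half_width),
    )


# Hypothetical Ne estimates from ten simulated metapopulations.
replicates = [1850, 2100, 1975, 2240, 1790, 2050, 1930, 2160, 2010, 1880]
gm, lower, upper = geometric_mean_ci(replicates)
print(f"geometric mean = {gm:.0f}, 95% CI = ({lower:.0f}, {upper:.0f})")
```

Working on the log scale keeps the interval positive and makes it asymmetric around the mean, which suits Ne estimates whose errors tend to be multiplicative.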
Fig. 2. GONE2 estimates of the historical Ne in different simulated scenarios.
Four different profiles of demographic change are represented in the columns by the grey-shaded areas: constant population size, recent decline, recent expansion, and bottleneck. The scenarios are also arranged in rows in four different situations. First row: Estimates from panmictic populations using the default option of the program (green lines) and the ‘-x’ option, which incorrectly assumes that the population is divided into two subpopulations (red lines). Second row: Estimates from metapopulations composed of two subpopulations of equal size N and migration rate m = 0.001 per generation. The analyses consider either a metapopulation divided into two subpopulations (option ‘-x’, red lines) or the incorrect assumption of a single panmictic population (default program options, green lines). Third row: Estimates from low-depth sequencing data (average depth 2x) of panmictic populations. Green lines show estimates not corrected for low sequencing depth (default program options), while red lines show estimates corrected for low sequencing depth (option ‘-g 3’ to be used for any sequencing depth). Fourth row: Estimates from sequencing data with 5% base calling errors. Green lines represent unadjusted estimates not corrected for errors (program default option), while red lines represent estimates corrected for the true 5% error rate (option ‘-b 0.05’). Simulation results are the geometric mean of 20 replicates. The shaded areas around the estimate lines represent the 95% interval of the distribution of the estimates. Source data are provided as a Source Data file.
Fig. 3. Historical Ne estimates for different species using GONE2.
Estimates of Ne trajectories over the last 50 generations for the same samples analyzed in Table 1. Each panel corresponds to a different species, with estimates obtained using GONE2 under two assumptions: the default option (green lines), which assumes panmixia, and the ‘-x’ option (red lines), which takes population structure into account. The salmon panel combines the analyses of two tributaries of the river Dee. The seabream panel combines the analyses from two parallel breeding programmes. GONE2 did not produce results for cattle for more than five generations, probably due to the complexity of the mixing of 134 highly differentiated breeds. Source data are provided as a Source Data file.
Fig. 4. Estimates of the historical Ne using GONE2 with samples from 17 males of Drosophila melanogaster.
The experimental populations were maintained at constant size (A), underwent a recent drastic reduction in size (B), recovered from a previous bottleneck (C), and underwent further expansion after a bottleneck (D), as shown by the grey shaded areas. The green lines show the estimates of Ne from autosomes using the modifier ‘-g 0’ for unphased diploid genotypes. The red lines show the estimates of Ne from X chromosomes using the haploid option ‘-g 1’. The Ne estimates from X chromosomes have been transformed to the autosomal scale (see Methods) to compare sex ratios. Source data are provided as a Source Data file.
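The Methods section describing the X-to-autosome transformation is not part of this excerpt, so the following is only a sketch of the textbook relationships it may resemble: for Nm breeding males and Nf breeding females, autosomal Ne = 4·Nm·Nf/(Nm + Nf) and X-linked Ne = 9·Nm·Nf/(4·Nm + 2·Nf). With an equal sex ratio the X-linked value is three quarters of the autosomal one. All function names are hypothetical, and the default rescaling assumes an equal sex ratio.

```python
def ne_autosomal(n_males, n_females):
    """Textbook autosomal effective size for unequal numbers of each sex."""
    return 4.0 * n_males * n_females / (n_males + n_females)


def ne_x_linked(n_males, n_females):
    """Textbook X-linked effective size (males carry a single X)."""
    return 9.0 * n_males * n_females / (4.0 * n_males + 2.0 * n_females)


def x_to_autosomal_scale(ne_x, n_males=1.0, n_females=1.0):
    """Rescale an X-linked Ne estimate to the autosomal scale.

    Assumption: the conversion factor is the ratio of the two textbook
    formulas above; the default arguments correspond to an equal sex
    ratio, giving a factor of 4/3. The paper's Methods may differ.
    """
    ratio = ne_x_linked(n_males, n_females) / ne_autosomal(n_males, n_females)
    return ne_x / ratio


# Hypothetical example: 500 males and 500 females.
print(ne_autosomal(500, 500))       # autosomal Ne
print(ne_x_linked(500, 500))        # X-linked Ne
print(x_to_autosomal_scale(750.0))  # X-linked estimate on the autosomal scale
```

Under this sketch, an X-derived estimate that still differs from the autosomal estimate after rescaling with the equal-sex-ratio factor would point to an unequal effective number of males and females.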
