Genetics. 2020 May;215(1):193-214.
doi: 10.1534/genetics.120.303143. Epub 2020 Mar 24.

Space is the Place: Effects of Continuous Spatial Structure on Analysis of Population Genetic Data

C J Battey et al. Genetics. 2020 May.

Abstract

Real geography is continuous, but standard models in population genetics are based on discrete, well-mixed populations. As a result, many methods of analyzing genetic data assume that samples are a random draw from a well-mixed population, but are applied to clustered samples from populations that are structured clinally over space. Here, we use simulations of populations living in continuous geography to study the impacts of dispersal and sampling strategy on population genetic summary statistics, demographic inference, and genome-wide association studies (GWAS). We find that most common summary statistics have distributions that differ substantially from those seen in well-mixed populations, especially when Wright's neighborhood size is < 100 and sampling is spatially clustered. "Stepping-stone" models reproduce some of these effects, but discretizing the landscape introduces artifacts that in some cases are exacerbated at higher resolutions. The combination of low dispersal and clustered sampling causes demographic inference from the site frequency spectrum to infer more turbulent demographic histories, but averaged results across multiple simulations revealed surprisingly little systematic bias. We also show that the combination of spatially autocorrelated environments and limited dispersal causes GWAS to identify spurious signals of genetic association with purely environmentally determined phenotypes, and that this bias is only partially corrected by regressing out principal components of ancestry. Last, we discuss the relevance of our simulation results for inference from genetic variation in real organisms.

Keywords: GWAS; demography; haplotype block sharing; population structure; space.
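
The abstract's threshold is stated in terms of Wright's neighborhood size. As a minimal sketch, assuming the standard two-dimensional definition NS = 4πσ²ρ (with σ the dispersal distance scale and ρ the population density; the function and parameter names below are illustrative and not taken from the paper), the relationship between dispersal and neighborhood size can be computed as:

```python
import math


def neighborhood_size(sigma, density):
    """Wright's neighborhood size in two dimensions: 4 * pi * sigma^2 * density.

    sigma   -- dispersal distance scale (assumed parameter name)
    density -- individuals per unit area (assumed parameter name)
    """
    return 4.0 * math.pi * sigma ** 2 * density


def sigma_for_neighborhood_size(ns, density):
    """Invert the formula: the dispersal scale that gives a target neighborhood size."""
    return math.sqrt(ns / (4.0 * math.pi * density))


if __name__ == "__main__":
    # Density 5 matches the expected density quoted in the Figure 3 caption;
    # sigma = 1.0 is an arbitrary illustrative value.
    print(neighborhood_size(1.0, 5.0))               # roughly 62.8
    print(sigma_for_neighborhood_size(100.0, 5.0))   # sigma at the NS = 100 threshold
```

Under this definition, at a fixed density the neighborhood size grows with the square of the dispersal scale, so modest reductions in dispersal move a population well below the NS < 100 regime highlighted in the abstract.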

Figures

Figure 1
Example sampling maps for 60 individuals on a 50 × 50 landscape for midpoint, point, and random sampling strategies, respectively.
Figure 2
Genealogical parameters from spatial and random mating SLiM simulations, by neighborhood size.
Figure 3
Run times of continuous space simulations with landscape width 50 and expected density 5 under varying neighborhood size. Times are shown for simulations run with mutations applied directly in SLiM (dashed lines) or later applied to tree sequences with msprime (solid lines). Times for simulations run with tree sequence recording disabled are shown in gray. CPU, central processor unit.
Figure 4
Site frequency spectrum (A) (note axes are log-scaled) and summary statistic distributions (B) by sampling strategy and neighborhood size. corr, correlation; dist, distance; dxy, pairwise genetic distance; IBS, identical-by-state; var, variation. Summary statistics are described in detail in Table S1.
Figure 5
Cumulative distributions for IBS tract lengths per pair of individuals at different geographic distances, across three NSs. Nearby pairs (red curves) share many more long IBS tracts than do distant pairs (blue curves), except in the random mating model. The distributions of long IBS tracts between nearby individuals are very similar across NSs, but distant individuals are much more likely to share long IBS tracts at high NS than at low NS. IBS, identical-by-state; NS, neighborhood size.
Figure 6
Spatial spread of rare alleles by NS. Each plot shows the distribution (across derived alleles and simulations) of average pairwise distance between individuals carrying a focal derived allele and derived allele frequency. NS, neighborhood size.
Figure 7
(A) Rolling median inferred Ne trajectories for stairwayplot and SMC++ across sampling strategies and NS bins. The dotted line shows the mean Ne of random mating simulations. (B) SD of individual inferred Ne trajectories, by NS and sampling strategy. Black lines are loess curves. Plots including individual model fits are shown in Figure S7. NS, neighborhood size.
Figure 8
Impacts of spatially varying environments and isolation by distance on linear regression GWAS. Simulated quantitative phenotypes are determined only by an individual’s location and the spatial distribution of environmental factors. (A) Phenotypes and locations of sampled individuals under four environmental distributions, with transparency scaled to phenotype. (B) As neighborhood size increases, PCA explains less of the total variation in the data. (C) Spatially correlated environmental factors cause false positives at a large proportion of SNPs, which is partially but not entirely corrected by adding the first 10 PC coordinates as covariates. (D) Quantile–quantile plots show inflation of log10(p) after PC correction for simulations with spatially structured environments, with line colors showing the neighborhood size of each simulation. FDR, false discovery rate; GWAS, genome-wide association study; PC, principal component; PCA, PC analysis.
Figure A1
Summary statistics for two-dimensional coalescent stepping-stone models with fixed total Ne and varying numbers of demes per side. The black “infinite” points are from our forward-time continuous space model. Interdeme migration rates are related to σ as described above.
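
The Figure 8 caption describes a linear regression GWAS in which the first 10 PC coordinates are added as covariates. The following is a rough sketch of that general technique, not the authors' pipeline: the function names, the SVD-based PCA, and the normal approximation used for the p-values are all assumptions made for illustration, and every SNP is assumed to be polymorphic.

```python
from math import erfc, sqrt

import numpy as np


def top_pcs(genotypes, n_pcs):
    """PC coordinates of individuals from a centered (individuals x SNPs) matrix."""
    centered = genotypes - genotypes.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_pcs] * s[:n_pcs]


def gwas_with_pcs(genotypes, phenotype, n_pcs=10):
    """Per-SNP linear regression of phenotype on genotype with PC covariates.

    Returns (beta, pvals). P-values use a normal approximation to the
    t statistic (an assumption made here to avoid extra dependencies).
    """
    n = genotypes.shape[0]
    covariates = np.column_stack([np.ones(n), top_pcs(genotypes, n_pcs)])

    # Project the covariates out of both phenotype and genotypes; regressing
    # the residuals gives the same slope as the full multiple regression.
    q, _ = np.linalg.qr(covariates)
    y_res = phenotype - q @ (q.T @ phenotype)
    g_res = genotypes - q @ (q.T @ genotypes)

    sxx = (g_res ** 2).sum(axis=0)
    sxy = g_res.T @ y_res
    beta = sxy / sxx

    # Residual degrees of freedom: n minus covariates minus the SNP itself.
    dof = n - covariates.shape[1] - 1
    rss = (y_res ** 2).sum() - beta * sxy
    se = np.sqrt(rss / dof / sxx)

    z = beta / se
    pvals = np.array([erfc(abs(v) / sqrt(2.0)) for v in z])
    return beta, pvals


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Toy data: 200 individuals, 50 unlinked SNPs, purely random phenotype.
    genotypes = rng.binomial(2, 0.3, size=(200, 50)).astype(float)
    phenotype = rng.normal(size=200)
    beta, pvals = gwas_with_pcs(genotypes, phenotype, n_pcs=10)
    print(beta[:3], pvals[:3])
```

In this toy example the genotypes carry no spatial structure, so the PCs have little to correct. The caption's point is that when environments are spatially autocorrelated and dispersal is limited, a correction of this kind removes only part of the inflation.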
