eLife. 2021 Aug 19;10:e67509.
doi: 10.7554/eLife.67509.

Quantifying the relationship between genetic diversity and population size suggests natural selection cannot explain Lewontin's Paradox

Vince Buffalo. eLife.

Abstract

Neutral theory predicts that genetic diversity increases with population size, yet observed levels of diversity across metazoans vary over only two orders of magnitude while population sizes vary over several. This unexpectedly narrow range of diversity is known as Lewontin's Paradox of Variation (1974). While some have suggested selection constrains diversity, tests of this hypothesis seem to fall short. Here, I revisit Lewontin's Paradox to assess whether current models of linked selection are capable of reducing diversity to this extent. To quantify the discrepancy between pairwise diversity and census population sizes across species, I combine previously published estimates of pairwise diversity from 172 metazoan taxa with newly derived estimates of census sizes. Using phylogenetic comparative methods, I show this relationship is significant after accounting for phylogeny, but with high phylogenetic signal and evidence that some lineages experienced shifts in the evolutionary rate of diversity deep in the past. Additionally, I find a negative relationship between recombination map length and census size, suggesting abundant species have less recombination and experience greater reductions in diversity due to linked selection. However, I show that even assuming strong and abundant selection, models of linked selection are unlikely to explain the observed relationship between diversity and census sizes across species.

Keywords: Lewontin's Paradox; evolutionary biology; linked selection; phylogenetic comparative methods.

Conflict of interest statement

VB: No competing interests declared.

Figures

Figure 1. The distribution of approximate census population sizes estimated by this study.
Some phyla containing few species were excluded for clarity.
Figure 1—figure supplement 1. The relationship between body mass and population density found by Damuth, 1987, which is used to predict population densities.
The data come from the appendix table of Damuth, 1987; the color indicates Damuth’s original group labels. The dashed line was estimated using a lognormal regression model in Stan. References to each measurement are available in Damuth, 1987.
Figure 1—figure supplement 2. The fraction of the total species on earth in each class that is included in this study’s sample.
The color of the points represents phylum, and the size of the point represents the absolute number of species by class.
Figure 1—figure supplement 3. Comparison of this paper’s range estimation procedure against the IUCN Red List’s range estimates.
The correspondence between the ranges estimated with the alpha hull method applied to GBIF data used in this paper and IUCN Red List’s Extent of Occurrence for the subset of species in both datasets. Note that the IUCN Red List contains predominantly endangered species, which leads to ascertainment bias; still, the high correlation between the estimated ranges shows the alpha hull method works well.
Figure 1—figure supplement 4. Validation of this paper’s range estimates against the categorical labels of Leffler et al., 2012.
The estimated ranges using GBIF occurrence data, ordered within and colored by the original range category labels assigned in Leffler et al., 2012.
Figure 1—figure supplement 5. The relationship between body length (meters) and body mass (grams) in the Romiguier et al., 2014 data set.
This is used to infer body masses for taxa. The gray dashed line is the line of best fit inferred using Stan.
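The two allometric fits described in these captions (body mass to population density, and body length to body mass) are both straight lines on log-log axes. The sketch below is only an illustration of that idea: it swaps in ordinary least squares on log10-transformed values rather than the lognormal regression in Stan mentioned in the captions, and the function names `fit_log_log` and `predict_from_fit` are hypothetical. The values in the self-check are generated from an exact power law and are not data from the paper.

```python
import math


def fit_log_log(xs, ys):
    """Fit log10(y) = intercept + slope * log10(x) by ordinary least squares.

    Assumption: plain OLS stands in for the Stan lognormal regression
    named in the captions; this is not the author's fitting code.
    """
    log_x = [math.log10(x) for x in xs]
    log_y = [math.log10(y) for y in ys]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    sxx = sum((a - mean_x) ** 2 for a in log_x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(log_x, log_y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict_from_fit(intercept, slope, x):
    """Back-transform a log-log prediction to the original scale."""
    return 10 ** (intercept + slope * math.log10(x))


# Self-check on an exact power law y = 2 * x**3 (synthetic, not real data).
lengths = [0.01, 0.1, 1.0, 10.0]
masses = [2.0 * length ** 3 for length in lengths]
intercept, slope = fit_log_log(lengths, masses)
print(f"intercept={intercept:.4f}, slope={slope:.4f}")
```

A fit of this kind can then be applied to a taxon with a known body length (or mass) to obtain a predicted mass (or density), which is how the captions describe these relationships being used.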
Figure 2. A visualization of Lewontin’s Paradox of Variation.
Pairwise diversity (data from Leffler et al., 2012, Corbett-Detig et al., 2015, and Romiguier et al., 2014), which varies over three orders of magnitude, shows a weak relationship with approximate population size, which varies over 12 orders of magnitude. The shaded curve shows the range of expected neutral diversity if $N_e$ were to equal $N_c$ under the four-alleles model, $\log_{10}(\pi) = \log_{10}(\theta) - \log_{10}(1 + 4\theta/3)$ where $\theta = 4 N_c \mu$, for two mutation rates, $\mu = 10^{-8}$ and $\mu = 10^{-9}$, and the light gray dashed line represents the maximum pairwise diversity under the four-alleles model. The dark gray dashed line is the OLS regression fit, and the blue dashed line is the regression fit using a phylogenetic mixed-effects model. Points are colored by phylum. The species Equus ferus przewalskii ($N_c \approx 10^{3}$ and $\pi = 3.6 \times 10^{-3}$) was an outlier and excluded from this figure for visual clarity.
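The neutral expectation drawn as the shaded curve can be evaluated directly from the four-alleles formula in the caption, which is equivalent to π = θ / (1 + 4θ/3) with θ = 4·Nc·μ. The following is a minimal sketch under the caption's assumption that Ne equals Nc; the function name and the grid of census sizes are illustrative choices, not taken from the paper.

```python
def expected_neutral_diversity(census_size, mutation_rate):
    """Expected pairwise diversity under the four-alleles model.

    Assumes Ne = Nc, so theta = 4 * Nc * mu and
    pi = theta / (1 + 4 * theta / 3).
    """
    theta = 4.0 * census_size * mutation_rate
    return theta / (1.0 + 4.0 * theta / 3.0)


# As theta grows, pi approaches 3/4, the maximum under four alleles.
MAX_DIVERSITY = 3.0 / 4.0

# Illustrative grid of census sizes (an assumption, not the paper's data).
for exponent in (3, 6, 9, 12, 15):
    n_c = 10.0 ** exponent
    low = expected_neutral_diversity(n_c, 1e-9)
    high = expected_neutral_diversity(n_c, 1e-8)
    print(f"Nc = 1e{exponent}: pi between {low:.3g} and {high:.3g}")
```

For small θ the expectation is close to θ itself, so it rises nearly linearly with Nc before saturating at 3/4, which is why the predicted range spans many more orders of magnitude than the observed diversities.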
Figure 2—figure supplement 1. A linear-log version of Figure 2.
Points are colored by phylum, and the shaded region is the predicted neutral level of diversity assuming $N_e = N_c$ with the mutation rate ranging over $10^{-9} \le \mu \le 10^{-8}$.
Figure 2—figure supplement 2. A version of Figure 2 with OLS estimates per phylum.
Diversity and approximate population size for 172 taxa, colored by phylum; the dashed lines indicate the non-phylogenetic OLS estimates of the relationship between population size and diversity grouped by phyla.
Figure 2—figure supplement 3. The posterior distributions and fitted relationship between diversity and both body mass and range size.
The relationship between diversity (differences per basepair) and body mass (left) and range (right) across 172 species. The top row shows the posterior distributions of the intercept, slope, and phylogenetic signal estimated using the phylogenetic mixed-effects model fit to the 166 taxa in the synthetic phylogeny. The bottom row shows each species as a point, colored by phylum. The gray dashed line is the non-phylogenetic standard regression estimate, and the blue dashed line is the relationship fit by the phylogenetic mixed-effects model.
Figure 2—figure supplement 4. Pairwise diversity grouped by the range categories from Leffler et al., 2012, with point size indicating the predicted population density.
The vertical lines are the range category group means.
Figure 3. Phylogenetic comparative models of diversity and population size.
(A) The ancestral continuous trait estimates for the population size and diversity (differences per bp, log scaled) across the phylogeny of 166 taxa. The phyla of the tips are indicated by the color bar in the center. (B) The posterior distributions of the intercept, slope, and phylogenetic signal (λ, de Villemereuil and Nakagawa, 2014) of the phylogenetic mixed-effects model of diversity and population size (log scaled). Also shown are the 90% credible interval (light blue shading), posterior mean (blue line), OLS estimate (gray solid line), and bootstrap OLS confidence intervals (light gray shading). (C) The node-height tests of diversity, population size, and the two components of the population size estimates, body mass, and range (all traits on log scale before contrast was calculated). Each point shows the standardized phylogenetic independent contrast and branching time for a pair of lineages. Red lines are robust regression estimates (and are only shown for statistically significant relationships at the α=0.05 level). Note that some outlier pairs with very high phylogenetic independent contrasts were excluded (in all cases, these outliers were in the genus Drosophila).
Figure 3—figure supplement 1. The posterior distributions for the parameters of the phylogenetic mixed-effects model of diversity and population size (this is analogous to Figure 3B) fit separately on chordates (n=68), molluscs (n=13), and arthropods (n=68).
The phylogenetic mixed-effects model for chordates indicated the best-fitting model had no residual variance ($\sigma_r^2 = 0$), so an alternate model without this variance component was used to ensure proper convergence; this model is shown in green. The light blue (green) shaded regions are the 90% credible intervals, the blue (green) lines the posterior averages, the gray shaded regions the OLS bootstrap 95% confidence intervals, and the gray lines the OLS estimate. Note that unlike Figure 3, the OLS estimate uses all taxa, not just those present in the phylogeny, since splitting the data by phyla reduces sample sizes (OLS with just the subset of taxa in the phylogeny is not significant for either chordates or arthropods). The vertical dashed gray line indicates zero.
Figure 3—figure supplement 2. The ancestral continuous trait estimates for diversity and population size with species labels.
Figure 3—figure supplement 3. The ancestral continuous trait estimates for recombination map length and diversity and population size with species labels.
Figure 4. Predicting the impact of linked selection on diversity.
(A) The observed relationship between recombination map length ($L$) and census size ($N_c$) across 136 species with complete data and known phylogeny. Triangle points indicate six social taxa excluded from the model fitting since these have adaptively higher recombination map lengths (Wilfert et al., 2007). The dark gray line is the estimated relationship under a phylogenetic mixed-effects model, and the gray interval is the 95% posterior average. (B) Points indicate the observed $\pi$–$N_c$ relationship across taxa shown in Figure 2, and the blue ribbon is the range of predicted diversity were $N_e = N_c$ for $\mu = 10^{-8}$–$10^{-9}$, and after accounting for the expected reduction in diversity due to background selection and recurrent hitchhiking under Drosophila melanogaster parameters. In both plots, point color indicates phylum.
Figure 4—figure supplement 1. The relationship between genome size and approximate census population size.
The dashed gray line indicates the OLS fit. Tiger salamander (Ambystoma tigrinum) was excluded because of its exceptionally large genome size (≈30 Gbp).
Figure 4—figure supplement 2. The relationship between genome size and recombination map length.
The dashed gray line indicates the OLS fit for all taxa, and the dashed colored lines indicate the linear relationship fit by phyla. Tiger salamander (Ambystoma tigrinum) was excluded because of its exceptionally large genome size (≈30 Gbp).
Figure 4—figure supplement 3. The observed $\pi$–$N_c$ relationship (points) across species compared to the predicted diversity (ribbons) under different modes of linked selection and parameters, for a range of mutation rates, $10^{-9}$–$10^{-8}$.
In both subplots, the gray ribbon is the expected diversity if $N_e = N_c$. In (A), the predicted impacts on diversity for four modes of linked selection are depicted: background selection (purple) and hitchhiking (yellow) individually under the Drosophila melanogaster parameters as in Figure 4B, strong background selection (red), where $U_{\text{strongBGS}} = 10\,U_{\text{Dmel}} \approx 16$, and strong recurrent hitchhiking, where $\gamma_{\text{strongHH}} = 10\,\gamma_{\text{Dmel}} \approx 0.23$. (B) The predicted diversity under the combined effects of strong background selection and strong hitchhiking (orange) compared to the original predicted diversity as in Figure 4B (blue). Overall, under strong background selection and hitchhiking parameters, predicted diversity would be less than observed for high-$N_c$ species, indicating the poor fit to observed data is not sensitive to the choice of Drosophila melanogaster parameters.
Figure 4—figure supplement 4. The relationship between $N_c$ and diversity in the Corbett-Detig et al., 2015 data, and the relationship between estimated reduction in diversity and census size, for three different approaches.
(A) The diversity data from Corbett-Detig et al., 2015 and the census population size estimated here for metazoan taxa. (B) The reductions in diversity, $R = N_e/N$, plotted against census size across species. The red points are the reductions estimated by Corbett-Detig et al., 2015. This confirms the finding of Corbett-Detig et al., 2015 that the impact of selection ($I = 1 - R$) increases with census population size (though, in the original paper, body size and range were used as separate proxy variables for census population size). The green and yellow points are the predicted reductions in diversity under the recurrent hitchhiking (RHH) and background selection (BGS) model using the Drosophila melanogaster parameters as described in the main text. The reduction in diversity due to sweeps, from Equation 1, is determined by the term $2NS$. Green points treat $N$ as the implied effective population size from diversity, $\tilde{N}_e = \hat{\pi}/(4\mu)$, assuming $\mu = 10^{-9}$. Yellow points treat $N$ as the census size, $N = N_c$. Overall, using the census size, e.g. $2N_cS$, leads to reductions in diversity that far exceed the empirical estimates of Corbett-Detig et al. and reasonable model-based predictions from $\tilde{N}_e$.
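The quantities named in this caption are simple ratios, and a short sketch may make them concrete: the implied effective population size is the diversity estimate divided by 4μ, the reduction R is Ne/N, and the impact of selection is I = 1 − R. The function names below are hypothetical, the caller chooses which N to use as the denominator, and the input values are made up for illustration; they are not estimates from the paper.

```python
def implied_effective_size(pi_hat, mutation_rate=1e-9):
    """Implied effective size from pairwise diversity: pi_hat / (4 * mu).

    The default mu = 1e-9 follows the assumption stated in the caption.
    """
    return pi_hat / (4.0 * mutation_rate)


def reduction_in_diversity(effective_size, reference_size):
    """R = Ne / N, where N is whatever reference size the caller supplies."""
    return effective_size / reference_size


def impact_of_selection(reduction):
    """I = 1 - R."""
    return 1.0 - reduction


# Hypothetical inputs chosen only to show the arithmetic.
pi_hat = 0.01          # pairwise diversity, differences per basepair
reference_size = 1e12  # an assumed census size

n_e = implied_effective_size(pi_hat)
r = reduction_in_diversity(n_e, reference_size)
print(f"implied Ne = {n_e:.3g}, R = {r:.3g}, I = {impact_of_selection(r):.6f}")
```

With a large reference size the ratio R becomes very small, which illustrates why substituting the census size into a term such as 2NS changes the predicted reduction so sharply compared with using the implied effective size.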
Figure 4—figure supplement 5. Comparison of the Drosophila sweep parameters used in this study with parameters from other studies.
(A) The estimate of the number of sweeps per basepair, per genome ($\nu_{BP}$) from Table 2 of Elyashiv et al., 2016 (the studies included are Li and Stephan, 2006; Andolfatto, 2007; Macpherson et al., 2007 and Jensen et al., 2008); the red point is my estimate used in this paper. (B) Points are the data from Shapiro et al., 2007. The blue line is the non-linear least squares fit to the data, and the green dashed line is the sweep model parameterized by the genome-wide average sweep coalescence rate $2NS \approx 0.92$ from the classic sweep and background selection model of Elyashiv et al., 2016 (rs in Supplementary Table S6).
Appendix 4—figure 1. A version of Figure 2 with points colored by their IUCN Red List conservation status.
Margin boxplots show the diversity and population size ranges (thin lines) and interquartile ranges (thick lines) for each category. NA/DD indicates no IUCN Red List entry, or Red List status Data Deficient; LC is Least Concern, NT is Near Threatened, VU is Vulnerable, EN is Endangered, and CR is Critically Endangered.

References

    1. Aguade M, Miyashita N, Langley CH. Reduced variation in the yellow-achaete-scute region in natural populations of Drosophila melanogaster. Genetics. 1989;122:607–615. doi: 10.1093/genetics/122.3.607.
    2. Andolfatto P. Hitchhiking effects of recurrent beneficial amino acid substitutions in the Drosophila melanogaster genome. Genome Research. 2007;17:1755–1762. doi: 10.1101/gr.6691007.
    3. Bar-On YM, Phillips R, Milo R. The biomass distribution on earth. PNAS. 2018;115:6506–6511. doi: 10.1073/pnas.1711842115.
    4. Barry P, Broquet T, Gagnaire P-A. Life tables shape genetic diversity in marine fishes. bioRxiv. 2020. doi: 10.1101/2020.12.18.423459.
    5. Barton NH. Linkage and the limits to natural selection. Genetics. 1995;140:821–841. doi: 10.1093/genetics/140.2.821.