Am J Hum Genet. 2014 Oct 2;95(4):421-36.
doi: 10.1016/j.ajhg.2014.09.006.

Characteristics of neutral and deleterious protein-coding variation among individuals and populations


Wenqing Fu et al. Am J Hum Genet.

Abstract

Whole-genome and exome data sets continue to be produced at a frenetic pace, resulting in massive catalogs of human genomic variation. However, a clear picture of the characteristics and patterns of neutral and deleterious variation within and between populations has yet to emerge, because recent large-scale sequencing studies have often emphasized different aspects of the data and sometimes appear to reach conflicting conclusions. Here, we comprehensively studied characteristics of protein-coding variation in high-coverage exome sequence data from 6,515 European American (EA) and African American (AA) individuals. We developed an unbiased approach to identify putatively deleterious variants and investigated patterns of neutral and deleterious single-nucleotide variants (SNVs) and alleles between individuals and populations. We show that there are substantial differences in the composition of genotypes between EA and AA populations and that small but statistically significant differences exist in the average number of deleterious alleles carried by EA and AA individuals. Furthermore, we performed extensive simulations to delineate the temporal dynamics of deleterious alleles under a broad range of demographic models and used these data to inform the interpretation of empirical patterns of deleterious variation. Finally, we illustrate that the effects of demographic perturbations, such as bottlenecks and expansions, often manifest in opposing patterns of neutral and deleterious variation, depending on whether the focus is on populations or individuals. Our results clarify seemingly disparate empirical characteristics of protein-coding variation and provide substantial insights into how natural selection and demographic history have shaped neutral and deleterious variation within and between populations.


Figures

Figure 1
Patterns of Protein-Coding SNVs and Alleles among Populations and Individuals. (A) Violin plot of the number of SNVs per individual in EA and AA populations. The number of SNVs can be decomposed according to whether individuals are heterozygous or homozygous for the derived allele. (B) Modified SFS (see text) in EA and AA populations. The mean DAF (dashed line) is nearly identical in EA and AA populations. (C) Violin plot of the number of derived alleles per individual in EA and AA populations. Note that the average number of derived alleles per individual is nearly identical in EA and AA populations.
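The difference between counting SNVs (panel A) and counting derived alleles (panel C) can be made concrete with a minimal sketch. It assumes genotypes are coded as the number of derived alleles (0, 1, or 2) at each site, a coding chosen here for illustration, and the toy genotype matrix is hypothetical. A heterozygous site adds one SNV and one derived allele, whereas a homozygous-derived site adds one SNV but two derived alleles. Within a single sample, the mean number of derived alleles per individual equals twice the sum of DAFs over sites, which ties the per-individual allele count to the number of sites and their mean DAF.

```python
# Genotype coding (assumption): each entry is the number of derived alleles (0, 1, or 2).

def per_individual_counts(genotypes):
    """Return (n_het, n_hom_derived, n_snvs, n_derived_alleles) for one individual."""
    n_het = sum(1 for g in genotypes if g == 1)
    n_hom = sum(1 for g in genotypes if g == 2)
    n_snvs = n_het + n_hom          # sites carrying at least one derived allele
    n_alleles = n_het + 2 * n_hom   # homozygous-derived sites contribute two alleles
    return n_het, n_hom, n_snvs, n_alleles


def derived_allele_frequencies(genotype_matrix):
    """DAF at each site; genotype_matrix is a list of individuals, each a list of genotypes."""
    n_ind = len(genotype_matrix)
    n_sites = len(genotype_matrix[0])
    return [sum(ind[j] for ind in genotype_matrix) / (2 * n_ind) for j in range(n_sites)]


# Hypothetical toy sample: 3 individuals x 4 sites.
sample = [[0, 1, 2, 0],
          [1, 1, 0, 0],
          [2, 0, 0, 1]]

dafs = derived_allele_frequencies(sample)
mean_alleles = sum(per_individual_counts(ind)[3] for ind in sample) / len(sample)
# Identity within one sample: mean derived alleles per individual == 2 * sum(DAF).
assert abs(mean_alleles - 2 * sum(dafs)) < 1e-12
print(per_individual_counts(sample[0]), dafs, mean_alleles)
```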
Figure 2
Identifying and Correcting Reference Bias of PhyloP. (A) Distribution of PhyloP scores calculated from sequence data that we simulated by conditioning on the branch lengths and topology of the 36-way eutherian-mammal phylogeny. PhyloPH (left) and PhyloPNH (right) were calculated on alignments including and excluding the human sequence, respectively. In each plot, the distribution of conservation scores is shown separately for sites at which the human reference sequence carries the ancestral allele and sites at which it carries the derived allele. Note that removing the simulated human sequence before calculating PhyloP mitigates the strong reference-bias effect. (B) Proportion of deleterious variants in the observed data as a function of DAF, for reference-ancestral and reference-derived sites, when conservation scores are calculated on alignments including (PhyloPH, left) or excluding (PhyloPNH, right) the human sequence. Error bars indicate approximate 95% confidence intervals.
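Panel (B) summarizes the proportion of variants classified as deleterious within DAF bins, with approximate 95% confidence intervals. The sketch below shows one way to compute such a summary; it does not compute PhyloP itself. It assumes each variant is given as a (conservation score, DAF) pair and that a variant is called deleterious when its score meets a threshold. The threshold value, the bin edges, and the use of the Wilson score interval are illustrative assumptions, since the caption does not state them.

```python
import math


def wilson_interval(k, n, z=1.96):
    """Approximate 95% Wilson score interval for the binomial proportion k / n."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = k / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half, centre + half


def proportion_deleterious_by_daf(variants, bin_edges, threshold):
    """variants: iterable of (conservation_score, daf) pairs.

    A variant counts as deleterious when its score >= threshold (hypothetical cutoff).
    Bins are [lo, hi); the last bin also includes its upper edge so that DAF = 1 is kept.
    Returns (lo, hi, proportion, (ci_low, ci_high)) for every non-empty bin.
    """
    variants = list(variants)
    results = []
    n_bins = len(bin_edges) - 1
    for i in range(n_bins):
        lo, hi = bin_edges[i], bin_edges[i + 1]
        last = i == n_bins - 1
        scores = [s for s, daf in variants if lo <= daf < hi or (last and daf == hi)]
        if not scores:
            continue
        k = sum(1 for s in scores if s >= threshold)
        results.append((lo, hi, k / len(scores), wilson_interval(k, len(scores))))
    return results
```

Running the same function once on scores computed with the human sequence and once on scores computed without it would give the kind of side-by-side comparison the caption describes for PhyloPH and PhyloPNH.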
Figure 3
Empirical Patterns of Deleterious Protein-Coding Variants Carried by Individuals. (A) The average number of deleterious SNVs, heterozygous genotypes, homozygous genotypes, and derived alleles per individual. The average number of deleterious alleles per individual differs slightly but significantly between EA and AA individuals. (B) The average number of deleterious alleles per individual in EA and AA samples as a function of population DAF. The inset bar plots compare the odds that a derived allele is deleterious with the odds that it is neutral, per individual, for variants with DAF ≤ 0.05% (left) and DAF ≥ 99.9% (right) in EA and AA individuals. Error bars denote the 95% confidence interval of the odds ratio. Note that, on average, EA individuals carry significantly more nearly fixed or fixed deleterious alleles than do AA individuals (p = 8.63 × 10⁻¹⁶).
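An odds ratio with a 95% confidence interval, as in the insets of panel (B), can be illustrated with a standard 2 × 2 calculation. This sketch assumes the table is arranged as deleterious versus neutral derived-allele counts in two groups (for example, EA and AA individuals) and uses Woolf's log-scale interval; both the table arrangement and the interval method are assumptions, and the counts in the example are hypothetical.

```python
import math


def odds_ratio_woolf(a, b, c, d, z=1.96):
    """Odds ratio (a/b) / (c/d) with a Woolf (log-scale) confidence interval.

    a, b: deleterious and neutral derived-allele counts in group 1
    c, d: deleterious and neutral derived-allele counts in group 2
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cells must be positive (consider a continuity correction)")
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    lo = math.exp(math.log(odds_ratio) - z * se_log)
    hi = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, (lo, hi)


# Hypothetical counts: OR = (20/80) / (10/90) = 2.25
print(odds_ratio_woolf(20, 80, 10, 90))
```

Because the interval is built on the log scale, it is asymmetric around the odds ratio itself but symmetric around log(OR).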
Figure 4
Temporal Decomposition of Neutral and Deleterious Variation among Present-Day Individuals in Bottleneck Models. (A) Diagram of the simulated bottleneck model, in which the population size dropped to 0.1× or 0.01× of its ancestral size 50 ka ago and then recovered to the prebottleneck size of 10,000 individuals 25 ka ago. Bar plots on the far right show the average number of derived alleles per individual per kilobase in present-day individuals as a function of DAF (fixed DAF = 1, common DAF ≥ 0.05, low-frequency 0.01 ≤ DAF < 0.05, and rare DAF < 0.01) and selection coefficient. We decomposed variant density in present-day individuals according to when the mutation arose (before, during, or after the bottleneck), as shown in the bar plots under the demographic model. Thus, the variant density of present-day individuals for each DAF category (rows) is obtained simply by summing variant density across the three time epochs. (B) The proportion of derived deleterious alleles (|s| ≥ 10⁻⁴) per individual in present-day individuals as a function of when the mutation arose.
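The bottleneck model and the decomposition in panel (A) can be sketched as a piecewise population-size schedule plus a tally of derived alleles by DAF category and by the epoch in which each mutation arose. Times are kept in years to avoid assuming a generation time. The handling of the exact 50 ka and 25 ka boundaries, the function names, and the input format (a list of mutation-age and DAF pairs for one individual) are illustrative assumptions. Summing a DAF category's counts across epochs gives that category's total, which is the additivity described in the caption.

```python
from collections import defaultdict

N_ANCESTRAL = 10_000          # prebottleneck (and recovered) size, from the caption
BOTTLENECK_START_YA = 50_000  # bottleneck begins 50 ka ago
BOTTLENECK_END_YA = 25_000    # recovery 25 ka ago


def population_size(years_ago, factor):
    """Population size at a time in the past; factor is 0.1 or 0.01 in the caption."""
    if BOTTLENECK_END_YA < years_ago <= BOTTLENECK_START_YA:
        return round(N_ANCESTRAL * factor)
    return N_ANCESTRAL


def daf_category(daf):
    """DAF classes used in Figures 4 to 6."""
    if daf == 1.0:
        return "fixed"
    if daf >= 0.05:
        return "common"
    if daf >= 0.01:
        return "low-frequency"
    return "rare"


def epoch_of(age_years):
    """Epoch in which a mutation of the given age (in years) arose."""
    if age_years > BOTTLENECK_START_YA:
        return "before"
    if age_years > BOTTLENECK_END_YA:
        return "during"
    return "after"


def decompose(alleles):
    """Count derived alleles by (DAF category, epoch); alleles = [(age_years, daf), ...]."""
    table = defaultdict(lambda: defaultdict(int))
    for age, daf in alleles:
        table[daf_category(daf)][epoch_of(age)] += 1
    return {cat: dict(epochs) for cat, epochs in table.items()}


# Hypothetical derived alleles carried by one individual.
counts = decompose([(60_000, 1.0), (30_000, 1.0), (40_000, 0.2), (1_000, 0.001)])
totals = {cat: sum(epochs.values()) for cat, epochs in counts.items()}
print(counts, totals)
```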
Figure 5
Temporal Decomposition of Neutral and Deleterious Variation among Present-Day Individuals in Recent-Growth Models. (A) Diagram of simulated models of recent accelerated growth, in which populations of constant size 10,000 began growing at a rate of 2% or 3% per generation 5 ka ago. As in Figure 4, the far right column shows the mean number of derived alleles per individual per kilobase in present-day individuals as a function of DAF (fixed DAF = 1, common DAF ≥ 0.05, low-frequency 0.01 ≤ DAF < 0.05, and rare DAF < 0.01) and selection coefficient. Similarly, we decomposed variant density in present-day individuals according to when the mutation arose (before or during growth). (B) The proportion of deleterious alleles (|s| ≥ 10⁻⁴) per individual in present-day individuals as a function of when the mutation arose.
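The recent-growth models can be illustrated by compounding the per-generation growth rate from the initial size of 10,000. The caption gives the onset in years (5 ka) but not a generation time, so the 25-year generation time used here to convert 5 ka into 200 generations is an assumption, as is treating growth as discrete compounding rather than continuous exponential growth. The helper for panel (B) simply applies the |s| ≥ 10⁻⁴ cutoff from the caption to a list of selection coefficients.

```python
GENERATION_TIME_YEARS = 25  # assumption; not stated in the caption
N0 = 10_000                 # constant size before growth (from the caption)
GROWTH_ONSET_YEARS = 5_000  # growth starts 5 ka ago (from the caption)


def size_after_growth(n0, rate, generations):
    """Population size after `generations` of discrete growth at `rate` per generation."""
    return n0 * (1 + rate) ** generations


def proportion_deleterious(selection_coeffs, cutoff=1e-4):
    """Fraction of derived alleles whose |s| meets the deleterious cutoff."""
    coeffs = list(selection_coeffs)
    if not coeffs:
        return 0.0
    return sum(1 for s in coeffs if abs(s) >= cutoff) / len(coeffs)


generations = GROWTH_ONSET_YEARS // GENERATION_TIME_YEARS  # 200 under the assumed generation time
for rate in (0.02, 0.03):
    print(f"rate={rate:.0%}: present-day size ~ {size_after_growth(N0, rate, generations):,.0f}")
```

The present-day size is very sensitive to both the growth rate and the assumed generation time, since either enters the exponent.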
Figure 6
Temporal Decomposition of Neutral and Deleterious Variation among Present-Day Individuals in a More Realistic Demographic Model. (A) Diagram of a more realistic demographic model for EA and AA populations. This model involves multiple bottlenecks in the EA lineage, recent accelerated growth, and admixture as inferred by Tennessen et al. As in Figure 4, the far right column shows the mean number of derived alleles per individual per kilobase in present-day individuals as a function of DAF (fixed DAF = 1, common DAF ≥ 0.05, low-frequency 0.01 ≤ DAF < 0.05, and rare DAF < 0.01) and selection coefficient. Similarly, we decomposed variant density in present-day individuals according to when the mutation arose (before the Out-of-Africa bottleneck, during the bottleneck, during the initial period of growth in EA individuals, or during the recent accelerated growth in both EA and AA individuals). (B) The proportion of deleterious alleles (|s| ≥ 10⁻⁴) per individual in present-day individuals as a function of when the mutation arose.
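The admixture step in the Tennessen et al. model can be illustrated with the standard single-pulse mixing rule: immediately after admixture, the expected derived-allele frequency in the admixed population is the ancestry-weighted average of the source-population frequencies. The ancestry proportion and frequencies in the example are hypothetical and are not the parameters inferred by Tennessen et al.; drift and selection after the pulse are ignored.

```python
def admixed_frequency(p_source1, p_source2, m):
    """Expected DAF right after a single admixture pulse.

    m is the fraction of ancestry contributed by source 1 (hypothetical value below);
    the remaining 1 - m comes from source 2. Drift and selection afterwards are ignored.
    """
    if not 0.0 <= m <= 1.0:
        raise ValueError("m must lie in [0, 1]")
    for p in (p_source1, p_source2):
        if not 0.0 <= p <= 1.0:
            raise ValueError("frequencies must lie in [0, 1]")
    return m * p_source1 + (1 - m) * p_source2


# Hypothetical: an allele at DAF 0.30 in one source and 0.05 in the other, with m = 0.2.
print(admixed_frequency(0.30, 0.05, 0.2))
```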
Figure 7
Opposing Patterns of Deleterious SNVs and Alleles in Individuals and Populations. (A) The number and proportion of deleterious variants per kilobase in populations as a function of mutation age and DAF. (B) The mean number and proportion of derived deleterious alleles per individual per kilobase as a function of mutation age and DAF. (C) The mutation load as a function of mutation age and DAF. The Tennessen et al. model is the demographic model shown in Figure 6. Variants are colored according to whether they are fixed or segregating (“seg”) and when they arose (“age”).
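Panel (C) reports mutation load. One common way to compute a per-individual load, sketched below, multiplies fitness effects across sites and gives each site a dominance coefficient h: a heterozygote has relative fitness 1 − hs, a derived homozygote 1 − s, and the load is one minus the product. The multiplicative model, h = 0.5, and the input format are assumptions; the caption does not specify how load was defined in this study, and the example genotypes are hypothetical.

```python
import math


def mutation_load(sites, h=0.5):
    """Load = 1 - product of per-site relative fitnesses (multiplicative across sites).

    sites: iterable of (s, n_derived), with 0 <= s < 1 the fitness reduction of the
    derived homozygote and n_derived in {0, 1, 2}. h is an assumed dominance coefficient.
    """
    log_w = 0.0
    for s, n_derived in sites:
        if n_derived == 1:
            log_w += math.log1p(-h * s)   # heterozygote: fitness 1 - h*s
        elif n_derived == 2:
            log_w += math.log1p(-s)       # derived homozygote: fitness 1 - s
    return 1.0 - math.exp(log_w)


# Hypothetical individual: two heterozygous sites and one homozygous-derived site.
print(mutation_load([(0.01, 1), (0.001, 1), (1e-4, 2)]))
```

Accumulating log fitness rather than the raw product avoids floating-point underflow when an individual carries many thousands of weakly deleterious alleles.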


