Review
Nat Rev Genet. 2015 Jun;16(6):333-43.
doi: 10.1038/nrg3931. Epub 2015 May 12.

Estimating the mutation load in human genomes

Brenna M Henn et al. Nat Rev Genet. 2015 Jun.

Abstract

Next-generation sequencing technology has facilitated the discovery of millions of genetic variants in human genomes. A sizeable fraction of these variants are predicted to be deleterious. Here, we review the pattern of deleterious alleles as ascertained in genome sequencing data sets and ask whether human populations differ in their predicted burden of deleterious alleles, a phenomenon known as mutation load. We discuss three demographic models that are predicted to affect mutation load and relate these models to the evidence (or the lack thereof) for variation in the efficacy of purifying selection in diverse human genomes. We also emphasize why accurate estimation of mutation load depends on assumptions regarding the distribution of dominance and selection coefficients, quantities that remain poorly characterized for current genomic data sets.

Figures

Figure 1. Proportion of deleterious variants found in an individual’s genome classified by their frequency in the population (common vs. rare)
We asked whether the deleterious variants in an individual’s genome are mostly rare or mostly common. For the 1000 Genomes Yoruba (YRI) population, variants were assigned to three selection regimes (moderate, large, extreme) according to GERP score categories in increasing order of phylogenetic conservation (2:4, 4:6, >6). The more conserved a site is, the more likely a new allele is to be deleterious (Box 2). Deleterious variants with a derived allele frequency lower than 5% within the population (purple) are classified as “rare”, and the rest as “common” (blue). Almost 70% of the deleterious variants found in an individual genome are common, and most of them have a small predicted effect (“moderate”). Half of the rare variants also have a moderate effect and half have a large effect, showing that low-frequency, large-effect variants have not yet been purged by purifying selection.
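The binning described in this caption can be sketched in a few lines of Python. This is only an illustration, not the authors' pipeline: the function names, the toy variants, and the handling of scores that fall exactly on a category boundary (2, 4 or 6) are assumptions.

```python
def gerp_category(gerp):
    """Map a GERP score to a selection regime from the caption.

    Assumed boundaries: [2, 4] moderate, (4, 6] large, > 6 extreme.
    Scores below 2 are not treated as deleterious and return None.
    """
    if gerp > 6:
        return "extreme"
    if gerp > 4:
        return "large"
    if gerp >= 2:
        return "moderate"
    return None


def frequency_class(derived_allele_frequency, threshold=0.05):
    """Label a variant 'rare' below the 5% threshold, otherwise 'common'."""
    return "rare" if derived_allele_frequency < threshold else "common"


def tabulate(variants):
    """Count variants per (frequency class, selection regime) pair."""
    counts = {}
    for gerp, daf in variants:
        category = gerp_category(gerp)
        if category is None:
            continue  # skip variants that are not predicted deleterious
        key = (frequency_class(daf), category)
        counts[key] = counts.get(key, 0) + 1
    return counts


# Hypothetical (GERP score, derived allele frequency) pairs.
toy_variants = [(2.5, 0.30), (3.1, 0.01), (4.8, 0.02),
                (6.4, 0.001), (1.2, 0.50), (5.0, 0.12)]
counts = tabulate(toy_variants)
```

Dividing each count by the total gives proportions of the kind plotted in the figure.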
Figure 2. Differences in the site frequency spectrum across populations for neutral and deleterious variants
The site frequency spectrum (SFS) can be a powerful method of summarizing genomic data. We show the SFS for four populations focusing on both low frequency variants (<15%, left panel) and nearly fixed variants (>90%, right panel). Using 1000 Genomes Phase 1 exome data, we sampled 42 individuals from the Yoruba (YRI, Nigeria), Mexicans (MXL, Mexico), Tuscans (TSI, Italy) and Japanese (JPT, Japan) populations. [Only individuals on the same Agilent exome platform were compared here to avoid biases in target capture between platforms.] Derived variants were annotated with GERP (Box S1) and we plot variants predicted to have a “large” deleterious effect (GERP>4) (top panels) and “neutral” effect (GERP<2) (bottom panels). Demography generates different SFS for each population. Neutral variants provide a null demographic model. The African Yoruba have the greatest number of rare deleterious variants, though the Japanese and Tuscans have many more deleterious fixed variants, likely due to ancient founder effects resulting in the fixation by strong drift (also noted in). By comparing the difference between the neutral and deleterious SFS (Figure S1), one can infer the impact of purifying selection. For example, non-African populations have a larger proportion of deleterious variants that are fixed, compared to what is seen neutrally.
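As a rough illustration of how an SFS is built and how the two tails shown in the panels (<15% and >90%) can be extracted, consider the sketch below. The input format (one derived allele count per site) and the toy numbers are assumptions; a sample of 42 diploid individuals would correspond to 84 chromosomes.

```python
from collections import Counter


def site_frequency_spectrum(derived_counts, n_chromosomes):
    """Return a list where entry k is the number of sites whose derived
    allele is observed k times among n_chromosomes sampled chromosomes."""
    tally = Counter(derived_counts)
    return [tally.get(k, 0) for k in range(n_chromosomes + 1)]


def split_tails(sfs, low=0.15, high=0.90):
    """Count sites in the low-frequency tail (0 < f < low) and in the
    nearly fixed tail (f > high), where f is the derived allele frequency."""
    n = len(sfs) - 1
    low_frequency = sum(c for k, c in enumerate(sfs) if 0 < k / n < low)
    nearly_fixed = sum(c for k, c in enumerate(sfs) if k / n > high)
    return low_frequency, nearly_fixed


# Hypothetical derived allele counts at nine sites in 10 chromosomes.
toy_counts = [1, 1, 2, 1, 5, 10, 10, 3, 1]
sfs = site_frequency_spectrum(toy_counts, n_chromosomes=10)
low_frequency, nearly_fixed = split_tails(sfs)
```

Computing the spectrum separately for sites labelled neutral and deleterious, and comparing the two, is the contrast the caption describes.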
Figure 3. Demographic history based on the site frequency spectrum and sharing of rare alleles
a) Updated three-population demographic model based on synonymous sites from 1000 Genomes Phase 1 data, assuming a mutation rate of 2.36 × 10⁻⁸ per base pair per generation and a generation time of 25 years (for ease of comparison with Gravel et al. and Tennessen et al.). Estimated times and population sizes are inversely proportional to the assumed mutation rate.
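Because the estimates scale inversely with the assumed mutation rate, they can be converted to a different rate by simple rescaling. The sketch below shows the arithmetic; the population size, split time and alternative mutation rate are hypothetical values chosen for illustration, not estimates from the paper.

```python
REFERENCE_MU = 2.36e-8   # mutation rate assumed in the caption (per bp per generation)
GENERATION_YEARS = 25    # generation time assumed in the caption


def rescale(value, new_mu, reference_mu=REFERENCE_MU):
    """Rescale a time (in generations) or an effective population size that
    was estimated under reference_mu to a different assumed mutation rate."""
    return value * reference_mu / new_mu


# Hypothetical estimates obtained under the reference mutation rate.
ne_at_reference = 10_000       # effective population size
split_at_reference = 2_000     # split time in generations

# Hypothetical alternative mutation rate, half the reference value.
new_mu = 1.18e-8

ne_rescaled = rescale(ne_at_reference, new_mu)
split_years = rescale(split_at_reference, new_mu) * GENERATION_YEARS
```

Halving the mutation rate doubles both the population size and the time estimate in this example.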
Figure 4. Schematic of different demographic models for the Out-of-Africa dispersal
Three demographic models have been discussed in the context of changes in genetic load due to extreme genetic drift across different human populations. All three models allow for a severe Out-of-Africa bottleneck and recovery but with varying degrees of subsequent population size changes. Colored dots indicate allelic diversity; the width of the column is proportional to the effective population size, Ne. The bottom tube represents the ancestral African population size, with later events occurring in temporal sequence towards the top of the figure.
Figure 5. Mutational load under an additive and a recessive model
Using the same dataset as in Figure 2, we computed the total mutation load for each population. GERP scores were annotated in whole-exome data. Variants were grouped into three categories according to their GERP score (2:4, 4:6, >6), corresponding to different biological functional effects. The more phylogenetically conserved a site is, the more likely a new allele is to be deleterious and have a high GERP score (Box S1). Within each category, three selection coefficients were assigned: (s = −4.5 × 10⁻⁴), (s = −4.5 × 10⁻³) and (s = −1 × 10⁻²), using the inferred s coefficients in Boyko et al. Total mutational load is the sum of load for each locus. Mutational load under an additive model is higher than mutational load under a recessive model because, under a recessive model, the phenotypic effect of a variant is masked in heterozygotes. While only slight differences exist between populations for an additive model of dominance (~1.5%), strong differences occur under a recessive model because of the differential number of derived homozygotes among populations.
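The contrast between the two dominance models can be illustrated with a small calculation. The caption does not give the load formula, so this sketch assumes the standard population-genetic form under Hardy-Weinberg proportions, with genotype fitnesses 1, 1 − hs and 1 − s, which gives a per-locus load of 2pqhs + q²s. It also assumes that the three coefficients map to the GERP categories in the order listed and uses their magnitudes. The variant frequencies are hypothetical.

```python
# Assumed mapping of selection coefficient magnitudes to GERP categories.
S_BY_CATEGORY = {"moderate": 4.5e-4, "large": 4.5e-3, "extreme": 1e-2}


def locus_load(q, s, h):
    """Load at one locus for derived allele frequency q, selection
    coefficient s and dominance h, assuming Hardy-Weinberg proportions."""
    p = 1.0 - q
    heterozygote_term = 2.0 * p * q * h * s
    homozygote_term = q * q * s
    return heterozygote_term + homozygote_term


def total_load(variants, h):
    """Sum per-locus load over (category, derived allele frequency) pairs."""
    return sum(locus_load(q, S_BY_CATEGORY[cat], h) for cat, q in variants)


# Hypothetical variants: (GERP category, derived allele frequency).
toy_variants = [("moderate", 0.30), ("large", 0.05), ("extreme", 0.01)]

additive_load = total_load(toy_variants, h=0.5)   # heterozygotes carry half the effect
recessive_load = total_load(toy_variants, h=0.0)  # only derived homozygotes contribute
```

With h = 0.5 the per-locus load reduces to qs, whereas with h = 0 it is q²s, so the recessive load depends entirely on the frequency of derived homozygotes.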
Allele sharing versus allele frequency among European populations: the sharing ratio is the probability that two minor alleles drawn randomly in a pooled sample come from different populations, relative to the panmictic expectation. A panmictic population has a ratio of 1, and completely diverged populations have a ratio of 0. GBR: Great Britain; CEU: Central European from Utah; TSI: Tuscan; FIN: Finnish.
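One way to read the sharing ratio for a single variant is sketched below. It is an interpretation of the caption's definition, not the authors' estimator (which may, for example, pool variants by frequency): two minor alleles are drawn without replacement from the pooled sample, and the panmictic expectation is taken to be the probability that two randomly drawn chromosomes come from different populations. All names and numbers are hypothetical.

```python
def prob_different_groups(counts):
    """Probability that two items drawn without replacement from the pooled
    counts belong to different groups."""
    total = sum(counts)
    if total < 2:
        raise ValueError("need at least two items to draw a pair")
    same_group_pairs = sum(c * (c - 1) for c in counts)
    return 1.0 - same_group_pairs / (total * (total - 1))


def sharing_ratio(minor_allele_counts, chromosomes_sampled):
    """Observed between-population sharing of minor alleles divided by the
    value expected if the pooled sample were panmictic."""
    observed = prob_different_groups(minor_allele_counts)
    expected = prob_different_groups(chromosomes_sampled)
    return observed / expected


# Two hypothetical populations with 100 sampled chromosomes each.
sizes = [100, 100]
private_variant = sharing_ratio([4, 0], sizes)  # all minor alleles in one population
shared_variant = sharing_ratio([2, 2], sizes)   # minor alleles split evenly
```

A variant private to one population gives a ratio of 0, matching the caption's description of completely diverged populations.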
