PLoS Genet. 2020 May 29;16(5):e1008827.
doi: 10.1371/journal.pgen.1008827. eCollection 2020 May.

Population genetic models of GERP scores suggest pervasive turnover of constrained sites across mammalian evolution

Christian D Huber et al. PLoS Genet. 2020.

Abstract

Comparative genomic approaches have been used to identify sites where mutations are under purifying selection and of functional consequence by searching for sequences that are conserved across distantly related species. However, the performance of these approaches has not been rigorously evaluated under population genetic models. Further, short-lived functional elements may not leave a footprint of sequence conservation across many species. We use simulations to study how one measure of conservation, the Genomic Evolutionary Rate Profiling (GERP) score, relates to the strength of selection (Nes). We show that the GERP score is related to the strength of purifying selection. However, changes in selection coefficients or functional elements over time (i.e. functional turnover) can strongly affect the GERP distribution, leading to unexpected relationships between GERP and Nes. Further, we show that for functional elements that have a high turnover rate, adding more species to the analysis does not necessarily increase statistical power. Finally, we use the distribution of GERP scores across the human genome to compare models with and without turnover of sites where mutations are under purifying selection. We show that mutations in 4.51% of the noncoding human genome are under purifying selection and that most of this sequence has likely experienced changes in selection coefficients throughout mammalian evolution. Our work reveals limitations to using comparative genomic approaches to identify deleterious mutations. Commonly used GERP score thresholds miss over half of the noncoding sites in the human genome where mutations are under purifying selection.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. GERP scores as a function of the strength of purifying selection (Nes).
(A) Violin plots of simulated GERP scores on a 36-species phylogeny, assuming Nes values from 0 to -8 in steps of 0.5. (B) Power of GERP to detect purifying selection at the selection strength shown on the x-axis. GERP scores were computed using the entire phylogeny of 36 mammalian species, but purifying selection only occurred in the phylogenetic scope shown in the legend.
Fig 2. GERP scores and Nes values under a codon-based model of evolution.
(A) Nes values of nonsynonymous mutations as a function of GERP scores for different degrees of codon degeneracy. Codon degeneracy and Nes value are observed in humans, whereas simulations are run across the entire 36-species tree. Note that a nonfunctional site in humans is considered to have an Nes value of zero, whereas a 4-fold degenerate site can have Nes values different from zero. The blue line represents the median Nes value given a specific GERP score, whereas the dashed lines represent the 2.5% and 97.5% quantiles. (B) Distribution of Nes values for GERP scores at 0-fold and 2-fold sites. Note that the Nes values are distributed differently for the same GERP scores depending on the type of site.
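
The median and quantile lines described in this caption can be illustrated with a small summary routine. The following is only a sketch, not the authors' code: it assumes simulated sites are available as `(gerp, nes)` pairs, and the function names, the bin width, and the linear-interpolation quantile are all hypothetical choices made for illustration.

```python
import math
from collections import defaultdict

def quantile(sorted_values, p):
    """Linear-interpolation quantile of an already sorted, non-empty list."""
    position = p * (len(sorted_values) - 1)
    lower = math.floor(position)
    upper = math.ceil(position)
    weight = position - lower
    return sorted_values[lower] * (1 - weight) + sorted_values[upper] * weight

def nes_summary_by_gerp(sites, bin_width=1.0):
    """Map each GERP bin (left edge) to the (2.5%, 50%, 97.5%) quantiles of Nes.

    `sites` is an iterable of (gerp_score, nes) pairs; both the input layout
    and the binning scheme are assumptions of this sketch.
    """
    bins = defaultdict(list)
    for gerp, nes in sites:
        left_edge = math.floor(gerp / bin_width) * bin_width
        bins[left_edge].append(nes)
    return {
        left_edge: tuple(quantile(sorted(values), p) for p in (0.025, 0.5, 0.975))
        for left_edge, values in sorted(bins.items())
    }
```

Plotting the middle value of each tuple against the bin edge gives a median curve, and the outer two values give the dashed envelope.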
Fig 3. Turnover of selected sequence disrupts the relationship between GERP scores and Nes values.
(A) Nes values as a function of GERP scores for a model without turnover of functional sequence across the 36 species tree (left) or where there is turnover modelled according to our Markov model (right). The turnover rate is estimated in Rands et al. [25] for noncoding elements. Green dots denote selected sites and yellow dots denote neutral sites, as observed in humans. The blue line represents the median Nes value given a specific GERP score, whereas the dashed lines represent the 2.5% and 97.5% quantiles. (B) Distribution of Nes values for GERP scores when there is no turnover (left) and when there is turnover of functional sequence (right). Note that when there is turnover, the majority of the sites with high GERP scores (>5.5) are not functional.
Fig 4. Power to detect purifying selection using GERP scores as a function of tree size.
Colored lines denote different strengths of purifying selection. Tree size is defined as the sum of lengths of all branches of the tree. Branch length is measured as expected neutral substitutions, i.e. a branch with length one has on average one neutral substitution. The tree size is varied by including/excluding species from a phylogenetic tree of 100 vertebrates (see main text). Left panel shows no turnover. Right panel shows intergenic levels of turnover with turnover rate as estimated in Rands et al. [25] for noncoding elements. Middle panel shows intermediate turnover with a rate half of that in the right panel. Blue vertical lines denote the tree size of the 36 mammalian species tree that is commonly used for calculation of GERP. See S2 Text for further discussion of alternate strategies for adding species.
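
To make the tree-size definition concrete, the sketch below computes tree size as the sum of branch lengths and the rejected-substitution score one would expect at a site under constant purifying selection and no turnover. It is an illustration under stated assumptions rather than the paper's simulation procedure: the relative substitution rate uses the standard diffusion-theory ratio S / (1 - exp(-S)), the scaling `S = 4 * Nes` (diploid, additive) is a convention chosen here and not taken from the source, and the function names are hypothetical.

```python
import math

def relative_substitution_rate(nes, scale=4.0):
    """Substitution rate relative to neutrality for scaled selection S = scale * Nes.

    Uses the ratio S / (1 - exp(-S)), written with expm1 for accuracy near zero.
    The value of `scale` is an assumed convention. Very strong selection
    (S below about -700) would overflow exp and is outside this sketch's range.
    """
    s = scale * nes
    if s == 0.0:
        return 1.0
    return -s / math.expm1(-s)

def expected_rejected_substitutions(branch_lengths, nes, scale=4.0):
    """Expected neutral substitutions minus expected substitutions under selection.

    `branch_lengths` are in expected neutral substitutions per site, so their
    sum is the tree size as defined in the caption.
    """
    tree_size = sum(branch_lengths)
    return tree_size * (1.0 - relative_substitution_rate(nes, scale))
```

Under these assumptions the expected score is zero for a neutral site and approaches the tree size as selection becomes strong, which is why a larger tree can separate selected from neutral sites more clearly when selection is constant across the tree.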
Fig 5. Fit of models of purifying selection to the empirical GERP score distribution for different tree depths.
Dashed gray lines indicate the empirical distribution of GERP scores. The 3 plots in each row denote the distributions for different depths in the multi-species sequence alignment. The GERP scores were normalized by dividing each score by the largest possible score given the tree size (see Methods). (A-C) Fit of a model with 3 categories of sites: neutral, selected, and turnover (see text). (D-F) Fit of a model with 2 categories of sites: neutral and selected. Note that the model with turnover provides a more satisfactory fit to the empirical data.
Fig 6. Amount of the noncoding human genome under purifying selection for different models.
Left panels show the proportion of sites falling in each category of the mixture component as a function of tree size. Right panels show the proportion of the genome falling in the selected categories under the model. (A) The full model including sequence turnover (N+C+TO), which fits the data better. (B) A model without sequence turnover (N+C).
Fig 7. The observed values of Λ fall outside the null distributions.
The null distribution (blue points) is derived from 500 simulations under the respective null model, i.e. assuming only neutral sites in (A) and neutral plus constantly selected sites in (B). The triangles denote the empirically observed statistics. In all cases, the null hypothesis is rejected with p<0.01.
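
A simulation-based test of this kind can be sketched as a Monte Carlo p-value. The snippet is a generic illustration, not the authors' procedure: it assumes that larger values of the statistic are more extreme and applies the common "+1" correction, and the function name is hypothetical.

```python
def empirical_p_value(null_stats, observed):
    """One-sided Monte Carlo p-value with the +1 correction.

    Assumes larger statistics are more extreme. With 500 null simulations
    the smallest attainable value is 1/501, which is below 0.01.
    """
    null_stats = list(null_stats)
    exceed = sum(1 for value in null_stats if value >= observed)
    return (exceed + 1) / (len(null_stats) + 1)
```

If the observed statistic lies beyond every simulated value, as the caption describes, this estimator returns 1/501 for 500 simulations.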

References

    1. Visscher PM, Wray NR, Zhang Q, Sklar P, McCarthy MI, Brown MA, et al. 10 Years of GWAS Discovery: Biology, Function, and Translation. Am J Hum Genet. 2017;101: 5–22. doi: 10.1016/j.ajhg.2017.06.005
    2. Edwards SL, Beesley J, French JD, Dunning AM. Beyond GWASs: illuminating the dark road from association to function. Am J Hum Genet. 2013;93: 779–797. doi: 10.1016/j.ajhg.2013.10.012
    3. Schubert M, Jónsson H, Chang D, Der Sarkissian C, Ermini L, Ginolhac A, et al. Prehistoric genomes reveal the genetic foundation and cost of horse domestication. Proc Natl Acad Sci U S A. 2014;111: E5661–9. doi: 10.1073/pnas.1416991111
    4. Marsden CD, Vecchyo DO-D, O’Brien DP, Taylor JF, Ramirez O, Vilà C, et al. Bottlenecks and selective sweeps during domestication have increased deleterious genetic variation in dogs. Proc Natl Acad Sci U S A. 2016;113: 152–157. doi: 10.1073/pnas.1512501113
    5. Henn BM, Botigué LR, Peischl S, Dupanloup I, Lipatov M, Maples BK, et al. Distance from sub-Saharan Africa predicts mutational load in diverse human genomes. Proc Natl Acad Sci U S A. 2016;113: E440–E449. doi: 10.1073/pnas.1510805112