PLoS Genet. 2020 May 29;16(5):e1008827.

doi: 10.1371/journal.pgen.1008827. eCollection 2020 May.

Population genetic models of GERP scores suggest pervasive turnover of constrained sites across mammalian evolution

Christian D Huber¹, Bernard Y Kim², Kirk E Lohmueller³,⁴,⁵

Affiliations

¹ School of Biological Sciences, University of Adelaide, Adelaide, South Australia, Australia.
² Department of Biology, Stanford University, Stanford, California, United States of America.
³ Department of Ecology and Evolutionary Biology, University of California, Los Angeles, California, United States of America.
⁴ Interdepartmental Program in Bioinformatics, University of California, Los Angeles, California, United States of America.
⁵ Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, California, United States of America.

PMID: 32469868
PMCID: PMC7286533
DOI: 10.1371/journal.pgen.1008827

Christian D Huber et al. PLoS Genet. 2020.

Abstract

Comparative genomic approaches have been used to identify sites where mutations are under purifying selection and of functional consequence by searching for sequences that are conserved across distantly related species. However, the performance of these approaches has not been rigorously evaluated under population genetic models. Further, short-lived functional elements may not leave a footprint of sequence conservation across many species. We use simulations to study how one measure of conservation, the Genomic Evolutionary Rate Profiling (GERP) score, relates to the strength of selection (N_es). We show that the GERP score is related to the strength of purifying selection. However, changes in selection coefficients or functional elements over time (i.e. functional turnover) can strongly affect the GERP distribution, leading to unexpected relationships between GERP and N_es. Further, we show that for functional elements that have a high turnover rate, adding more species to the analysis does not necessarily increase statistical power. Finally, we use the distribution of GERP scores across the human genome to compare models with and without turnover of sites where mutations are under purifying selection. We show that mutations in 4.51% of the noncoding human genome are under purifying selection and that most of this sequence has likely experienced changes in selection coefficients throughout mammalian evolution. Our work reveals limitations to using comparative genomic approaches to identify deleterious mutations. Commonly used GERP score thresholds miss over half of the noncoding sites in the human genome where mutations are under purifying selection.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

**Fig 1. GERP scores as a function of the strength of purifying selection (N_es).**
(A) Violin plots of simulated GERP scores on a 36 species phylogeny assuming N_es values from 0 to -8 in steps of 0.5. (B) Power of GERP to detect purifying selection, at a given selection strength shown on the x-axis. GERP scores were computed using the entire phylogeny of 36 mammalian species, but purifying selection only occurred in the phylogenetic scope shown in the legend.

**Fig 2. GERP scores and N_es values under a codon-based model of evolution.**
(A) N_es values of nonsynonymous mutations as a function of GERP scores for different degrees of codon degeneracy. Codon degeneracy and N_es value are observed in humans, whereas simulations are run across the entire 36-species tree. Note that a nonfunctional site in humans is considered to have an N_es value of zero, whereas a 4-fold degenerate site can have N_es values different from zero. The blue line represents the median N_es value given a specific GERP score, whereas the dashed lines represent the 2.5% and 97.5% quantiles. (B) Distribution of N_es values for GERP scores at 0-fold and 2-fold sites. Note that the N_es values are distributed differently for the same GERP scores depending on the type of site.

**Fig 3. Turnover of selected sequence disrupts the relationship between GERP scores and N_es values.**
(A) N_es values as a function of GERP scores for a model without turnover of functional sequence across the 36 species tree (left) or where there is turnover modelled according to our Markov model (right). The turnover rate is estimated in Rands et al. [25] for noncoding elements. Green dots denote selected sites and yellow dots denote neutral sites, as observed in humans. The blue line represents the median N_es value given a specific GERP score, whereas the dashed lines represent the 2.5% and 97.5% quantiles. (B) Distribution of N_es values for GERP scores when there is no turnover (left) and when there is turnover of functional sequence (right). Note that when there is turnover, the majority of the sites with high GERP scores (>5.5) are not functional.
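
The caption refers to a Markov model of turnover without giving its form on this page. As a rough illustration only, and not the authors' parameterization, the sketch below assumes a two-state continuous-time chain in which a site switches between "selected" and "neutral". The rates `gain` and `loss` and the function name are hypothetical. It shows why turnover weakens the link between conservation and present-day selection: the longer the elapsed time, the less a site's current state says about its past state.

```python
import math

def prob_still_selected(t, gain, loss):
    """P(site is selected at time t | selected at time 0) for a two-state chain.

    gain: hypothetical rate at which neutral sites become selected.
    loss: hypothetical rate at which selected sites become neutral.
    t:    elapsed time, in the same units as 1/rate.
    """
    total = gain + loss
    stationary = gain / total  # long-run fraction of time spent selected
    # Standard two-state solution: decay from 1 toward the stationary value.
    return stationary + (1.0 - stationary) * math.exp(-total * t)

# Illustrative rates only: the probability falls from 1 toward 0.25.
for t in (0.0, 0.5, 2.0):
    print(t, prob_still_selected(t, gain=1.0, loss=3.0))
```

With `loss` set to zero the probability stays at 1 for all `t`, which corresponds to the no-turnover case in the left-hand panels.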

**Fig 4. Power to detect purifying selection using GERP scores as a function of tree size.**
Colored lines denote different strengths of purifying selection. Tree size is defined as the sum of lengths of all branches of the tree. Branch length is measured as expected neutral substitutions, i.e. a branch with length one has on average one neutral substitution. The tree size is varied by including/excluding species from a phylogenetic tree of 100 vertebrates (see main text). Left panel shows no turnover. Right panel shows intergenic levels of turnover with turnover rate as estimated in Rands et al. [25] for noncoding elements. Middle panel shows intermediate turnover with a rate half of that in the right panel. Blue vertical lines denote the tree size of the 36 mammalian species tree that is commonly used for calculation of GERP. See S2 Text for further discussion of alternate strategies for adding species.
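
The definition of tree size in this caption (the sum of all branch lengths) can be made concrete with a small sketch. The nested-tuple tree representation, the function name, and the branch lengths below are assumptions for illustration and are not taken from the paper.

```python
def tree_size(node):
    """Sum of all branch lengths in the subtree rooted at node.

    A node is (branch_length, children), where branch_length is the length of
    the branch leading to the node, in expected neutral substitutions per site.
    """
    branch_length, children = node
    return branch_length + sum(tree_size(child) for child in children)

# Hypothetical three-leaf tree; the root carries no branch of its own.
toy_tree = (0.0, [
    (0.10, [            # internal branch
        (0.05, []),     # leaf
        (0.07, []),     # leaf
    ]),
    (0.30, []),         # leaf
])

print(round(tree_size(toy_tree), 2))  # 0.52
```

Dropping a leaf from `toy_tree` removes its branch from the sum, which is how including or excluding species changes the tree size on the x-axis.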

**Fig 5. Fit of models of purifying selection to the empirical GERP score distribution for different tree depths.**
Dashed gray lines indicate the empirical distribution of GERP scores. The 3 plots in each row denote the distributions for different depths in the multi-species sequence alignment. The GERP scores were normalized by dividing each score by the largest possible score given the tree size (see Methods). (A-C) Fit of a model with 3 categories of sites: neutral, selected, and turnover (see text). (D-F) Fit of a model with 2 categories of sites: neutral and selected. Note that the model with turnover provides a more satisfactory fit to the empirical data.
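
The normalization mentioned in this caption can be sketched as a simple rescaling. The function name and the example values are hypothetical. The sketch also assumes that the largest possible score at a given alignment depth equals the expected number of neutral substitutions on that tree, which is what a fully conserved site would score under the usual "rejected substitutions" definition of GERP.

```python
def normalize_gerp(scores, max_score):
    """Divide each GERP score by the largest score attainable on the tree."""
    if max_score <= 0:
        raise ValueError("max_score must be positive")
    return [score / max_score for score in scores]

# Made-up scores on a tree whose assumed maximum score is 6.0.
print(normalize_gerp([6.0, 3.0, -1.5], max_score=6.0))  # [1.0, 0.5, -0.25]
```

After rescaling, a fully conserved site maps to 1 regardless of tree size, so distributions from different alignment depths can be shown on a common axis.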

**Fig 6. Amount of the noncoding human genome under purifying selection for different models.**
Left panels show the proportion of sites falling in each category of the mixture component as a function of tree size. Right panels show the proportion of the genome falling in the selected categories under the model. (A) The full model including sequence turnover (N+C+TO) that better fits the data. (B) A model without sequence turnover (N+C).

**Fig 7. The observed values of Λ fall outside the null distributions.**
The null distribution (blue points) is derived from 500 simulations under the respective null model, i.e. assuming only neutral sites in (A) and neutral plus constantly selected sites in (B). The triangles denote the empirically observed statistics. In all cases, the null hypothesis is rejected with p<0.01.
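
One common way to turn a simulated null distribution into a p-value is the Monte Carlo estimate sketched below. This page does not say how the authors computed their p-values, so the formula, the function name, and the assumption that larger values of the statistic are more extreme are all illustrative.

```python
def empirical_p_value(observed, null_values):
    """One-sided Monte Carlo p-value with the usual +1 correction.

    Counts null statistics at least as large as the observed one; the
    correction keeps the estimate from ever being exactly zero.
    """
    exceed = sum(1 for value in null_values if value >= observed)
    return (exceed + 1) / (len(null_values) + 1)

# Made-up null statistics standing in for 500 simulations.
null_values = [i / 500 for i in range(500)]

# An observed value beyond every simulated value gives p = 1/501.
print(empirical_p_value(2.0, null_values) < 0.01)  # True
```

Under this estimator, 500 simulations bound the smallest attainable p-value at 1/501, which is below the 0.01 level quoted in the caption.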

Cited by

Lineage Differentiation and Genomic Vulnerability in a Relict Tree From Subtropical Forests.
Zhu XL, Wang J, Chen HF, Kang M. Evol Appl. 2024 Nov 1;17(11):e70033. doi: 10.1111/eva.70033. eCollection 2024 Nov. PMID: 39494192.
Genic constraint against nonsynonymous variation across the mouse genome.
Powell G, Simon MM, Pulit S, Mallon AM, Lindgren CM. BMC Genomics. 2023 Sep 22;24(1):562. doi: 10.1186/s12864-023-09637-2. PMID: 37736706.
Exploring TTN variants as genetic insights into cardiomyopathy pathogenesis and potential emerging clues to molecular mechanisms in cardiomyopathies.
Jolfayi AG, Kohansal E, Ghasemi S, Naderi N, Hesami M, MozafaryBazargany M, Moghadam MH, Fazelifar AF, Maleki M, Kalayinia S. Sci Rep. 2024 Mar 4;14(1):5313. doi: 10.1038/s41598-024-56154-7. PMID: 38438525. Review.
Isolation and Characterization of the Adamantinomatous Craniopharyngioma Primary Cells with Cancer-Associated Fibroblast Features.
Chen D, Lei T, Wang Y, Yu Z, Liu S, Ye L, Li W, Yang Q, Jin H, Liu F, Li Y. Biomedicines. 2025 Apr 9;13(4):912. doi: 10.3390/biomedicines13040912. PMID: 40299526.
High frequency of an otherwise rare phenotype in a small and isolated tiger population.
Sagar V, Kaelin CB, Natesh M, Reddy PA, Mohapatra RK, Chhattani H, Thatte P, Vaidyanathan S, Biswas S, Bhatt S, Paul S, Jhala YV, Verma MM, Pandav B, Mondol S, Barsh GS, Swain D, Ramakrishnan U. Proc Natl Acad Sci U S A. 2021 Sep 28;118(39):e2025273118. doi: 10.1073/pnas.2025273118. PMID: 34518374.

References

1. Visscher PM, Wray NR, Zhang Q, Sklar P, McCarthy MI, Brown MA, et al. 10 Years of GWAS Discovery: Biology, Function, and Translation. Am J Hum Genet. 2017;101: 5–22. doi: 10.1016/j.ajhg.2017.06.005
2. Edwards SL, Beesley J, French JD, Dunning AM. Beyond GWASs: illuminating the dark road from association to function. Am J Hum Genet. 2013;93: 779–797. doi: 10.1016/j.ajhg.2013.10.012
3. Schubert M, Jónsson H, Chang D, Der Sarkissian C, Ermini L, Ginolhac A, et al. Prehistoric genomes reveal the genetic foundation and cost of horse domestication. Proc Natl Acad Sci U S A. 2014;111: E5661–9. doi: 10.1073/pnas.1416991111
4. Marsden CD, Vecchyo DO-D, O’Brien DP, Taylor JF, Ramirez O, Vilà C, et al. Bottlenecks and selective sweeps during domestication have increased deleterious genetic variation in dogs. Proc Natl Acad Sci U S A. 2016;113: 152–157. doi: 10.1073/pnas.1512501113
5. Henn BM, Botigué LR, Peischl S, Dupanloup I, Lipatov M, Maples BK, et al. Distance from sub-Saharan Africa predicts mutational load in diverse human genomes. Proc Natl Acad Sci U S A. 2016;113: E440–E449. doi: 10.1073/pnas.1510805112

Grants and funding

R35 GM119856/GM/NIGMS NIH HHS/United States
