PLoS One. 2014 Jul 25;9(7):e103357.
doi: 10.1371/journal.pone.0103357. eCollection 2014.

Purifying selection in deeply conserved human enhancers is more consistent than in coding sequences

Dilrini R De Silva et al. PLoS One. 2014.

Abstract

Comparison of polymorphism at synonymous and non-synonymous sites in protein-coding DNA can provide evidence for selective constraint. Non-coding DNA that forms part of the regulatory landscape presents more of a challenge, since there is no such clear-cut distinction between sites under stronger and weaker selective constraint. Here, we consider putative regulatory elements termed Conserved Non-coding Elements (CNEs), defined by their high level of sequence identity across all vertebrates. Some mutations in these regions have been implicated in developmental disorders; we analyse CNE polymorphism data to investigate whether such deleterious effects are widespread in humans. Single nucleotide variants from the HapMap and 1000 Genomes Projects were mapped across nearly 2000 CNEs. In the 1000 Genomes data we find a significant excess of rare derived alleles in CNEs relative to coding sequences; this pattern is absent in HapMap data, apparently obscured by ascertainment bias. The distribution of polymorphism within CNEs is not uniform; by exploiting deep vertebrate alignments we could identify two categories of sites: stretches that are non-variant, and those that have at least one substitution. The conserved category has fewer polymorphic sites and a greater excess of rare derived alleles, which can be explained by a large proportion of sites under strong purifying selection within humans, higher than that for non-synonymous sites in most protein-coding regions and comparable to that at the strongly conserved trans-dev genes. Conversely, the more evolutionarily labile CNE sites have an allele frequency distribution not significantly different from that of non-synonymous sites. Future studies should exploit genome-wide re-sequencing to obtain better coverage in selected non-coding regions, given the likelihood that mutations in evolutionarily conserved enhancer sequences are deleterious. Discovery pipelines should validate non-coding variants to aid in identifying causal and risk-enhancing variants in complex disorders, in contrast to the current focus on exome sequencing.
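
The split of CNE sites into non-variant and variant categories can be illustrated with a minimal sketch. It assumes a per-column rule: a column is non-variant ("NV") only if every species carries the same base and no gap, and variant ("RV") otherwise. The function names, the gap handling, and the toy alignment are hypothetical, and the criteria used in the paper itself may differ.

```python
def classify_columns(alignment):
    """Label each alignment column 'NV' (identical in all species) or 'RV'.

    `alignment` is a list of equal-length aligned sequences, one per species.
    Treating gaps ('-') as variation is an assumption of this sketch.
    """
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    labels = []
    for column in zip(*alignment):
        bases = {base.upper() for base in column}
        invariant = len(bases) == 1 and "-" not in bases
        labels.append("NV" if invariant else "RV")
    return labels


def stretches(labels):
    """Collapse per-column labels into (label, start, end) runs, end exclusive."""
    runs = []
    for i, label in enumerate(labels):
        if runs and runs[-1][0] == label:
            runs[-1] = (label, runs[-1][1], i + 1)
        else:
            runs.append((label, i, i + 1))
    return runs


# Hypothetical three-species alignment with one substitution at column 2.
toy_alignment = ["ACGTAC", "ACGTAC", "ACCTAC"]
toy_runs = stretches(classify_columns(toy_alignment))
```

Human polymorphic sites could then be assigned to a category by looking up the run that contains their alignment coordinate.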

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1. Cumulative derived allele-frequency in CNEs and control regions from 1000 Genomes Project.
An excess of rare derived alleles is observed in CNE-NVRs, CNE-RVRs and non-synonymous sites relative to synonymous and non-coding controls (using polymorphic sites with Derived Allele Count (DAC) ≥ 6). The spectrum at CNEs is comparable to that at the highly conserved trans-dev genes associated with CNEs.
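
A cumulative derived allele-frequency curve of this kind can be sketched as follows. The DAC ≥ 6 filter comes from the caption; everything else is an assumption made for illustration: the function names, the exclusion of fixed derived alleles, the 0.15 cutoff for "rare", and the toy counts, which are not data from the study.

```python
def polymorphic(derived_counts, n_chromosomes, min_dac=6):
    """Keep sites with min_dac <= DAC < n_chromosomes (assumed filter)."""
    return sorted(c for c in derived_counts if min_dac <= c < n_chromosomes)


def cumulative_daf(derived_counts, n_chromosomes, min_dac=6):
    """Return (derived allele frequency, cumulative fraction of sites) points."""
    kept = polymorphic(derived_counts, n_chromosomes, min_dac)
    points = []
    for rank, count in enumerate(kept, start=1):
        freq = count / n_chromosomes
        point = (freq, rank / len(kept))
        if points and points[-1][0] == freq:
            points[-1] = point  # tied frequencies share one point
        else:
            points.append(point)
    return points


def fraction_rare(derived_counts, n_chromosomes, max_freq, min_dac=6):
    """Fraction of retained sites whose derived allele frequency <= max_freq."""
    kept = polymorphic(derived_counts, n_chromosomes, min_dac)
    if not kept:
        raise ValueError("no polymorphic sites pass the DAC filter")
    return sum(c / n_chromosomes <= max_freq for c in kept) / len(kept)


# Hypothetical derived allele counts in a sample of 100 chromosomes.
toy_cne = [1, 2, 6, 6, 7, 8, 10, 50]
toy_syn = [3, 6, 20, 30, 40, 60, 90]
rare_cne = fraction_rare(toy_cne, 100, max_freq=0.15)
rare_syn = fraction_rare(toy_syn, 100, max_freq=0.15)
```

A curve that rises faster at low frequencies, as `toy_cne` does here relative to `toy_syn`, is what an excess of rare derived alleles looks like in this representation.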
Figure 2. Site-frequency spectra in CNE-NVRs and non-synonymous sites.
A) Site-frequency spectra in YRI binned into 3 units. The observed non-synonymous site-frequency spectrum fits a gamma distribution of selective effects (shape = 0.1, rate = 6.25). The observed site-frequency spectrum in CNE-NVRs fits a gamma distribution of selective effects with a higher mean (shape = 0.18, rate = 7.8). B) The cumulative probability densities of the fitted gamma distributions indicate a larger proportion of lethal sites (>1%) in CNE-NVRs (32%) than in non-synonymous sites (21%).
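
The fitted parameters in this caption can be checked with a short calculation. The sketch below assumes the standard shape/rate parameterisation, where the mean is shape / rate, and reads the ">1%" threshold as a selective effect above 0.01 on the scale of the fitted distribution; that reading is an assumption. The regularised incomplete gamma function is evaluated with a textbook power series so that only the Python standard library is needed.

```python
import math


def gamma_cdf(x, shape, rate, max_terms=500):
    """P(X <= x) for X ~ Gamma(shape, rate), via the series
    P(a, z) = z**a * exp(-z) * sum_{n>=0} z**n / Gamma(a + n + 1), z = rate * x.
    """
    if x <= 0:
        return 0.0
    z = rate * x
    term = 1.0 / math.gamma(shape + 1.0)
    total = term
    for n in range(1, max_terms):
        term *= z / (shape + n)
        total += term
        if term < 1e-17 * total:
            break
    return min(1.0, z ** shape * math.exp(-z) * total)


def gamma_mean(shape, rate):
    """Mean of a gamma distribution in the shape/rate parameterisation."""
    return shape / rate


def tail_above(threshold, shape, rate):
    """P(X > threshold) for X ~ Gamma(shape, rate)."""
    return 1.0 - gamma_cdf(threshold, shape, rate)


# Parameters quoted in the caption; the 0.01 threshold is an assumed reading.
mean_nonsyn = gamma_mean(0.1, 6.25)
mean_cne_nvr = gamma_mean(0.18, 7.8)
tail_nonsyn = tail_above(0.01, 0.1, 6.25)
tail_cne_nvr = tail_above(0.01, 0.18, 7.8)
```

Under these assumptions the means are 0.016 for non-synonymous sites and about 0.023 for CNE-NVRs, and the tail probabilities come out close to the 21% and 32% quoted in panel B.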
Figure 3. Cumulative derived allele-frequency in CNEs and control regions from the HapMap Project.
An excess of rare derived alleles is observed only in non-synonymous sites relative to synonymous and non-coding controls. CNE-NVRs have an excess of rare derived alleles compared to CNE-RVRs; however, the derived allele-frequency spectra in CNEs resemble those at synonymous sites as a result of ascertainment bias in the HapMap dataset.
