Genome Res. 2012 Sep;22(9):1689-97.
doi: 10.1101/gr.134890.111.

Personal and population genomics of human regulatory variation

Benjamin Vernot et al. Genome Res. 2012 Sep.

Abstract

The characteristics of regulatory variation in humans, and the evolutionary forces acting on it, remain elusive because of the difficulty in defining functionally important noncoding DNA. Here, we combine genome-scale maps of regulatory DNA marked by DNase I hypersensitive sites (DHSs) from 138 cell and tissue types with whole-genome sequences of 53 geographically diverse individuals in order to better delimit the patterns of regulatory variation in humans. We estimate that individuals likely harbor many more functionally important variants in regulatory DNA than in protein-coding regions, although these variants are likely to have, on average, smaller effect sizes. Moreover, we demonstrate that there is significant heterogeneity in the level of functional constraint in regulatory DNA among different cell types. We also find marked variability in functional constraint among transcription factor motifs in regulatory DNA, with sequence motifs for major developmental regulators, such as HOX proteins, exhibiting levels of constraint comparable to protein-coding regions. Finally, we perform a genome-wide scan of recent positive selection and identify hundreds of novel substrates of adaptive regulatory evolution that are enriched for biologically interesting pathways such as melanogenesis and adipocytokine signaling. These data and results provide new insights into patterns of regulatory variation in individuals and populations and demonstrate that a large proportion of functionally important variation lies beyond the exome.

Figures

Figure 1.
Overview of data used in the analyses. (A) Schematic of the DNase I data. Binding of regulatory proteins to DNA (blue rectangle) results in nucleosome (open circles) displacement and local chromatin remodeling, and these regions are susceptible to cleavage with the endonuclease DNase I. High-throughput sequencing of libraries made from digested nuclei reveals DNase I hypersensitive sites, detectable by increased depth of coverage. Peaks are defined as 150-bp windows centered on the area of maximum cleavage (The ENCODE Project Consortium 2012). Within hypersensitive sites, footprints of regulatory factor binding are observed as decreased cleavage. (B) Unrooted neighbor-joining tree of the 53 unrelated individuals colored by population. Abbreviations are described in Supplemental Table 2.
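The peak definition in panel A (a 150-bp window centered on the position of maximum cleavage) can be illustrated with a minimal sketch. The function name `peak_window`, the per-base list of cleavage counts, and the clamping of the window at the ends of the profile are assumptions made for illustration; they are not taken from the ENCODE pipeline.

```python
def peak_window(cleavage, width=150):
    """Return (start, end) of a half-open window of `width` bp
    centered on the position of maximum cleavage.

    `cleavage` is a hypothetical list of per-base cleavage counts.
    The window is shifted inward if it would run off either end
    (an assumption of this sketch, not a documented rule).
    """
    if not cleavage:
        raise ValueError("empty cleavage profile")
    # Position of maximum cleavage; ties resolve to the first maximum.
    summit = max(range(len(cleavage)), key=cleavage.__getitem__)
    start = max(0, summit - width // 2)
    end = min(len(cleavage), start + width)
    # Re-anchor the start if the window was truncated on the right.
    start = max(0, end - width)
    return start, end


profile = [0] * 1000
profile[500] = 10
print(peak_window(profile))
```

With the summit at position 500, the window spans 75 bp on either side of it.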
Figure 2.
Characteristics of regulatory variation among individuals. (A) Total number of variants in DNase I peaks, footprints, and the exome stratified by GERP score. (B) Distribution of the number of variants per individual in DNase I peaks, footprints, and the exome. (C) Distribution of the number of variants per individual with GERP ≥ 3 in DNase I peaks, footprints, and exomes.
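The per-individual tallies in panel C amount to counting, for each individual, the variants whose GERP score is at least 3. A minimal sketch follows; the input layout (pairs of individual ID and GERP score) and the name `count_constrained` are hypothetical and chosen only for illustration.

```python
from collections import Counter


def count_constrained(variants, threshold=3.0):
    """Count variants per individual with GERP score >= threshold.

    `variants` is an iterable of (individual_id, gerp_score) pairs,
    a hypothetical layout used only for this sketch.
    """
    counts = Counter()
    for individual, gerp in variants:
        if gerp >= threshold:
            counts[individual] += 1
    return counts


calls = [("A", 3.0), ("A", 1.2), ("B", 4.5), ("A", 5.1)]
print(count_constrained(calls))
```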
Figure 3.
Significant variation of diversity between 732 cis-regulatory motifs. (A) For each motif, average diversity is plotted as a black circle, and 95% confidence intervals obtained by bootstrapping are shown as gray lines. The light blue and yellow rectangles denote the 95% confidence intervals of diversity in fourfold synonymous sites (FFSs) and the exome, respectively. (Red vertical lines) Motifs that belong to the indicated class of transcription factor. (Black vertical lines) Motifs where at least 50% of all instances of that motif contain a CpG dinucleotide. (B) Normalized diversity in motifs versus non-normalized diversity. Motifs with a CpG (defined as above) are plotted in red. (Dashed line) Best fit for non-CpG motifs (r = 0.70, P < 10^−16).
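Bootstrapped 95% confidence intervals of the kind shown in panel A can be sketched with a percentile bootstrap of the mean. The caption does not say which units were resampled or which bootstrap variant was used, so the resampling of per-instance diversity values, the percentile method, and the name `bootstrap_ci` are all assumptions here.

```python
import random


def bootstrap_ci(values, n_boot=1000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for the mean of `values`.

    Resamples `values` with replacement `n_boot` times and returns the
    (alpha/2, 1 - alpha/2) percentiles of the resampled means.
    """
    if not values:
        raise ValueError("need at least one value")
    rng = random.Random(seed)
    n = len(values)
    means = sorted(sum(rng.choices(values, k=n)) / n for _ in range(n_boot))
    lo = means[int((alpha / 2) * n_boot)]
    hi = means[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lo, hi


# Hypothetical per-instance diversity values for one motif.
print(bootstrap_ci([1.0, 2.0, 3.0, 4.0, 5.0]))
```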
Figure 4.
Heterogeneity of polymorphism across cell types. (A) Distribution of normalized nucleotide diversity (black points) across DNase I peaks in 138 cell types. Vertical bars around peaks indicate 95% confidence intervals obtained by bootstrapping. (Blue rectangle) 95% confidence interval for normalized nucleotide diversity in fourfold degenerate sites. (B) Venn diagram showing the amount of shared and unique sequence for DNase I peaks among normal/primary, malignant, and iPS/ES cell types. The barplot on the left shows average normalized diversity for several categories of peaks in the Venn diagram. Shared all and shared two denote peaks shared among all three categories and between any two categories, respectively. N, M, and SC denote peaks specific to normal/primary, malignant, and stem cell (iPS/ES) cell types, respectively.
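As background for the diversity values plotted here, per-site nucleotide diversity can be computed from allele counts with the standard unbiased estimator. This sketch covers only the unnormalized quantity; the normalization applied in the figure is not described in the caption and is not reproduced. The function name and input layout are hypothetical, and 106 chromosomes would correspond to 53 diploid individuals.

```python
def nucleotide_diversity(alt_counts, n_chromosomes, n_sites):
    """Per-site nucleotide diversity from alternate-allele counts.

    `alt_counts` holds the alternate-allele count at each variable site,
    `n_chromosomes` is the number of sampled chromosomes, and `n_sites`
    is the total number of sites surveyed (variable or not).
    Uses the estimator sum(2pq) * n / (n - 1), divided by n_sites.
    """
    if n_chromosomes < 2 or n_sites <= 0:
        raise ValueError("need at least 2 chromosomes and 1 site")
    n = n_chromosomes
    het = 0.0
    for k in alt_counts:
        p = k / n
        het += 2 * p * (1 - p)
    return het * n / (n - 1) / n_sites


# One variable site with 2 of 4 chromosomes carrying the alternate allele.
print(nucleotide_diversity([2], n_chromosomes=4, n_sites=1))
```

In the example, 4 of the 6 chromosome pairs differ at the site, which matches the estimator's value of 2/3.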
Figure 5.
Malignant cell lines exhibit significantly more singleton DNase I peaks than normal cell lines. (Triangles) Observed proportion of singleton peaks. (Blue and green lines) Distribution (density histograms) of singleton peaks when randomly sampling 29 (blue) or five (green) cell types; this is the distribution of the number of singleton peaks we would expect if malignant or stem cells were similar to normal cells, respectively. Note the malignant category (blue) shows significantly more singleton peaks than expected given its sample size, but the stem cell category (green) falls within the expected range.
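The resampling logic behind the expected distributions can be sketched as follows: draw random subsets of normal cell types of the same size as the category being tested, compute the proportion of singleton peaks in each draw, and compare the observed value with that null distribution. The definition of a singleton used here (a peak present in exactly one cell type of the sample), the representation of each cell type as a set of peak IDs, and all function names are assumptions of this sketch, not details confirmed by the caption.

```python
import random
from collections import Counter


def singleton_proportion(peak_sets):
    """Fraction of distinct peaks found in exactly one of the given cell types."""
    counts = Counter(p for peaks in peak_sets for p in set(peaks))
    if not counts:
        return 0.0
    return sum(1 for c in counts.values() if c == 1) / len(counts)


def null_distribution(normal_peak_sets, sample_size, n_draws=1000, seed=0):
    """Singleton proportions for random subsets of normal cell types."""
    rng = random.Random(seed)
    return [
        singleton_proportion(rng.sample(normal_peak_sets, sample_size))
        for _ in range(n_draws)
    ]


def empirical_p(observed, null):
    """One-sided empirical P value: share of null draws >= observed."""
    return (1 + sum(x >= observed for x in null)) / (1 + len(null))


# Toy data: four hypothetical normal cell types with overlapping peaks.
normal = [{1, 2, 3}, {2, 3, 4}, {3, 4, 5}, {1, 5, 6}]
null = null_distribution(normal, sample_size=2, n_draws=200)
print(empirical_p(0.9, null))
```

Matching the subset size to the category under test (29 or five in the figure) keeps the comparison fair, since the proportion of singletons depends on how many cell types are pooled.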
Figure 6.
Genome-wide distribution of population structure in regulatory DNA. (A) Genome-wide distribution of locus-specific branch lengths (LSBLs) for Africans, Asians, and Europeans, respectively. Note that the valley of uniform LSBL on chromosome 17 in Europeans corresponds to the MAPT region that is segregating a large chromosomal inversion (Zody et al. 2008). (B) Distribution of the proportion of highly differentiated DNase I peaks found for different categories of cell types. (SC) Stem cells (iPS/ES); (I) immortalized; (M) malignant; (N) normal/primary cell types. (C) Distribution of African LSBL across intron 1 of VDR. (D) Distribution of European LSBL across intron 4 of FTO. In panels C and D, peaks are shown as red rectangles and exons as black rectangles.
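A locus-specific branch length apportions the three pairwise genetic distances among three populations onto the branches of an unrooted three-taxon tree. The sketch below uses the usual formulation, in which the branch for population A is (d_AB + d_AC − d_BC) / 2. The pairwise distance is commonly FST, but the caption does not state which distance was used, so the inputs are treated as generic distances and the function name is hypothetical.

```python
def lsbl(d_ab, d_ac, d_bc):
    """Locus-specific branch lengths for populations A, B, and C.

    Inputs are the three pairwise distances at one locus (for example,
    pairwise FST; the choice of distance is an assumption of this sketch).
    Returns the branch lengths leading to A, B, and C.
    """
    a = (d_ab + d_ac - d_bc) / 2
    b = (d_ab + d_bc - d_ac) / 2
    c = (d_ac + d_bc - d_ab) / 2
    return a, b, c


# Hypothetical pairwise distances at a single locus.
a, b, c = lsbl(d_ab=0.3, d_ac=0.4, d_bc=0.5)
print(a, b, c)
```

By construction, any two branch lengths sum to the distance between the corresponding pair of populations, so a long branch for one population flags differentiation specific to that lineage.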

Similar articles

  • Systematic localization of common disease-associated variation in regulatory DNA.
    Maurano MT, Humbert R, Rynes E, Thurman RE, Haugen E, Wang H, Reynolds AP, Sandstrom R, Qu H, Brody J, Shafer A, Neri F, Lee K, Kutyavin T, Stehling-Sun S, Johnson AK, Canfield TK, Giste E, Diegel M, Bates D, Hansen RS, Neph S, Sabo PJ, Heimfeld S, Raubitschek A, Ziegler S, Cotsapas C, Sotoodehnia N, Glass I, Sunyaev SR, Kaul R, Stamatoyannopoulos JA. Science. 2012 Sep 7;337(6099):1190-5. doi: 10.1126/science.1222794. Epub 2012 Sep 5. PMID: 22955828 Free PMC article.
  • Population genomics and transcriptional consequences of regulatory motif variation in globally diverse Saccharomyces cerevisiae strains.
    Connelly CF, Skelly DA, Dunham MJ, Akey JM. Mol Biol Evol. 2013 Jul;30(7):1605-13. doi: 10.1093/molbev/mst073. Epub 2013 Apr 25. PMID: 23619145 Free PMC article.
  • The accessible chromatin landscape of the human genome.
    Thurman RE, Rynes E, Humbert R, Vierstra J, Maurano MT, Haugen E, Sheffield NC, Stergachis AB, Wang H, Vernot B, Garg K, John S, Sandstrom R, Bates D, Boatman L, Canfield TK, Diegel M, Dunn D, Ebersol AK, Frum T, Giste E, Johnson AK, Johnson EM, Kutyavin T, Lajoie B, Lee BK, Lee K, London D, Lotakis D, Neph S, Neri F, Nguyen ED, Qu H, Reynolds AP, Roach V, Safi A, Sanchez ME, Sanyal A, Shafer A, Simon JM, Song L, Vong S, Weaver M, Yan Y, Zhang Z, Zhang Z, Lenhard B, Tewari M, Dorschner MO, Hansen RS, Navas PA, Stamatoyannopoulos G, Iyer VR, Lieb JD, Sunyaev SR, Akey JM, Sabo PJ, Kaul R, Furey TS, Dekker J, Crawford GE, Stamatoyannopoulos JA. Nature. 2012 Sep 6;489(7414):75-82. doi: 10.1038/nature11232. PMID: 22955617 Free PMC article.
  • Advances of DNase-seq for mapping active gene regulatory elements across the genome in animals.
    Chen A, Chen D, Chen Y. Gene. 2018 Aug 15;667:83-94. doi: 10.1016/j.gene.2018.05.033. Epub 2018 May 14. PMID: 29772251 Review.
  • Identification of altered cis-regulatory elements in human disease.
    Mathelier A, Shi W, Wasserman WW. Trends Genet. 2015 Feb;31(2):67-76. doi: 10.1016/j.tig.2014.12.003. Epub 2015 Jan 27. PMID: 25637093 Review.
