Cell. 2021 Nov 24;184(24):5985-6001.e19.
doi: 10.1016/j.cell.2021.10.024. Epub 2021 Nov 12.

A single-cell atlas of chromatin accessibility in the human genome

Kai Zhang et al. Cell. 2021.

Abstract

Current catalogs of regulatory sequences in the human genome are still incomplete and lack cell type resolution. To profile the activity of gene regulatory elements in diverse cell types and tissues in the human body, we applied single-cell chromatin accessibility assays to 30 adult human tissue types from multiple donors. We integrated these datasets with previous single-cell chromatin accessibility data from 15 fetal tissue types to reveal the status of open chromatin for ∼1.2 million candidate cis-regulatory elements (cCREs) in 222 distinct cell types comprising >1.3 million nuclei. We used these chromatin accessibility maps to delineate cell-type-specificity of fetal and adult human cCREs and to systematically interpret the noncoding variants associated with complex human traits and diseases. This rich resource provides a foundation for the analysis of gene regulatory programs in human cell types across tissues, life stages, and organ systems.

Keywords: GWAS; chromatin accessibility; cis regulatory elements; enhancers; epigenome; noncoding variants; single cell ATAC-seq.

Conflict of interest statement

Declaration of interests: B.R. is a shareholder and consultant of Arima Genomics, Inc., and a co-founder of Epigenome Technologies, Inc. K.J.G. is a consultant of Genentech and a shareholder in Vertex Pharmaceuticals. These relationships have been disclosed to and approved by the UCSD Independent Review Committee.

Figures

Figure 1 ∣. Single-cell chromatin accessibility analysis of 30 adult human primary tissues.
A) A total of 92 biosamples from 30 tissue types were used for sci-ATAC-seq. The number of nuclei profiled per tissue is denoted in parentheses. B) Clustering of 615,998 nuclei revealed 30 major cell groups. Each dot represents a nucleus, colored by cluster ID. The embedding was created by Uniform Manifold Approximation and Projection (UMAP) (McInnes et al., 2018). C) An example illustrating subclusters within the major cell group of gastrointestinal (GI) epithelial cells revealed by iterative clustering. D) Bar plot showing the number of cell types identified in each of the 30 human tissues, counting only cell types constituting >0.2% of all cells in the given tissue. E) Distribution of cell types across human tissues. The dendrogram on the left was created by hierarchical clustering of cell clusters based on chromatin accessibility. The bar chart represents relative contributions of tissues to cell clusters. Raw data are available on Mendeley Data: 10.17632/yv4fzv6cnm.1.
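Panel D counts a cell type only if it makes up more than 0.2% of all nuclei in a tissue. The Python sketch below illustrates that filter. It assumes, hypothetically, that each tissue's nuclei are available as a list of cluster labels; the function name `count_cell_types` and the input layout are illustrative, not part of the authors' pipeline.

```python
from collections import Counter


def count_cell_types(labels_by_tissue, min_fraction=0.002):
    """Count cell types per tissue whose share of nuclei exceeds min_fraction.

    labels_by_tissue: hypothetical mapping tissue -> list of per-nucleus cluster labels.
    min_fraction: 0.002 mirrors the ">0.2% of all cells" cutoff in panel D.
    """
    counts = {}
    for tissue, labels in labels_by_tissue.items():
        total = len(labels)
        per_type = Counter(labels)
        # Strictly greater than the cutoff, matching ">0.2%".
        counts[tissue] = sum(1 for n in per_type.values() if n / total > min_fraction)
    return counts
```

With 1,000 nuclei split 990/8/2 across three clusters, only the first two clear the strict >0.2% cutoff, so that tissue contributes 2 cell types.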
Figure 2 ∣. An atlas of cCREs in adult human cell types.
A) Classification of 890,130 cCREs across the human genome based on their distances to annotated TSSs. B) Heatmap showing the average chromatin accessibility for each of four groups (blood vessel, forebrain, heart, negative control) of validated tissue-specific enhancers from the VISTA database (Visel et al., 2007) across indicated cell types. Z-scores were calculated using all 111 cell types. The top 10 cell types in each validated enhancer group are shown. C) Average phyloP (Pollard et al., 2010) conservation scores of cCREs stratified by groups defined in A. Genomic background is indicated in gray. D) Two-dimensional density plot showing the median chromatin accessibility compared with the range (difference between maximum and minimum) of chromatin accessibility across 111 cell clusters for 890,130 cCREs, stratified by groups defined in A. E) Heatmap representation of 435,142 cCREs showing cell-type-restricted patterns in 111 cell types. Color represents log2-transformed chromatin accessibility. F,G) Heatmaps showing GO terms (F) and TF motifs (G) with maximal enrichment in cell-type-restricted cCREs of selected cell types. Only the most enriched TF motif in each of the previously identified motif archetypes (Vierstra et al., 2020) was selected as the representative, and the top 10 motifs were selected for each cell type. Color represents −log10P. Full GO and motif enrichments are available on Mendeley Data: 10.17632/yv4fzv6cnm.1.
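Panels B and D rest on two simple per-element summaries: a z-score of accessibility across cell types, and the median and range (maximum minus minimum) across cell clusters. A minimal sketch of both, assuming each cCRE or enhancer is represented as a list of accessibility values with one entry per cell type. The caption does not say whether the population or sample standard deviation was used, so the population form here is an assumption.

```python
import statistics


def zscore_across_cell_types(values):
    """Z-score one element's accessibility across cell types.

    Assumption: population standard deviation (statistics.pstdev).
    """
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        # A constant profile has no cell-type preference.
        return [0.0 for _ in values]
    return [(v - mean) / sd for v in values]


def median_and_range(values):
    """Median accessibility and range (max - min) across cell types, as plotted in panel D."""
    return statistics.median(values), max(values) - min(values)
```

A cCRE with a low median but a large range is accessible in only a few cell types, whereas a high median with a small range indicates broad accessibility.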
Figure 3 ∣. Integrative analysis of adult and fetal single-cell chromatin accessibility atlases.
A) Number of sci-ATAC-seq cells per tissue type for the 30 adult and 15 fetal human tissue types that were integrated. Matching tissue types between adult and fetal datasets are highlighted in red or blue, respectively. Standard: sentinel tissue (trisomy 18 cerebrum). B) UMAP embedding of 1,323,041 nuclei from fetal and adult tissues. Each dot in the scatter plot represents a nucleus, colored by life stage. C) Heatmap showing the Pearson correlation coefficient (PCC) between 69 adult cell types and 89 fetal cell types from 17 manually defined cell groups that are present in both adult and fetal tissues. A comprehensive heatmap is provided in Figure S5. D) Bar plot showing the median PCC for each major cell group indicated in C.
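Panel C compares adult and fetal cell types by the Pearson correlation coefficient (PCC) of their accessibility profiles, and panel D summarizes each major cell group by a median PCC. The sketch below computes PCC directly and, as an assumption about how panel D is aggregated, takes the median over all adult-fetal pairs assigned to the same group. The input dictionaries and `median_pcc_by_group` are hypothetical.

```python
import math
import statistics


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("PCC is undefined for a constant profile")
    return cov / (sx * sy)


def median_pcc_by_group(adult, fetal, group_of):
    """Median PCC over adult-fetal cell-type pairs that share a major cell group.

    adult, fetal: hypothetical dicts cell type -> accessibility profile over the same cCREs.
    group_of: hypothetical dict cell type -> major cell group name.
    """
    pccs = {}
    for a_name, a_prof in adult.items():
        for f_name, f_prof in fetal.items():
            # Assumption: only same-group pairs contribute to a group's median.
            if group_of[a_name] == group_of[f_name]:
                pccs.setdefault(group_of[a_name], []).append(pearson(a_prof, f_prof))
    return {group: statistics.median(vals) for group, vals in pccs.items()}
```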
Figure 4 ∣. Differential chromatin accessibility landscapes in adult and fetal human cell types.
A) Dot plot showing the number of adult- and fetal-specific cCREs detected for each major cell group indicated in Figure 3C. B-C) Significant GO biological process terms and transcription factor motif enrichments for adult-specific (B) and fetal-specific (C) cCREs. D) Heatmap representation of 72,648 differentially accessible (DA) cCREs between fetal and adult skeletal myocytes along with significant GREAT biological process ontology enrichments (McLean et al., 2010). Color represents log-transformed normalized signal. E) Significantly enriched TF motifs within fetal and adult skeletal myocyte DA cCREs. The most enriched motif within each motif archetype (Vierstra et al., 2020) was selected, and the top three are displayed. F) Genome browser tracks showing chromatin accessibility for fetal and adult skeletal myocytes along with DA cCREs between the adult and fetal skeletal myocytes. Indicated genes are shown in black; other genes are shown in gray. TSSs of the indicated genes are shaded in red and blue.
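Figures 2G, 4E, and 5D share one selection rule: keep only the most enriched motif within each motif archetype, then report the top-ranked representatives. A sketch of that rule, assuming enrichment is scored as −log10 P (higher is more enriched) and that a motif-to-archetype mapping is available. The function name is hypothetical, and the motif and archetype names in the usage note are placeholders, not results.

```python
def representative_motifs(enrichment, archetype_of, top_k=3):
    """Most enriched motif per archetype, then the top_k representatives overall.

    enrichment: hypothetical dict motif -> enrichment score (e.g. -log10 P).
    archetype_of: hypothetical dict motif -> archetype name.
    """
    best = {}
    for motif, score in enrichment.items():
        arch = archetype_of[motif]
        # Keep the highest-scoring motif seen so far for this archetype.
        if arch not in best or score > enrichment[best[arch]]:
            best[arch] = motif
    # Rank archetype representatives by their own enrichment, highest first.
    reps = sorted(best.values(), key=lambda m: enrichment[m], reverse=True)
    return reps[:top_k]
```

For example, with scores `motif_a` 50, `motif_b` 45 (both `arch_1`), `motif_c` 30, `motif_d` 35 (both `arch_2`), `motif_e` 10 (`arch_3`) and `motif_f` 5 (`arch_4`), the call returns `["motif_a", "motif_d", "motif_e"]`: `motif_b` is dropped because it shares an archetype with a stronger motif.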
Figure 5 ∣. Delineation of CRE modules across 222 fetal and adult human cell types.
A) Heatmap representation of chromatin accessibility for 1,154,611 cCREs across 222 fetal and adult cell types. Color represents normalized chromatin accessibility. cCREs were organized into 150 modules by K-means clustering, indicated by the color bars on the right. 20 groups of lineage-specific modules (colored boxes) are highlighted. B-D) Heatmaps showing chromatin accessibility (B), GO terms (C) and motifs (D) with maximal enrichment in a subset of CRE modules (rows) for immune cell types. The GO and motif heatmaps are colored by enrichment −log10P. Only the most enriched TF motif in each of the previously identified motif archetypes (Vierstra et al., 2020) was selected as the representative and the top 5 motifs were selected for each module. Full GO and motif enrichments are available on Mendeley Data: 10.17632/yv4fzv6cnm.1.
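Panel A groups cCREs into 150 modules with K-means clustering. The sketch below is a plain Lloyd's-algorithm K-means on accessibility vectors (one row per cCRE). The deterministic initialization (first k rows as starting centroids) and the toy data are assumptions chosen to keep the example short and reproducible; they are not the settings used for the atlas, and practical implementations typically use k-means++ seeding with several restarts.

```python
import math


def kmeans(points, k, n_iter=100):
    """Lloyd's K-means on row vectors; returns (labels, centroids).

    Toy initialization (assumption): the first k points are the starting centroids.
    """
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid (Euclidean distance).
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # keep the previous centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
    return labels, centroids
```

On the four toy points `[[0, 0], [0, 1], [10, 10], [10, 11]]` with `k=2`, the two nearby pairs end up in separate clusters, with labels `[0, 0, 1, 1]`.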
Figure 6 ∣. Association of fetal and adult human cell types with complex traits and diseases.
A) Heatmap showing enrichment of risk variants associated with disease and non-disease traits from genome-wide association studies (GWAS) in human cell-type-resolved cCREs. Cell-type-stratified linkage disequilibrium score regression (LDSC) analysis was performed using GWAS summary statistics for 240 phenotypes. All cCREs identified independently in each fetal and adult cell type were used as input for the analysis. P-values were corrected for multiple testing using the Benjamini-Hochberg procedure, and FDRs of the LDSC coefficients are displayed. Sixty-six selected traits are highlighted on the left, with PubMed identifiers (PMIDs), or “UKB” for summary statistics downloaded from the UK Biobank, enclosed in parentheses. Numerical results are reported in Table S4. B) Dot plots showing significance of enrichment for selected traits from panel A within cCREs from 222 fetal and adult cell types. Each circle represents a cell type. Large circles pass the cutoff of FDR < 1% at −log10(P) = 3.55. The top 3 most highly associated cell types are labeled for each trait. Comprehensive data are provided in Table S4.
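Panel A reports Benjamini-Hochberg-adjusted P-values (FDRs). The sketch below is a self-contained version of the standard BH step-up adjustment using only the standard library; it is an illustration of the procedure named in the caption, not the authors' code. In practice a library routine, such as `scipy.stats.false_discovery_control` in recent SciPy releases, would normally be used.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P-values (FDRs), returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest P-value down so adjusted values stay monotone in rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

For P-values `[0.01, 0.04, 0.03, 0.2]` the adjusted values are approximately `[0.04, 0.0533, 0.0533, 0.2]`. For reference, the panel B cutoff of −log10(P) = 3.55 corresponds to a nominal P ≈ 2.8 × 10⁻⁴; under BH, the nominal P that maps to FDR < 1% depends on the number and distribution of the tests performed.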
Figure 7 ∣. Systematic interpretation of molecular functions of noncoding risk variants.
A) Schematic illustrating the workflow for annotating fine-mapped noncoding risk variants. B) Table showing, for 10 examples out of 48 total fine-mapped diseases and traits, the number of likely causal variants (PPA > 0.1), the number of cCREs overlapping likely causal variants, the number of cell types in which the overlapping cCREs are accessible, the top cell types in which the variants are enriched based on LD score regression (Bulik-Sullivan et al., 2015), the number of predicted target genes for likely causal variants, and significantly altered motifs predicted by a deltaSVM model trained using SNP-SELEX data. Comprehensive data are provided in Table S5. C,D) Fine mapping and molecular characterization of an ulcerative colitis (UC) risk variant (C) in a gastrointestinal (GI) epithelial cell cCRE and an osteoarthritis variant (D) in an immune cell cCRE. Genome browser tracks (GRCh38) display ChIP-seq and DNase-seq from ENCODE human colon datasets (C) and primary T cell datasets (D) as well as chromatin accessibility profiles for cell types from sci-ATAC-seq. Chromatin interaction tracks show linkages between the variant-containing cCREs and genes from promoter capture Hi-C data via Activity-by-Contact (ABC) (Fulco et al., 2019) analysis. All linkages shown have an ABC score > 0.015. PPA: posterior probability of association.
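The annotation workflow in panels A and B chains three filters named in this caption: fine-mapped variants with PPA > 0.1, overlap with a cCRE, and cCRE-gene links with ABC score > 0.015. The sketch below combines them. The data classes, the coordinate conventions (1-based variant positions, BED-style 0-based half-open cCRE intervals), and the `abc_links` mapping are all assumptions for illustration. The nested loop is quadratic; a genome-scale version would use an interval index or a tool such as bedtools.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    chrom: str
    pos: int     # 1-based position (assumption)
    ppa: float   # posterior probability of association from fine mapping


@dataclass
class CCRE:
    chrom: str
    start: int   # 0-based, half-open [start, end) as in BED (assumption)
    end: int
    name: str


def annotate_variants(variants, ccres, abc_links, min_ppa=0.1, min_abc=0.015):
    """Return (variant, cCRE name, linked genes) for likely causal variants in cCREs.

    abc_links: hypothetical dict cCRE name -> list of (gene, ABC score) pairs.
    """
    hits = []
    for v in variants:
        if v.ppa <= min_ppa:  # keep only likely causal variants (PPA > 0.1)
            continue
        for c in ccres:
            # A 1-based position p lies in BED interval [start, end) when start < p <= end.
            if c.chrom == v.chrom and c.start < v.pos <= c.end:
                genes = [g for g, s in abc_links.get(c.name, []) if s > min_abc]
                hits.append((v, c.name, genes))
    return hits
```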

References

Andersson R, Gebhard C, Miguel-Escalada I, Hoof I, Bornholdt J, Boyd M, Chen Y, Zhao X, Schmidl C, Suzuki T, et al. (2014). An atlas of active enhancers across human cell types and tissues. Nature 507, 455–461.
Astle WJ, Elding H, Jiang T, Allen D, Ruklisa D, Mann AL, Mead D, Bouman H, Riveros-Mckay F, Kostadima MA, et al. (2016). The Allelic Landscape of Human Blood Cell Trait Variation and Links to Common Complex Disease. Cell 167, 1415–1429.e19.
Auton A, Brooks LD, Durbin RM, Garrison EP, Kang HM, Korbel JO, Marchini JL, McCarthy S, McVean GA, and Abecasis GR (2015). A global reference for human genetic variation. Nature 526, 68–74.
Bentham J, Morris DL, Graham DSC, Pinder CL, Tombleson P, Behrens TW, Martin J, Fairfax BP, Knight JC, Chen L, et al. (2015). Genetic association analyses implicate aberrant regulation of innate and adaptive immunity genes in the pathogenesis of systemic lupus erythematosus. Nat Genet 47, 1457–1464.
Black AR, Black JD, and Azizkhan-Clifford J (2001). Sp1 and kruppel-like factor family of transcription factors in cell growth regulation and cancer. Journal of Cellular Physiology 188, 143–160.
