Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2021 Jul;24(7):941-953.
doi: 10.1038/s41593-021-00858-w. Epub 2021 May 20.

Cell-type-specific effects of genetic variation on chromatin accessibility during human neuronal differentiation

Affiliations

Cell-type-specific effects of genetic variation on chromatin accessibility during human neuronal differentiation

Dan Liang et al. Nat Neurosci. 2021 Jul.

Abstract

Common genetic risk for neuropsychiatric disorders is enriched in regulatory elements active during cortical neurogenesis. However, it remains poorly understood as to how these variants influence gene regulation. To model the functional impact of common genetic variation on the noncoding genome during human cortical development, we performed the assay for transposase accessible chromatin using sequencing (ATAC-seq) and analyzed chromatin accessibility quantitative trait loci (QTL) in cultured human neural progenitor cells and their differentiated neuronal progeny from 87 donors. We identified significant genetic effects on 988/1,839 neuron/progenitor regulatory elements, with highly cell-type and temporally specific effects. A subset (roughly 30%) of chromatin accessibility-QTL were also associated with changes in gene expression. Motif-disrupting alleles of transcriptional activators generally led to decreases in chromatin accessibility, whereas motif-disrupting alleles of repressors led to increases in chromatin accessibility. By integrating cell-type-specific chromatin accessibility-QTL and brain-relevant genome-wide association data, we were able to fine-map and identify regulatory mechanisms underlying noncoding neuropsychiatric disorder risk loci.

PubMed Disclaimer

Conflict of interest statement

Declaration of Interests

The authors declare no competing interests.

Figures

Extended Data Fig. 1
Extended Data Fig. 1
Flowchart for cell culture and pre-processing of ATAC-seq data. (a) Flowchart of cell culture for 17 rounds. (b) The FACS gates for sorting EGFP+ neurons. (c) Images of immunofluorescence for cell markers in progenitor cultures. Immunolabeling experiments were repeated in at least 10 unique donor cell lines with similar results. The scale bar presents 100 μm. (d) Images of immunofluorescence for cell markers in 8-week differentiated cultures. Immunolabeling experiments were repeated in at least 10 unique donor cell lines with similar results. The scale bar presents 100 μm. (e) Box plot for total sequence depth (forward reads and reverse reads), unique read number (forward reads and reverse reads), duplicate rate, mitochondrial duplicate rate, TSS enrichment and the fraction of reads in called peak regions (FRiP score) in neurons (N=61) and progenitors (N=76) compared to previously published data (N_GZ=3 biologically independent samples with 3–4 replicates for each sample, N_CP=3 biologically independent samples with 3 replicates for each sample). The center of the box is median of the data, the bounds of the box are 25th percentile and 75th percentile of the data, and the whisker boundary is 1.5 times the IQR. Maximum and minimum are the maximum and minimum of the data.
Extended Data Fig. 2
Extended Data Fig. 2
ATAC-seq data QC. (a) Peak calling versus library sequencing depth. We observed a slower rise in the number of new peaks called after 15 millions filtered read pairs. This indicates a reasonable balance between read depth and number of peaks called using an average of 14 million read pairs after filtering in our samples. (b) Insert size histograms for 3 randomly selected neuron and progenitor samples. (c) PCA plot for ATAC-seq data (N=137) before batch correction (left) and after batch correction (right), colored by sorter. We corrected normalized reads within ATAC-seq peaks in neurons by sorter locations. Then, we corrected normalized reads within ATAC-seq peaks in neurons and progenitors by cell culture round. (d) Correlations of batch corrected normalized reads across donors and within donors. Correlations within donors was significantly higher than correlations across donors in progenitor (n=15). Correlations within donors were higher than correlations across donors in neurons (n=4), but not significant (p=0.07). P values are estimated by two-sided wilcoxon tests. The center of the box is median of the data, the bounds of the box are 25th percentile and 75th percentile of the data, and the whisker boundary is 1.5 times the IQR. Maximum and minimum are the maximum and minimum of the data. (e) Correlations between PC1 to PC10 from normalized reads in neurons with known technical and biological factors. (f) Correlations between PC1 to PC10 from batch correction normalized reads in progenitors with known technical and biological factors.
Extended Data Fig. 3
Extended Data Fig. 3
Annotating differentially accessible peaks during neuronal differentiation. (a) Gene ontology (GO) enrichment of differentially accessible peaks at the TSS. Progenitor peaks (left) and neuron peaks (right) showed enrichment for GO terms related to proliferation and differentiation, as expected. (b) TFs with significantly differentially enriched conserved binding sites in differentially accessible peaks. The statistical test identifies TFs likely involved in neural progenitor proliferation and maintenance (progenitorTFs; top) or neurogenesis and maturation (neuronTFs; bottom). The top 30 significantly enriched TFs were shown in this figure, and the full list can be found in Supplementary Table 2. Within progenitorTFs, we found TFs previously characterized to have key roles for neural stem cell renewal and reprogramming, such as SOX2,, and those known to be required for the maintenance of stem cells in cortex, such as NR2F1, ETV5, and SP2. Within neuronTFs, NEUROG2 and LMX1A were identified, which are known to drive neuronal differentiation,, as well as TFs shown to induce neuronal identity from fibroblasts, including ASCL2 and the POU family. NeuronTFs also included CUX1/2, a marker for layer II-III neurons, and other laminar markers such as TBR1 and FOXP1. (c) Schematic of known functions for selected progenitorTFs and neuronTFs.
Extended Data Fig. 4
Extended Data Fig. 4
Features of caQTLs. (a) Flowchart for caQTL data analysis. (b) PCA plot for ATAC-seq data on sex chromosomes (chrX and chrY), colored by sex from genotype data, showing sex could be called using ATAC-seq data. (c) MDS plot for genotype data of HapMap3 and donors in this study, colored by populations from HapMap3 data. ASW: African ancestry in Southwest USA; CEU: Utah residents with Northern and Western European ancestry from the CEPH collection; CHB: Han Chinese in Beijing, China; CHD: Chinese in Metropolitan Denver, Colorado; GIH: Gujarati Indians in Houston, Texas; JPT: Japanese in Tokyo, Japan; LWK: Luhya in Webuye, Kenya; MEX: Mexican ancestry in Los Angeles, California; MKK: Maasai in Kinyawa, Kenya; TSI: Toscans in Italy; YRI: Yoruba in Ibadan, Nigeria. (d) Neuron and progenitor caPeaks enrichment at epigenetically annotated regulatory elements from fetal brain (Epigenetics Roadmap ID = E081). (e) Comparison of percent variance explained (r2) for shared neuron/progenitor caQTLs and fetal brain eQTLs (subset to the same sample size). P values are estimated by two-sided paired student-t tests. The center of the box is median of the data, the bounds of the box are 25th percentile and 75th percentile of the data, and the whisker boundary is 1.5 times the IQR. Maximum and minimum are the maximum and minimum of the data.
Extended Data Fig. 5
Extended Data Fig. 5
Examples of fine-mapping and regulatory mechanisms underlying eQTLs. (a) Co-localization of a progenitor-specific caQTL and fetal cortical eQTL for ETFDH. (b) caQTL for rs11544037 and the labeled peak in progenitor (N=76). P-values are estimated by a mixed linear effects model using a two-sided test (Methods). (c) eQTL of ETFDH in bulk fetal cortex (N=235). P-values are estimated by a mixed linear effects model using a two-sided test (Methods). (d) The expression of TFs whose motifs are disrupted by rs11544037 (logFC=−0.32, FDR=7.55e-18). (e) The motif Logo of RAD21, where the red box shows the position disrupted by rs11544037. Schematic cartoon of mechanisms for rs11544037 regulating chromatin accessibility and gene expression. (f) Luciferase signals for alleles of rs11544037 in progenitors (N=8). P-value is from two-sided paired t-tests. (g) Co-localization of a progenitor-specific caQTL and eQTL for FGF1. (h) CaQTL for rs11960262 and the labeled peak in progenitor (N=76). P-values are estimated by a mixed linear effects model using a two-sided test (Methods). (i) eQTL of ETFDH in progenitors (N=85). P-values are estimated by a mixed linear effects model using a two-sided test (Methods). (j) The expression of TFs in which motifs are disrupted by rs11960262. (k) The motif Logo of EGR1, where the red box shows the position disrupted by rs11960262. Schematic cartoon of mechanisms for rs11960262 regulating chromatin accessibility and gene expression. (For box plots in (b-c), (f) and (h-i), the center of the box is the median, the bounds of the box are 25th percentile and 75th percentile of the data, and the whisker boundary is 1.5 times the IQR. Maximum and minimum are the maximum and minimum of the data.)
Extended Data Fig. 6
Extended Data Fig. 6
Features of ASCA. (a) Density plot for caPeak length from shared caQTLs and ASCA, and from peaks only significant in ASCA in neurons (top) and progenitors (bottom). P values are estimated by two-sided Student’s t-tests. (b) The neuron ASCA (caSNP: rs62332390; caPeak: chr4:148,441,611–148,46,300; P values are estimated by the negative binomial generalized linear models from DESeq2 using a two-sided test) is not a significant caQTL (N=61; P values are estimated by the mixed linear model using a two sided test) in neurons because the caPeak was very wide (4,689bp) and only the region near the ASCA SNP shows an association with genotype. (c) The neuron ASCA (caSNP:rs77191441; caPeak:chr5:116,571,961–116,576,710; P values are estimated by the negative binomial generalized linear models from DESeq2 using a two-sided test) is not a significant caQTL (N=61; P values are estimated by the mixed linear effects model with a two-sided test) in neurons due to low minor allele frequency leading to less power to detect a caQTL. (d) ASCA between rs185220 (see Figure 3) and chromatin accessibility in progenitors (left) and neurons (right). P-values are estimated by the negative binomial generalized linear models from DESeq2 using a two-sided test. (For box plots in (b) and (c), the center of the box is the median, the bounds of the box are 25th percentile and 75th percentile of the data, and the whisker boundary is 1.5 times the IQR. Maximum and minimum are the maximum and minimum of the data.)
Extended Data Fig. 7
Extended Data Fig. 7
Comparison to adult dorsolateral prefrontal cortex (DLPFC) caQTLs. (a) Shared accessible peaks overlap at epigenetically annotated regulatory elements from different tissues. Accessible peak bp percentage overlapped with epigenetically annotated regulatory elements. From left to right, tissues ordered by bp percentage overlap with enhancers and promoters. Shared peaks overlap with both adult and fetal regulatory elements. (b) PCA plot for read counts from shared peaks in adult DLPFC, neurons and progenitors. (c) Correlations of effect sizes for significant neuron caQTLs and the same SNP-Peak pairs in adult DLPFC (left). Correlations of effect sizes for significant progenitor caQTLs and the same SNP-Peak pairs in adult DLPFC (right).
Extended Data Fig. 8
Extended Data Fig. 8
An example of a neuron-specific caQTL leading to regulatory mechanisms underlying GWAS loci. (a) Numbers of colocalizations between ASCA and GWAS loci. (b) The neuron-specific significant caQTL (caSNP: rs9930307; caPeak: chr16: 9,805,221–9,805,420) co-localized with schizophrenia GWAS locus (index SNP: rs7191183). (c) Box plot for the caQTL (left, N=61; P values are estimated by the mixed linear effects model using a two-sided test) and ASCA (right) (caSNP: rs9930307; caPeak: chr16: 9,805,221–9,805,420; P values are estimated by the negative binomial generalized linear models from DESeq2 using a two-sided test). (d) he expression of TFs in which motifs are disrupted by rs9930307. (e) The motif logo of TP53 and the position disrupted by rs9930307. (f) The box plot for luciferase signal for alleles of rs9930307 in progenitors (N=8). P value is from two-sided paired student-t tests. (For box plots in (c) and (f), the center of the box is median of the data, the bounds of the box are 25th percentile and 75th percentile of the data, and the whisker boundary is 1.5 times the IQR. Maximum and minimum are the maximum and minimum of the data.)
Figure 1:
Figure 1:
Profiling genome-wide chromatin accessibility in progenitors and neurons. (a) Schematic cartoon of experimental design. SOX2 and PAX6 immunolabeled neural progenitors (left), showing over 90% of cells were radial glia. EGFP labeled differentiated neurons (right), showing expected neuronal morphology. (b) Percentage of accessible peak base pairs (bp) detected in these cultures overlapped with epigenetically annotated regulatory elements from multiple tissues. From top to bottom, tissues ordered by bp percentage overlap with enhancers and promoters. (c) PCA plot of ATAC-seq read count after batch effect correction colored by cell types, showing two separate clusters for progenitors and neurons. (d) MA plot for differentially accessible peaks between progenitors and neurons. All peaks can be found in Supplementary Table 1. (e) ATAC-seq coverage plot (average normalized read counts) for promoters of neuron expressed gene SYT13, showing higher accessibility in neurons than progenitors (LFC=−1.16, FDR=3.28e-35). ATAC-seq coverage plot (average normalized read counts) for promoter of progenitor expressed gene EMX2, showing higher accessibility in progenitors than neurons (LFC=1.62, FDR=1.12e-32). (f) Enrichment of neuron/progenitor peaks with differentially accessible peaks from fetal brain tissue. GZ: neural progenitor-enriched region encompassing the ventricular zone (VZ), subventricular zone (SVZ), and intermediate zone (IZ); CP: the neuron-enriched region containing the subplate (SP), cortical plate (CP), and marginal zone (MZ).
Figure 2:
Figure 2:
Chromatin accessibility quantitative trait loci (caQTL) in progenitors and neurons. (a) caQTL schematic. (b) Number of the most significant caSNPs relative to the distance from the center of the caPeaks (left: neuron caQTLs; right: progenitor caQTLs). The most significant caQTLs for each caPeak can be found in Supplementary Table 3. (c) Numbers of caQTLs in different functional categories. (d) Schematic cartoon of fetal cortical and cell-type specific eQTLs (Left). Percentage of neuron/progenitor caQTLs with shared effects in fetal cortical and cell-type specific eQTLs (All shared caQTLs and eQTLs can be found in Supplementary Table 4.). (e) For the most significant caSNP for each caPeak, correlation of effect sizes between shared caQTLs and eQTLs in neurons (left) and progenitors (right). (f) Comparison of percent variance explained (r2) for shared caQTLs and eQTLs (subset to the same sample size) in neurons and progenitors. We found 500 (e)caSNP-caPeak-eGene combinations in neurons and 1,025 (e)caSNP-caPeak-eGene combinations in progenitors. We observed higher percent variance explained for caQTLs than eQTLs in both neurons and progenitors. P values are estimated by the two-sided paired t-test. The center of the box is median of the data, the bounds of the box are 25th percentile and 75th percentile of the data, and the whisker boundary is 1.5 times the IQR. Maximum and minimum are the maximum and minimum of the data.
Figure 3:
Figure 3:
Fine-mapping and regulatory mechanism underlying eQTLs. (a) Co-localization of caQTL and eQTL for SETD9. ChIP-seq data of REST binding in H1 cells and neurons differentiated from H1 cells,. (b) The association between rs185220 and chromatin accessibility of the labeled peak in progenitors (top, N=76) and neurons (bottom, N=61). P values are estimated by a linear mixed effects model (Methods) using a two-sided test. (c) The association between rs185220 and expression of SETD9 in the mid-gestation cortex (top, N=235), progenitors (middle, N=85) and neurons (bottom, N=74). P values are estimated by a linear mixed effects model (Methods) using a two-sided test. (d) The expression of TFs in which motifs are disrupted by rs185220. vRG: ventricular Radial Glia; oRG: outer Radial Glia; PgS: Cycling progenitors (S phase); PgG2M: Cycling progenitors (G2/M phase); IP: Intermediate progenitors; ExN: Migrating excitatory; ExM: Maturing excitatory; ExM-U: Maturing excitatory upper enriched; ExDp1: Excitatory deep layer 1; ExDp2: Excitatory deep layer 2. (e) The motif Logo of REST, where the red box shows the position disrupted by rs185220. Schematic cartoon of proposed mechanism for rs185220 regulating chromatin accessibility and gene expression. (f) The box plot for luciferase signal for alleles of rs185220 in progenitors (N=8). P value is from a two-sided paired t-test. (For box plots in (b) (c) and (f), the center of the box is median, the bounds of the box are 25th percentile and 75th percentile of the data, and the whisker boundary is 1.5 times the IQR. Maximum and minimum are the maximum and minimum of the data.)
Figure 4:
Figure 4:
Allele Specific Chromatin Accessibility (ASCA). (a) Numbers of shared/non-shared significant caQTLs and significant ASCA in neurons (left) and progenitors (right). All significant ASCA in neurons and progenitors can be found in Supplementary Table 6. (b) Correlation of effect sizes for caQTL and ASCA from (A) neurons (top) and progenitors (bottom). (c) Co-localization of caQTL and ASCA in progenitors and neurons as well as mid-gestation cortical eQTL for FABP7. (d) Association between rs144376334 and expression of FABP7 in mid-gestation cortex (left, N=235), chromatin accessibility of the labeled peak in progenitors (middle, N=85) and neurons (right, N=74). P values are estimated using a linear mixed effects model with a two-sided test (Methods). The center of the box is median of the data, the bounds of the box are 25th percentile and 75th percentile of the data, and the whisker boundary is 1.5 times the IQR. Maximum and minimum are the maximum and minimum of the data. (e) ASCA detected at rs144376334 in progenitors (left) and neurons (right). P values are estimated using the negative binomial generalized linear model from DESeq2 with a two-sided test (Methods). (f) The expression of TFs whose motifs are disrupted by rs144376334. (g) The motif logo of JUN with boxed position disrupted by rs144376334.
Figure 5:
Figure 5:
Cell-type specificity of caQTLs. (a) Percentage of caQTLs (left)/eQTLs (right) with shared effects in neurons and progenitors. (b) Numbers of overlapped/non-overlapped caPeaks (left; P-value estimated to be less than the floating point precision value) and ASCA (right) between neurons and progenitors. P values are estimated by two-sided fisher’s exact tests. (c) Differential accessibility of progenitor caPeaks (left) and neuron caPeaks (right). (d) Numbers of caPeaks distal to promoters or proximal to promoters for neuron-specific caQTLs, progenitor-specific caQTLs and shared caQTLs between neurons and progenitors. P values are estimated by the two-sided two-proportions z-test. (e) Correlations of effect sizes of caQTLs between neurons and progenitors (left: progenitor caQTLs vs. the same caSNP-caPeak pairs in neurons; right: neuron caQTLs vs. the same caSNP-caPeak pairs in progenitors). (f) Correlations of effect sizes of eQTLs between neurons and progenitors (left: progenitor eQTLs vs. the same eSNP-eGene pairs in neurons; right: neuron eQTLs vs. the same eSNP-eGene pairs in progenitors).
Figure 6:
Figure 6:
Prediction of disrupted transcription factor (TF) binding due to genetic variation. (a) Enrichment of caSNP-disrupted motifs in accessible peaks in progenitors or in neurons. (b) Schematic of TF motifs disrupted by caSNPs associated with decreased/increased chromatin accessibility. (c) Numbers of TFs where the motif-disrupting allele was associated with increased/decreased chromatin accessibility in progenitors (left) and neurons (right). For most TFs, the motif-disrupting allele was associated with decreased chromatin accessibility in progenitors and neurons. (d) Examples of TF motifs disrupted by caSNPs associated with decreased chromatin accessibility in progenitors (POU3F2; left) and neurons (ASCL2; right). (e) Disrupting ZEB1 (a transcriptional repressor) binding motif was associated with increased chromatin accessibility in progenitors and neurons.
Figure 7:
Figure 7:
Cell-type specific caQTLs lead to regulatory mechanisms underlying GWAS loci. (a) Partitioned heritability enrichment. P values are estimated from LD score regression (two-sided test) and corrected by FDR (Methods). Data are presented as mean values +/− SE. (b) Partitioned heritability enrichment demonstrated a significant (FDR < 0.05) enrichment of heritability for surface area of the full cortex and other subregions within progenitor peaks. (c) Numbers of colocalizations between caQTLs and GWAS loci. (d) A colocalized locus between progenitor-specific caQTL and MDD GWAS. (e) Association between rs1950834 and chromatin accessibility of the labeled peak in progenitors (N=76). P values are estimated by the mixed effects linear model using a two-sided test. (f) ASCA of rs1950834 in progenitors. P values are estimated by the negative binomial generalized linear models from DESeq2 using a two-sided test (Methods). (g) Association between rs1950834 and expression of lncRNA AL12182.1 in fetal brain (left, N=235) and progenitors (right, N=85). P values are estimated by the linear mixed effects model with a two-sided test. (h) Zoomed in plot of caPeaks colored by genotype at rs1950834. (i) The expression of TFs in which motifs are disrupted by rs1950834. (j) The motif logo of ETV1 where the boxed region is disrupted by rs1950834. (For box plots in (e) and (g), the center of the box is median of the data, the bounds of the box are 25th percentile and 75th percentile of the data, and the whisker boundary is 1.5 times the IQR. Maximum and minimum are the maximum and minimum of the data.)

References

    1. Grasby KL et al. The genetic architecture of the human cerebral cortex. bioRxiv 399402 (2018) doi: 10.1101/399402. - DOI
    1. Sullivan PF & Geschwind DH Defining the Genetic, Genomic, Cellular, and Diagnostic Architectures of Psychiatric Disorders. Cell 177, 162–183 (2019). - PMC - PubMed
    1. Barešić A, Nash AJ, Dahoun T, Howes O & Lenhard B Understanding the genetics of neuropsychiatric disorders: the potential role of genomic regulatory blocks. Mol. Psychiatry (2019) doi: 10.1038/s41380-019-0518-x. - DOI - PMC - PubMed
    1. Gamazon ER et al. Using an atlas of gene regulation across 44 human tissues to inform complex disease- and trait-associated variation. Nat. Genet 50, 956–967 (2018). - PMC - PubMed
    1. Lee PH et al. Principles and methods of in-silico prioritization of non-coding regulatory variants. Hum. Genet 137, 15–30 (2018). - PMC - PubMed

Publication types

MeSH terms

Grants and funding