Neuron. 2023 Mar 15;111(6):857-873.e8.
doi: 10.1016/j.neuron.2022.12.026. Epub 2023 Jan 13.

Machine learning dissection of human accelerated regions in primate neurodevelopment

Sean Whalen et al. Neuron.

Abstract

Using machine learning (ML), we interrogated the function of all human-chimpanzee variants in 2,645 human accelerated regions (HARs), finding 43% of HARs have variants with large opposing effects on chromatin state and 14% on neurodevelopmental enhancer activity. This pattern, consistent with compensatory evolution, was confirmed using massively parallel reporter assays in chimpanzee and human neural progenitor cells. The species-specific enhancer activity of HARs was accurately predicted from the presence and absence of transcription factor footprints in each species. Despite these striking cis effects, activity of a given HAR sequence was nearly identical in human and chimpanzee cells. This suggests that HARs did not evolve to compensate for changes in the trans environment but instead altered their ability to bind factors present in both species. Thus, ML prioritized variants with functional effects on human neurodevelopment and revealed an unexpected reason why HARs may have evolved so rapidly.

Keywords: ATAC-seq; ChIP-seq; Hi-C; MPRA; accelerated regions; enhancers; evolution; gene regulation; machine learning; neurodevelopment.

Conflict of interest statement

Declaration of interests: A.K. is a cofounder, consultant, and member of the board of Neurona Therapeutics, a company studying the potential therapeutic use of interneuron transplantation. J.L.R. is a cofounder, stockholder, and member of the board of Neurona. N.A. is a cofounder and on the scientific advisory board of Regel Therapeutics and Neomer Diagnostics and receives funding from BioMarin Pharmaceutical Incorporated. K.K. is currently an employee of Fauna Bio.

Figures

Figure 1. Characterization of chimpanzee and human neural progenitor cells.
(A-C) Brightfield images of human iPSCs (A), iPSCs differentiated into neural rosettes (B), and N2 cells (C), demonstrating typical morphology. (D) Human iPSCs demonstrate normal karyotypes. (E) Human N2 cells express Paired Box 6 (PAX6), a neural marker. (F) Human N3 cells express Glial Fibrillary Acidic Protein (GFAP), a glial marker. (G-I) Brightfield images of chimpanzee iPSCs (G), iPSCs differentiated into neural rosettes (H), and N2 cells (I), demonstrating typical morphology. (J) Chimpanzee iPSCs demonstrate normal karyotypes. (K) Chimpanzee N2 cells express PAX6. (L) Chimpanzee N3 cells express GFAP. (M) Percentage of cells in scRNA-seq expressing genes that are markers for the cell cycle or telencephalon and neuronal cell types. Human and chimpanzee N2 and N3 cells show comparable marker expression for radial glia and telencephalon. For example, 50–90% of cells expressed FOXG1, a marker of the telencephalon. (N-O) Coverage (CPM) of H3K27ac ChIP-seq reads at HARs, sorted by maximum CPM, in human (N) and chimpanzee (O) N2 cells. (P) Human and chimpanzee N2 H3K27ac TF footprints are largely concordant, but some TF families with LIM, POU and homeodomains show species-biased enrichment. Select TFs expressed in NPCs with large differences in q-value between species are labeled.
Figure 2. The in vivo epigenetic landscape of HARs.
A large collection of open chromatin (ATAC-seq, DNase-seq) and ChIP-seq (TF, histone) datasets from human primary tissues (49% brain, 48% heart, 2% limb; Table S3) were intersected with HARs. (A) Upset plot showing that 1846/2645 HARs overlap at least one type of open chromatin (ATAC-seq, DNase-seq) or activating (H3K4me1, H3K4me3, H3K9ac, H3K27ac, or H3K36me3) mark, while 616/2645 overlap all three (i.e., ATAC-seq, DNase-seq, and an activating histone mark). The purple histogram shows the number of HARs with the denoted combination of marks, while the black bars to the left show the number of marks that overlap a HAR. (B) HAR overlaps with activating marks and open chromatin in other tissues. There are significantly more overlaps for the brain compared to non-brain tissues (p-value < 2e-16). Joint heart and brain overlaps are shown in Figure S2. (C) Two-dimensional UMAP projection of HARs (grey) with VISTA heart (red) and brain (purple) enhancers showing that some HARs cluster with in vivo validated enhancers. (D) HARs (horizontal axis, sorted so those most similar to VISTA brain enhancers are on the left) with their epigenetic profiles (vertical axis; black indicates overlapping epigenetic features). Shown are the epigenetic features most predictive in a ML model of VISTA brain enhancers (purple) versus non-brain enhancers (VISTA negatives plus enhancers active in other tissues; red), along with their model coefficients (left).
Figure 3. Validation of an active HAR enhancer regulating ROCK2.
2xHAR.183 was selected for further validation due to its high enhancer score (Figure 2). (A) 2xHAR.183 has a significant chromatin loop with the ROCK2 gene in excitatory neuron PLAC-seq data (5kb resolution binary loop call) and contacts ROCK2 in our N2/N3 Hi-C. The gene E2F6 is nearby on the linear genome but has fewer chromatin contacts. 2xHAR.183 overlaps multiple annotations from fetal brain datasets. Chimpanzee and human epigenetic datasets across early neurodevelopment suggest 2xHAR.183 starts and remains accessible in both species, while gaining acetylation beginning at the N2 stage. The activation signature appears later and stronger in chimpanzee versus human cells. (B) Footprints of known neurodevelopmental TFs, C/EBPBeta and RFX2, are contained within 2xHAR.183 and overlap human:chimpanzee variants (colored sites in chimpanzee and human sequences). Additional footprints for PRDM1 and BCL11A were detected adjacent to and partially overlapping the HAR. Height of the nucleotides in each motif indicates information content (0 to 2 bits). (C) CRISPRa validation (3 replicates per target, 4 per control) shows 2xHAR.183 drives strong expression of ROCK2, but not the proximal gene E2F6. Variability between replicates is small for low expression values.
Figure 4. Human-specific variants shift HAR enhancer profiles in a deep-learning model.
Every human-specific variant in each HAR was evaluated using the deep-learning model Sei. Variants where the human nucleotide decreases the chromatin state are blue (shade denotes amount of decrease), variants where the human nucleotide increases the chromatin state are red, and complex variants that cannot be scored by Sei are white. (A) The landscape of chromatin state changes (y-axis) induced by all human:chimpanzee variants across all HARs (x-axis), sorted by predicted impact on brain enhancer state. (B) The 50 HAR variants that most increase or decrease brain enhancer state for all HARs that were active in our MPRA. The x-axis shows the HAR name, the offset of the variant from the HAR’s start position, and the human and chimpanzee alleles colored by species and separated by a colon. (C) Histogram of predicted enhancer state changes for all HAR variants from (A). Mean state changes for different classes of variants are shown via vertical lines: 1000 Genomes common variants, de novo mutations in healthy individuals, disease-causing mutations (from smallest to largest mean change). Many HAR variants have effects that exceed those of phenotype-associated human polymorphisms. (D) Histogram of predicted brain enhancer state changes for the most disruptive HAR variants in active HARs. Mean state changes for different classes of variants as in (C). (E) For 12 HARs containing variants with the largest effects on brain enhancer activity in our Sei analysis, we observed a mix of variants predicted by Sei to increase and decrease enhancer activity. Variants (x-axis) are annotated with their offset from the start of the HAR plus the human and chimpanzee alleles separated by a colon. HARs are annotated with the closest protein-coding gene.
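The mix of increasing and decreasing variants within a single HAR, as in panel (E), can be illustrated with a minimal sketch. It assumes each variant carries a signed predicted change in brain enhancer state (positive means the human allele increases it). The function name, the threshold, and the scores below are hypothetical and are not taken from the paper or from Sei itself.

```python
from collections import defaultdict


def hars_with_opposing_effects(variant_scores, threshold):
    """Return names of HARs that have at least one variant scoring
    >= threshold and at least one scoring <= -threshold.

    variant_scores: iterable of (har_name, signed_score) pairs.
    threshold: hypothetical cutoff for a "large" effect.
    """
    by_har = defaultdict(list)
    for har, score in variant_scores:
        by_har[har].append(score)

    flagged = []
    for har, scores in by_har.items():
        # Opposing effects: one large increase and one large decrease.
        if max(scores) >= threshold and min(scores) <= -threshold:
            flagged.append(har)
    return sorted(flagged)


# Made-up scores, for illustration only.
example = [
    ("HAR_A", 0.8), ("HAR_A", -0.9),
    ("HAR_B", 0.7), ("HAR_B", 0.1),
    ("HAR_C", -0.6),
]
print(hars_with_opposing_effects(example, threshold=0.5))
```

Only `HAR_A` is flagged here, because it is the only toy region with both a large positive and a large negative variant score.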
Figure 5. Species-biased HAR enhancers identified in chimpanzee and human NPCs.
We performed lentiMPRAs in chimpanzee and human cell lines at the N2 and N3 stages of differentiation. (A) Enhancer activity (RNA/DNA ratios batch corrected and normalized for sequencing depth) was highly correlated between technical and biological replicates for eighteen samples passing quality control: 3 replicates (shades of grey) of Pt2a (chimpanzee; dark grey), WTC (human; medium grey), and HS1–11 (human; light grey) iPSC lines differentiated into N2 (medium grey) and N3 (dark grey) cells. (B) Effect size (t-statistic) vs significance (−log10 q-value) for the ratio of human and chimpanzee HAR sequence activity for active HAR enhancers. HARs with species-biased activity are plotted in dark green (chimpanzee sequence more active) or dark blue (human sequence more active). (C) Roughly a third of human HAR sequences are active across samples (log RNA/DNA > median of positive controls in at least 9/18 replicates), and 11% are human-biased (differentially active with human:chimpanzee ratio > 1). (D) Distribution of human HAR sequence enhancer activity for inactive (grey) or active HARs, with active split into human-biased (dark blue) versus not (light blue). (E) Roughly a third of chimpanzee HAR sequences are active across samples, and 14% are chimpanzee-biased. (F) Histogram of chimpanzee sequence activity as in (D). (G) HARs active in lentiMPRA are enriched for many neurodevelopmental GO terms. Colors indicate the type of term: red = molecular function (MF), orange = biological process (BP), green = cellular component (CC).
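The activity rule in panel (C), log RNA/DNA above the median of positive controls in at least 9 of 18 replicates, can be sketched as follows. This is an illustration only: it assumes the control median is computed separately within each replicate, which the caption does not specify, and the function name and toy values are hypothetical.

```python
from statistics import median


def is_active(seq_log_ratios, control_log_ratios, min_replicates=9):
    """Call a sequence active if its log(RNA/DNA) exceeds the median of
    the positive controls in at least `min_replicates` replicates.

    seq_log_ratios: one log(RNA/DNA) value per replicate.
    control_log_ratios: one list of positive-control values per replicate
        (assumption: the median is taken within each replicate).
    """
    if len(seq_log_ratios) != len(control_log_ratios):
        raise ValueError("need one control list per replicate")

    passing = 0
    for seq_value, controls in zip(seq_log_ratios, control_log_ratios):
        if seq_value > median(controls):
            passing += 1
    return passing >= min_replicates


# Toy data: 18 replicates, each with the same control median of 0.5.
controls = [[0.2, 0.5, 0.9]] * 18
active_seq = [1.0] * 9 + [0.0] * 9     # passes in exactly 9 replicates
inactive_seq = [1.0] * 8 + [0.0] * 10  # passes in only 8 replicates
print(is_active(active_seq, controls), is_active(inactive_seq, controls))
```

The comparison is strict, so a sequence whose value equals the control median in a replicate does not count as passing in that replicate.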
Figure 6. Variants in TF footprints predict HAR species bias.
(A) The effects of HAR variants in TF footprints (in human N2 H3K27ac ChIP-seq) on brain enhancer activity were predicted using Sei. For each TF, the largest decrease in brain enhancer state over all variants (x-axis) is shown against the number of variant-containing footprints (y-axis). Select TFs expressed in NPCs (TPM > 1) and scoring high on one or both metrics are labeled. (B) TFs with the largest predicted increase in brain enhancer activity in the analysis from (A). (C) The species-bias of HAR lentiMPRA activity can be predicted accurately using human and chimpanzee N2 H3K27ac footprints for TFs expressed in NPCs as features in a gradient boosting model. The most important TFs for accurate predictions are shown along with their variable importance scores.
Figure 7. Variants in HARs interact to tune enhancer activity.
(A) All evolutionary intermediates between chimpanzee and human alleles of 2xHAR.170 were tested via lentiMPRA. Individual variants showed a range of effects on activity (y-axis; log2(RNA/DNA)) that correlated with Sei-predicted effects on brain enhancer activity (red = human increase, blue = human decrease). Oligos containing multiple variants revealed interactions between variants that were untestable with Sei (no color). (B) We assessed the importance of each variant using a gradient boosting model that predicts lentiMPRA activity of each permutation using the presence or absence of the human allele at each of the three variants. Interactions between multiple variants (separated by colons on the y-axis) were included as predictors alongside main effects (no colon) to assess their predictive importance. This model confirmed the importance of specific variant interactions (x-axis, positive = higher predicted activity, negative = lower activity). Variant names consist of a V followed by the variant number (1, 2 or 3 ordered from 5’ to 3’), and the allele is shown after the equal sign. Present (yellow) indicates the expected change in enhancer activity for oligos that have the allele denoted on the y-axis, while absent (purple) shows the expected change for oligos that lack the allele. Yellow points at positive impact on RNA/DNA values mean that the variant or variant combination increases enhancer activity on average across oligos with other variants, while purple points at positive values mean the variant or variant combination decreases activity (i.e., activity is higher when absent). (C) 2xHAR.170 is a candidate intronic enhancer of GALNT10, acquiring a human-specific change from C to T that enhances POU4F1 and TEF binding in our footprint analysis. The human polymorphism rs2434531 is an eQTL for GALNT10, and we predict that the derived allele enhances binding of the repressor HMX1. The HMX1 and TEF footprints were detected in an independent brain footprinting study. Both results are supported by Sei predictions (Figure 4E), lentiMPRA activity (A), and differential activity between the chimpanzee (D) and human (E) sequences in the forebrain and midbrain of transgenic mouse embryos. Adapted from. The eQTL (rs2434531) is in linkage disequilibrium with a schizophrenia GWAS variant (rs11740474). In neuronal cells, 2xHAR.170 is bound by FOXP2, as well as other enhancer-associated proteins (ISL1, HAND2, PHOX2B, FOSL2) and chromatin modifiers (EZH2, SMARCA2, SMARCC1).
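The design described in panels (A) and (B), every combination of chimpanzee and human alleles at three variants with interaction terms written as colon-separated names, can be sketched as below. This only illustrates how such predictors might be enumerated; the 0/1 encoding, the function names, and the rule that an interaction is present only when all of its variants carry the human allele are assumptions, not the authors' implementation, and no model is fitted here.

```python
from itertools import combinations, product


def enumerate_haplotypes(n_variants):
    """All 2**n combinations of chimpanzee (0) and human (1) alleles."""
    return list(product((0, 1), repeat=n_variants))


def design_row(haplotype):
    """Main-effect and interaction indicators for one haplotype.

    Keys follow the caption's naming: V1, V2, ... for main effects and
    colon-joined names (e.g. V1:V3) for interactions. An interaction is 1
    only when every variant in it carries the human allele (assumption).
    """
    names = [f"V{i + 1}" for i in range(len(haplotype))]
    row = {}
    for size in range(1, len(haplotype) + 1):
        for idx in combinations(range(len(haplotype)), size):
            key = ":".join(names[i] for i in idx)
            row[key] = int(all(haplotype[i] for i in idx))
    return row


haplotypes = enumerate_haplotypes(3)
print(len(haplotypes))        # number of oligos covering all intermediates
print(design_row((1, 0, 1)))  # human allele at V1 and V3 only
```

With three variants this gives 8 haplotypes and 7 predictors per haplotype (3 main effects, 3 pairwise interactions, 1 three-way interaction), which could then be passed to any regression or gradient boosting library.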
