Nat Genet. 2023 Dec;55(12):2075-2081.
doi: 10.1038/s41588-023-01565-x. Epub 2023 Nov 16.

Primate-specific ZNF808 is essential for pancreatic development in humans


Elisa De Franco et al. Nat Genet. 2023 Dec.

Abstract

Identifying genes linked to extreme phenotypes in humans has the potential to highlight biological processes not shared with all other mammals. Here, we report the identification of homozygous loss-of-function variants in the primate-specific gene ZNF808 as a cause of pancreatic agenesis. ZNF808 is a member of the KRAB zinc finger protein family, a large and rapidly evolving group of epigenetic silencers which target transposable elements. We show that loss of ZNF808 in vitro results in aberrant activation of regulatory potential contained in the primate-specific transposable elements it represses during early pancreas development. This leads to inappropriate specification of cell fate with induction of genes associated with liver identity. Our results highlight the essential role of ZNF808 in pancreatic development in humans and the contribution of primate-specific regions of the human genome to congenital developmental disease.

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1. Identification of homozygous variants in ZNF808 as a cause of pancreatic agenesis.
a, Schematic representation of the ZNF808 gene and the pathogenic variants identified in 13 families. Deletions are highlighted in red above the gene cartoon while the loss-of-function variants are represented below. The KRAB and zinc finger domains are annotated at the gene level. b, Partial pedigrees of the 13 families with homozygous ZNF808 variants. NA, sample not available for testing; N/N, variant not detected; N/M, heterozygote for variant; M/M, homozygote for variant. Two black lines between parents indicate individuals who were known to be related. A black line and a dashed line between parents indicate individuals who were not known to be related at testing but were confirmed to be consanguineous by homozygosity mapping of next-generation sequencing data calculated with SavvyHomozygosity.
Fig. 2. ZNF808 is a primate-specific gene targeting transposons of similar evolutionary age.
a, Reconstructed phylogeny of ZNF808 using a zinc finger signature approach. The three amino acids of each zinc finger directly contacting DNA were used to build a specific functional signature to track evolution of ZNF808, as previously described. Zinc finger domains are color-coded according to the number of variants in each triplet compared with the human version. Notable events of loss or gain of zinc fingers are also represented. No appreciable homology with any zinc finger array was detected in New World monkeys or in any other mammals. Silhouettes of representative species are all from PhyloPic.org. b, ZNF808 binds primarily MER11 transposable elements. Analysis of ZNF808 ChIP–seq data reveals that it primarily intersects with transposable elements, although a few binding sites are found on gene promoters and other genomic regions. Further analysis of transposons shows that ZNF808 binds primarily elements of the MER11 family: MER11A, MER11B and MER11C. c, MER11 transposable elements are primate-specific. The origin of each individual MER11 element in the human genome was traced using a comparative multiple alignment of 241 species. The age of each element was determined as corresponding to the farthest phylogenetic branch where we could find a similar copy at a syntenic locus. Data points are plotted as the sum of elements found to have originated at each phylogenetic branch per subfamily; the x axis is scaled according to time in million years from estimates between the human genome and each phylogenetic branch common ancestor. A curve is interpolated between the data points to show the estimated rate of replication between phylogenetic branches. The scale is relative to each subfamily and is indicative of proportional changes between phylogenetic categories; the highest point for each subfamily is annotated with the number of new elements for scale.
Fig. 3. Loss of ZNF808 unmasks the regulatory potential of MER11 elements during pancreatic differentiation.
a, ZNF808 gene editing using CRISPR. Cpf1 and two guide RNAs targeting the zinc finger array were used in H1 stem cells to produce the ZNF808 KO. b, In vitro differentiation protocol to pancreatic progenitors. Overview of the multistep differentiation protocol used in this study for both epigenetic and transcriptomic analysis of differences induced by ZNF808 KO. Stage numbers from S0 to S4 are used throughout the text to refer to specific steps. c, ZNF808 is important for the maintenance of epigenetic repression on MER11 elements. Left, H3K9me3 ChIP–seq peaks intersecting with MER11 elements reveal that most sites are covered in heterochromatin-associated H3K9me3 and are bound by ZNF808. A proportion of these peaks lose H3K9me3 in ZNF808 KO clones at various stages of differentiation (top). Results show that many MER11 elements that had H3K9me3 signal in the WT gain H3K27ac in the ZNF808 KO, especially at the early stages of differentiation (bottom). d, A subset of MER11 elements is activated in the ZNF808 KO. Heatmap showing clustering of 220 MER11 elements displaying a loss of H3K9me3 followed by a gain of H3K27ac in at least one stage of differentiation. Normalized ChIP–seq signal is shown; color scale ranges from +3 (red) to −3 (blue) on a z-score scale.
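The z-score color scale used in the heatmap of panel d can be illustrated with a minimal sketch. The legend does not say how the scaling was computed, so the per-element scaling, the use of the population standard deviation and the clipping to the displayed range are all assumptions, and the function name `zscore_per_element` is hypothetical.

```python
from statistics import mean, pstdev


def zscore_per_element(signal, clip=3.0):
    """Scale one element's signal across samples to z-scores.

    signal: normalized ChIP-seq values for a single element, one per sample.
    clip:   values beyond +/- clip are capped, mirroring a bounded color scale
            (assumed; the legend only states the +3 to -3 range).
    """
    mu = mean(signal)
    sd = pstdev(signal)  # population SD; an assumption, not stated in the source
    if sd == 0:
        # A flat profile carries no relative information; map it to zero.
        return [0.0] * len(signal)
    return [max(-clip, min(clip, (x - mu) / sd)) for x in signal]
```

Scaling each element independently makes relative changes across stages comparable between elements with very different absolute signal, which is presumably why a z-score scale is used for clustering.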
Fig. 4. Loss of ZNF808 during pancreas differentiation leads to activation of genes in proximity to unmasked MER11 elements and induction of a liver gene expression program.
a, Loss of ZNF808 leads to perturbed gene expression throughout pancreatic differentiation. Bar chart showing total genes activated and repressed with FDR < 0.05 and |FC| > 1.25 for each stage of pancreatic differentiation. K, thousands. b, Dysregulated genes are found in proximity to unmasked MER11 elements in the ZNF808 KO early in differentiation. Proximity enrichment (−log10 Fisher exact right-tail P value) showing an excess of dysregulated genes compared to all genes in proximity to MER11 elements losing H3K9me3 and gaining H3K27ac in the ZNF808 KO as a function of distance between genes and binding sites for genes activated (orange) and repressed (blue) on log10 scale. Enrichment peaks between 10 kb and 100 kb suggest ZNF808 repressing distal gene enhancers. A total of 43.4% and 23.9% of activated genes at S0 and S1, respectively, are within 1 Mb of a MER11 element. c, Hepatic cord genes are activated and dorsal pancreatic bud genes are repressed in ZNF808 KO. Top, Fisher exact enrichment between ZNF808 KO activated and repressed genes and genes more highly expressed in CS12–14 hepatic cords or dorsal pancreatic buds, respectively. Bottom, log2 fold-change ZNF808 KO over WT for dorsal pancreatic bud (left, blue) and hepatic cord (right, orange) genes. Each dot represents a single gene. d, Genes exclusively expressed in liver and activated in hepatic cords are activated in ZNF808 KO. The log2 fold-changes of the 29 genes that are expressed in CS12–14 hepatic cords, exclusively expressed in GTEx liver and activated at S2 in ZNF808 KO. Error bars give DESeq2 standard error of the log2 fold-change, n = 3 independent biological replicates. e, Immunostaining of AFP and PDX1 at S3 (posterior foregut) stage. Confirmation of RNA-seq results by immunostaining, showing activation of AFP in ZNF808 KO in PDX1 positive cells (representative of three independent differentiation experiments; scale bar, 100 μm). 
Independent replicate stainings from the same cell lines are given in Extended Data Fig. 3d.
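The proximity enrichment in panel b is reported as a −log10 Fisher exact right-tail P value. As a rough illustration of that statistic only, the sketch below computes a right-tail P value for a 2×2 table directly from the hypergeometric distribution. The table layout and the function names are assumptions made for this example; the legend does not describe how the authors built their contingency tables.

```python
from math import comb, log10


def fisher_right_tail(a, b, c, d):
    """Right-tail Fisher exact P value for the 2x2 table [[a, b], [c, d]].

    Assumed layout (hypothetical, for illustration):
      a: dysregulated genes within the distance window of an element
      b: dysregulated genes outside the window
      c: other genes within the window
      d: other genes outside the window
    """
    row1 = a + b          # all dysregulated genes
    col1 = a + c          # all genes within the window
    n = a + b + c + d     # all genes
    upper = min(row1, col1)
    # Sum the hypergeometric probabilities of every table with at least
    # as many dysregulated genes in the window as observed.
    tail = sum(comb(col1, k) * comb(n - col1, row1 - k)
               for k in range(a, upper + 1))
    return tail / comb(n, row1)


def neg_log10_p(a, b, c, d):
    """Value on the -log10 scale, as plotted in an enrichment curve."""
    return -log10(fisher_right_tail(a, b, c, d))
```

For example, `fisher_right_tail(3, 1, 1, 3)` evaluates to 17/70. Repeating the test over a series of distance windows would give one enrichment value per window.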
Extended Data Fig. 1. The ubiquitously expressed ZNF808 is the first primate-specific gene confirmed to cause a congenital developmental disease.
a. ZNF808 is expressed across GTEx adult tissues. ZNF808 is expressed at variable levels across adult tissues, with no tissue lacking expression. Boxplots describe ZNF808 expression in the GTEx (Genotype-Tissue Expression) project. Data shows sum of GTEx v8 isoform level transcripts per million data calculated with RSEM. Boxplot central line denotes median, box limit the interquartile range and whiskers extend to furthest point within 1.5 interquartile range, data points are outliers exceeding whiskers, median sample size n = 291. b. ZNF808 is maximally expressed in the embryonic pancreas and minimally expressed in the embryonic liver. Human embryo RNA-seq spanning CS14-22 (data normalized as in original publication) for all MER11-binding KZFPs, tissues ordered by expression and mean of replicates shown. Dots give expression for all replicates, central line of boxplots denotes median, box marks interquartile range and whiskers 1.5x interquartile range. c. ZNF808 is the only primate-specific gene confirmed to cause a congenital developmental disease. Homology scores (including 1-to-1 and 1-to-many orthologues) for every protein-coding gene (Ensembl Biomart, 10th February 2020, release 98) for 26 primates and 70 non-primate mammals. For each gene, the difference between the maximum % identity to human across all primates and the maximum across all non-primates was calculated. Frequency densities (density estimations scaled to group size at 0.5% bin size) shown. Genes without a non-primate ortholog are plotted as percent identity between humans and primates (red), equivalent to a gene with 0% identity between primates and non-primates. Genes are grouped into non-disease causing (not present in OMIM-morbid; gray), disease causing (present in OMIM-morbid; green) and genetic causes of developmental disorders (present on DDG2P; blue). 
ZNF808 is highlighted (dotted line, ZNF808 is erroneously annotated as having a non-primate ortholog and its homology between primates and non-primates given). All OMIM-morbid and DDG2P genes with sequence identity difference >40 were manually checked with no evidence of a primate-specific disease gene causing a congenital developmental disorder found (Supplementary Table 4).
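The per-gene score described in panel c can be written out as a short sketch. It assumes each gene comes with two lists of percent identities to the human protein, one for primate orthologs and one for non-primate orthologs. The function names are hypothetical, and a gene with no non-primate ortholog is treated as 0% non-primate identity, as the legend states.

```python
def identity_difference(primate_identities, non_primate_identities):
    """Maximum primate % identity minus maximum non-primate % identity.

    An empty non-primate list stands for a gene without any non-primate
    ortholog and is scored as 0% identity, following the legend.
    """
    max_primate = max(primate_identities)
    max_non_primate = max(non_primate_identities, default=0.0)
    return max_primate - max_non_primate


def needs_manual_check(primate_identities, non_primate_identities, threshold=40.0):
    """Flag genes whose difference exceeds the threshold used for manual review."""
    return identity_difference(primate_identities, non_primate_identities) > threshold
```

A gene conserved across mammals scores near zero, while a gene present only in primates scores close to its primate identity.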
Extended Data Fig. 2. MER11 elements subsets are enriched for various transcription factors.
a. Full list of hits from the ChIP-Atlas database with Fisher exact test right-tail p-value < 1e-100 enriched in various MER11 subfamilies, without multiple comparison adjustment. The color of each bubble is scaled with p-value and the radius with enrichment. If multiple experiments were found to be enriched for any given factor only the most significant value is shown. b. Signal from selected factors overlaid on a multiple alignment of MER11 sequences to show various subsets of sequences being targeted by specific transcription factors and KZFPs. Top left: overlay of signal from published KZFP ChIP–seq on a multiple alignment of MER11A, MER11B and MER11C elements reveals that ZNF808 binds strongly in the centre of these elements. Five other KZFPs can be found on smaller subsets of elements; a clear pattern of semi-exclusive binding between ZNF808 and ZNF525 / ZNF578 is visible. All KZFPs binding MER11 elements represented here were found to be primate-specific at various levels. Multiple alignments generated with MAFFT v7.475 as performed in.
Extended Data Fig. 3. ZNF808 KO and protocol of differentiation toward beta cells.
a. Differentiation protocol to generate pancreatic endoderm, progenitors and islets from human embryonic stem cells. b. Flow cytometry analysis for the definitive endoderm marker CXCR4. n = 3 independent differentiation experiments, data are presented as mean values ± SEM. Axis labels state the marker and fluorochrome used. (a.u.=arbitrary unit, S1=definitive endoderm stage). c. Flow cytometry analysis for the pancreatic progenitor markers PDX1 and NKX6-1 at S4 (pancreatic progenitor stage). n = 3 independent differentiation experiments, data are presented as mean values ± SEM. Axis labels state the marker and fluorochrome used. (a.u.=arbitrary unit). d. Immunohistochemistry analysis of S3 (posterior foregut stage) monolayer cells for PDX1 and alpha-fetoprotein (AFP); and S4 (pancreatic progenitor stage) monolayer cells for PDX1, NKX6-1, SOX9 and NGN3 (representative of 3 independent differentiation experiments, scale bar = 200 μm).
Extended Data Fig. 4. Epigenetic profile of ZNF808 peaks.
a. Analysis of H3K9me3 ChIP–seq peaks intersecting with ZNF808 peaks reveals that the majority of sites are covered in heterochromatin-associated H3K9me3 at early differentiation stages in the wild type. There is a partial loss of H3K9me3-positive loci in the ZNF808 KO in stem cells and during differentiation. Analysis of H3K27ac reveals that a few sites are positive in wild-type cells but many more gain activity in the ZNF808 KO at all stages, particularly at S0 and S1. b. Analysis of ZNF808 peaks shows that around half have H3K9me3 status in at least one stage of differentiation and that the vast majority of those are transposable elements of the MER11 family. Amongst those, the majority loses H3K9me3 in at least one differentiation stage in the ZNF808 KO – 21.7% of those also gain H3K27ac. These are similarly highly enriched in MER11 elements, although with a different distribution of subfamilies. Interval intersections were performed using pybedtools 0.81 using repeats annotation from repeatmasker.org for hg19, version RepeatMasker open-4.0.5 - Repeat Library 20140131. Promoter coordinates were downloaded from Ensembl Biomart and extended 2.5 kb from the start site of all protein-coding genes. c. Heatmap showing H3K9me3 and H3K27ac signals over all MER11 elements in wild type and ZNF808 KO during differentiation. Different patterns of loss of H3K9me3 and gain of H3K27ac are visible. Color scale is based on z-score per element of reads with MAPQ > 20, ranging from +2 (red) to –2 (blue).
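The interval intersections in this legend were performed with pybedtools. The sketch below is not that library's API; it is a plain-Python stand-in that shows the overlap rule such an intersection applies, assuming BED-style half-open coordinates. The function name and the tuple layout `(chrom, start, end)` are hypothetical.

```python
def peaks_overlapping_elements(peaks, elements):
    """Return the peaks that overlap at least one annotated element.

    Both inputs are lists of (chrom, start, end) tuples using half-open
    coordinates, so an interval ending at 400 does not overlap one
    starting at 400.
    """
    # Group elements by chromosome so each peak is only compared
    # against elements on the same chromosome.
    by_chrom = {}
    for chrom, start, end in elements:
        by_chrom.setdefault(chrom, []).append((start, end))

    hits = []
    for chrom, start, end in peaks:
        for e_start, e_end in by_chrom.get(chrom, []):
            if start < e_end and e_start < end:
                hits.append((chrom, start, end))
                break  # one overlapping element is enough
    return hits
```

Dividing `len(hits)` by `len(peaks)` gives the kind of fraction quoted in panel b. This linear scan is only meant to make the overlap rule explicit; genome-scale data would call for sorted or indexed intervals.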
Extended Data Fig. 5. Dynamic epigenetic clusters of MER11 elements are enriched in transcription factors matching the stage where they are active.
a. Heatmap showing the epigenetic status of MER11 elements, reproduced from Fig. 3d to allow visual reference of clusters that gain H3K27ac at different points of the differentiation. b. Per-cluster breakdown of MER11 subfamilies. Some trends can be observed, such as MER11A elements being found in increased proportion in clusters (#1, #5 and #6) which show H3K27ac signal gain at S4. c. Heatmap showing the percentage of intersection with ChIP–seq of selected transcription factors and their relationship with either all MER11 elements (top) or epigenetically active clusters identified in Fig. 3c. d. Similarly, the intersection of activated clusters with KZFPs found to be enriched in MER11 elements.
Extended Data Fig. 6. KZFPs and transcription factors binding MER11 elements have dynamic expression profiles during differentiation.
a, b. Expression of MER11-binding KZFPs. a. Heatmap showing maximum normalized mean expression in wild-type cells. b. Expression of all replicates in wild type and ZNF808 KO. c, d. Expression of MER11-binding transcription factors. As in a, b, for transcription factors GATA3, GATA4, GATA6, HNF4A, HNF4G.
Extended Data Fig. 7. MER11 elements are active in various cell types.
a. Overlap between accessible regions in 222 cell types from scATAC-seq human and fetal atlas and either all MER11 elements (blue) or the 220 elements that lose repression and gain activity in the ZNF808 KO (red). On the right, a zoom of the 15 cell types with the highest overlap in the ZNF808 KO activated MER11 elements. b. Heatmap of all MER11 elements activated in the ZNF808 KO intersecting with a peak of chromatin accessibility in the scATAC-seq dataset. c. Percentage of overlap with scATAC-seq peaks in select cell types and clusters of H3K27ac activity found in the ZNF808 KO during differentiation. Fetal syncytio/cytotrophoblasts are found in all 6 clusters at a higher percentage compared to MER11 background while other cell types are more enriched in cluster #6, which gains H3K27ac at S4. d. Examples of MER11 elements active in selected cellular contexts. All examples are taken from the list of MER11 losing H3K9me3 and gaining H3K27ac in the ZNF808 KO. e. Epigenetic status of MER11 elements in the NIH Roadmap dataset; biosamples were collapsed per cell type and chromatin state predictions at the single base pair level were used. Here we focus on two chromatin state classes we have built from the aggregate of multiple smaller ones, H3K9me3-positive heterochromatin (containing H3K9me3-associated categories “ZNF_Rpts” and “Het”) or Enhancer/TSS (aggregate of TSS-associated categories such as “TssA”, “TssFlnk”, “TssFlnkU”, “TssFlnkD” and enhancer-associated categories such as “EnhG1”, “EnhG2”, “EnhA1”, “EnhA2”, “EnhWk”). There are 18 active MER11 elements for ‘Pancreas’ versus 100 for ‘Liver’. Results show the same trends observed by ATAC-seq, notably that multiple MER11 elements are active in placenta or liver, but not in pancreas.
Extended Data Fig. 8. Validation of epigenetic and transcriptomic dysregulation observed in the ZNF808 KO cells using patient-derived iPSCs.
a. Heatmap of the 220 MER11 elements that lose H3K9me3 and gain H3K27ac in the ZNF808 KO as presented in Fig. 3, with the addition of signal obtained when differentiating iPSCs up to S3. b. Left, boxplots showing fold-change of activated and repressed dysregulated genes identified in ZNF808 KO and patient-derived iPSCs, showing agreement in direction and magnitude of the gene expression perturbation. Boxes mark interquartile range, with central line describing the median and whiskers 1.5x interquartile range. N = total activated and repressed dysregulated genes, see Fig. 4a. Right, linear regressions between ZNF808 KO and iPSC log2 fold-change KO over WT at same set of genes. Regression coefficients and p-value of slope term given. c. qRT-PCR for five hepatic marker genes assayed at the posterior foregut stage (S3) in cells derived from H1 control, H1-ZNF808-KO and the patient iPSC (line HEL340.7) carrying the ZNF808 deletion (n = 3-4 independent differentiation experiments). Data are presented as mean values ± SEM. Unpaired two-tailed t test.
Extended Data Fig. 9. Unmasked MER11 elements drive proximal gene activation in ZNF808 KO and dysregulated genes are associated with a loss of pancreatic identity and a gain of hepatic identity.
a. Example locus showing loss of repression and activation of a MER11 element in the ZNF808 KO with activation of the adjacent gene. Locus shows H3K9me3 and H3K27ac data in reads per million upstream of the TMTC1 promoter. H3K9me3-marked MER11 element in wild type is lost at S0 in ZNF808 KO concomitant with gain of H3K27ac. b. Gene expression for TMTC1 shown in transcripts per million (TPM), showing robust upregulation at S0, S1 and S2 stages. c. Proximity enrichments between pairs of MER11 clusters identified in Fig. 3 and genes activated at each stage. Fisher exact test right-tail –log10 p-value is denoted by color and odds ratio by size of dot. Minimal p-value for gene-element pairs in the range [1 bp, 100 Mb] and the odds ratio at that minimum given. d. Enrichr gene set enrichments for activated (orange) or repressed (blue) genes in ZNF808 KO; selected gene sets and terms (see Supplementary Table 8) are shown. Enrichments for activated genes are given for fetal liver, liver and regulation of TGF-beta, repressed otherwise. Fisher exact test right-tail –log10 p-value is denoted by color and odds ratio by size of dot.
Extended Data Fig. 10. Genes exclusively expressed in liver are activated in ZNF808 KO.
a. Heatmap of GTEx liver-exclusive genes. 357 genes for which their lower quartile of expression in liver exceeds the upper quartile of expression in all other tissues. Shown as z-score of median expression. Liver and pancreas highlighted. b. Over-representation of liver-exclusive genes in the list of genes activated in ZNF808 KO. Bubble plot with Fisher Exact test -log10 p-value denoted by color and odds ratio denoted by dot size. c-e. TDO2 is a liver-exclusive gene (c), activated at S2, S3 and S4 in ZNF808 KO (d). TDO2 is adjacent to two MER11 elements for which the leftmost loses H3K9me3 throughout the time course and gains H3K27ac from S1 onwards. Boxplots in (c) describe gene level TDO2 expression in the GTEx (Genotype-Tissue Expression) project. Data shows sum of GTEx v8 isoform level transcripts per million data calculated with RSEM. Boxplot central line denotes median, box limit the interquartile range and whiskers extend to furthest point within 1.5 interquartile range, data points are outliers exceeding whiskers, median sample size n = 291. The GTEx Project was supported by the Common Fund of the Office of the Director of the National Institutes of Health and by NCI, NHGRI, NHLBI, NIDA, NIMH and NINDS. The data used for the analyses described in this manuscript were obtained from the GTEx Portal on 08/26/21.
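The liver-exclusive criterion in panel a (lower quartile of expression in liver exceeding the upper quartile in all other tissues) can be expressed as a small check. This sketch assumes the comparison is made against each other tissue separately and uses the "inclusive" quartile method from Python's `statistics` module; the legend specifies neither choice, and the function name is hypothetical.

```python
from statistics import quantiles


def is_tissue_exclusive(expr_by_tissue, target="Liver"):
    """True if the target tissue's lower quartile exceeds every other
    tissue's upper quartile for one gene.

    expr_by_tissue: dict mapping tissue name to a list of per-sample
    expression values (for example TPM) for that gene; each list needs
    at least two samples.
    """
    q1_target = quantiles(expr_by_tissue[target], n=4, method="inclusive")[0]
    for tissue, values in expr_by_tissue.items():
        if tissue == target:
            continue
        q3_other = quantiles(values, n=4, method="inclusive")[2]
        if q1_target <= q3_other:
            return False  # another tissue reaches into the target's range
    return True
```

Applied gene by gene across a tissue expression table, a filter of this form yields a candidate list of tissue-exclusive genes. Comparing quartiles rather than medians keeps the call robust to a few outlying samples.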
