Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2025 Jul 16;16(1):6545.
doi: 10.1038/s41467-025-61734-w.

Identification of functional non-coding variants associated with orofacial cleft

Affiliations

Identification of functional non-coding variants associated with orofacial cleft

Priyanka Kumari et al. Nat Commun. .

Abstract

Oral facial cleft (OFC) comprises cleft lip with or without cleft palate (CL/P) or cleft palate only. Genome wide association studies (GWAS) of isolated OFC have identified common single nucleotide polymorphisms (SNPs) in many genomic loci where the presumed effector gene (for example, IRF6 in the 1q32 locus) is expressed in embryonic oral epithelium. To identify candidates for functional SNPs at eight such loci we conduct a massively parallel reporter assay in a fetal oral epithelial cell line, revealing SNPs with allele-specific effects on enhancer activity. We filter these SNPs against chromatin-mark evidence of enhancers and test a subset in traditional reporter assays, which support the candidacy of SNPs at loci containing FOXE1, IRF6, MAFB, TFAP2A, and TP63. For two SNPs near IRF6 and one near FOXE1, we engineer the genome of induced pluripotent stem cells, differentiate the cells into embryonic oral epithelium, and discover allele-specific effects on the levels of effector gene expression, and, in two cases, the binding affinity of transcription factors FOXE1 or ETS2. Conditional analyses of GWAS data suggest the two functional SNPs near IRF6 account for the majority of risk for CL/P at this locus. This study connects genetic variation associated with OFC to mechanisms of pathogenesis.

PubMed Disclaimer

Conflict of interest statement

Competing interests: L.P. is an employee of Cencora-PharmaLex. K.B. is co-founder of Matchstick Technologies, Inc., and co-inventor of PIXUL, used here for chromatin immunoprecipitation experiments. The remaining authors declare no competing interests.

Figures

Fig. 1
Fig. 1. MPRA identified SNPs with allele-specific significant effects on reporter activity.
a Schematic of MPRA library construction and execution. BC, barcode. Yellow and red asterisks: non-risk and risk allele, respectively. b Scatter dot plot showing (black dots) 65 SNPs with significant allele-specific effects on reporter activity in the MPRA and (gray dots) 822 SNPs without them. Arrows indicate the functional SNPs identified in this study (two near IRF6, rs11119348 and rs661849, and one near FOXE1, rs10984103). Dashed lines indicate the 95th percentile of the reporter activity of scrambled elements; on both axes, zero, i.e., log2(1) represents the reporter activity of the empty vector.
Fig. 2
Fig. 2. Two SNPs (rs11119348 and rs661849) at IRF6 and one SNP (rs10984103) at FOXE1 as regulatory variants.
ac UCSC browser views of the human genome, GRCh37/hg19, focused on the OFC-associated SNPs at a, b IRF6 and c FOXE1. OFC SNPs, SNPs identified by GWAS and evaluated by MPRA at each locus. One SNP at the FOXE1 locus, which was tested in the MPRA but lacked significant effects in it, rs1877431, is outside of the range presented here. MPRA-sig SNPs, SNPs with allele-specific effects in the MPRA. IRF6 enhancers, FOXE1 enhancers, and enhancers for the indicated gene identified using the activity-by-contact (ABC) method and datasets from NHEK (see Methods). Gray arrows, GWAS-meta-analysis P values of the indicated SNPs. Two-sided P values are calculated with an inverse-variance weighted fixed-effects meta-analysis of a transmission disequilibrium test without adjustment for multiple testing and a logistic regression (that had 18 principal components of ancestry as covariates). GM12878 B-cell derived cell line, H1-ESC embryonic stem cells, K562 myelogenous leukemia cell line, HepG2 liver cancer cell line, HUVEC human umbilical vein endothelial cells, HMEC human mammary epithelial cells, HSMM human skeletal muscle myoblasts, NHEK normal human epidermal keratinocytes, NHLF normal human lung fibroblasts. CS 13-20, Carnegie stages for human embryonic face explants. Colored bars, inferred chromatin state from combinatorial analysis of multiple chromatin mark datasets. Orange and yellow, active and weak enhancer element, respectively; bright red, active promoter; light red, weak promoter; purple, inactive/poised promoter; blue, insulator; light green, weakly transcribed; gray, Polycomb repressed; light gray, heterochromatin/repetitive. Additional color bars in the facial explants dataset, green, transcription; green-yellow, transcription weak enhancer; purple, bivalent promoter; light purple, poised promoter.
Fig. 3
Fig. 3. Two SNPs at IRF6 and one SNP at FOXE1 have allele-specific effects on enhancer activities and expression levels of the adjacent OFC risk-associated genes.
a Bar chart of MPRA and luciferase reporter activities for the indicated SNPs at various loci in GMSM-K using elements of the indicated length centered on the SNP. MPRA data are represented as mean ± standard error of the mean (SEM) from the indicated number of replicates. Statistical significance (P value, two-tailed) of the difference between major and minor allele’s reporter activity is determined by Welch’s t-test, followed by Benjamini–Hochberg false discovery rate correction. P (MPRA) = 0.0452 (rs11110348), 0.0057 (rs661849), 0.9097 (rs642961), 0.6942 (rs4844939), 0.4487 (rs12104876), 0.0238 (rs75436877), 0.7100 (rs79482068), 0.0734 (rs79792381), 0.0125 (rs201265), 0.1206 (rs10124184), 0.0067 (rs10984103), 0.0458 (rs4812449), 0.2945 (rs57369620). For luciferase reporter activities, data are represented as mean ± standard deviation (SD) from four independent experiments. Statistical significance (P value, two-tailed) is determined by Student’s t-test. P (luciferase) = 0.0006 (rs11110348), <0.0001 (rs661849), 0.3822 (rs642961), <0.0001 (rs4844939), 0.14 (rs12104876), <0.0001 (rs75436877), 0.0693 (rs79482068), 0.5115 (rs79792381), 0.002 (rs201265), <0.0001 (rs10124184), <0.0001 (rs10984103), <0.0001 (rs4812449), 0.1421 (rs57369620). *P < 0.05, **P < 0.01, ***P < 0.001, ****P < 0.0001, NS non-significant. bd Scattered dot plots of relative luciferase activity (using the longer elements described in Results) for non-risk and risk alleles of rs11119348, rs661849 and rs10984103 in primary neonatal keratinocytes (HEKn). Value of 1 is that of the empty pGL3 promoter vector. Data are represented as mean ± standard deviation (SD) from four independent experiments. Statistical significance (P value, two-tailed) is determined by Student’s t-test and P value is indicated on the plot. eg Scattered dot plot of relative levels of e, f IRF6 or g FOXE1 mRNA in edited induced oral epithelium (iOE) cells homozygous for the non-risk or risk alleles of each SNP, as indicated, assessed by qRT-PCR. Expression levels are normalized against those of ACTB, GAPDH, HPRT1, UBC and CDH1. Data were represented as mean ± SD from e, f six replicates or g nine replicates of cells harboring each genotype, as indicated in the plot. Each dot represents three technical qPCR replicates. Statistical significance (P value, two-tailed) is determined by Student’s t-test and P value is indicated on the plot. Source data are provided as a Source Data file.
Fig. 4
Fig. 4. Risk allele of IRF6 -10 kb SNP (rs11119348) promotes binding of transcriptional repressor FOXE1.
a Consensus FOXE1 binding motif from the JASPAR database of transcription factor DNA-binding preferences (Matrix ID: MA1847.1) and alignment of the variant site in several mammals. The risk allele (A) is the reference allele and has a higher frequency than the non-risk allele (C) in most populations. b, c Percent input identified by ChIP-qPCR for anti-FOXE1 and anti-H3K27Ac, respectively, in iOE cells heterozygous for rs11119348 using primers specific to the IRF6 -10 kb enhancer site or, as a negative control, to a region 103.7 kb upstream of the IRF6 transcription start site that lacked ATAC-Seq and H3K27Ac ChIP-Seq signals in HIOEC or NHEK. Error bars refer to three ChIP replicates and expressed as mean ± SD. Each dot represents three technical qPCR replicates. Statistical significance (P value, two-tailed) is determined by Student’s t-test and P value is indicated on the plot. NS non-significant. d Sequencing of anti-FOXE1 and anti-H3K27Ac ChIP-PCR products from cells heterozygous for rs11119348 using the indicated antibody. e Scattered dot plot of relative luciferase activity of the IRF6 -10 kb reporter construct (longest version) harboring the risk allele of rs11119348, in wildtype (WT) or a clone of FOXE1 homozygous knockout (KO) GMSM-K cells. Data are represented as mean ± SD from three independent experiments. Statistical significance (P value, two-tailed) is determined by Student’s t-test and P value is indicated on the plot. f Scattered dot plot of relative levels of IRF6 mRNA in WT and FOXE1-KO GMSM-K cells assessed by qRT-PCR. Expression levels of IRF6 are normalized against ACTB. Data are represented as mean ± SD from three replicates. Each dot represents three technical qPCR replicates. Statistical significance (P value, two-tailed) is determined by Student’s t-test and P value is indicated on the plot. g Model showing binding of FOXE1 to the IRF6 -10 kb enhancer, which is favored by the risk allele and which reduces IRF6 expression. Source data are provided as a Source Data file.
Fig. 5
Fig. 5. Risk allele of IRF6 −22 kb SNP (rs661849) promotes binding of transcriptional repressor ETS2.
a Consensus ETS2 binding motif from the JASPAR database of transcription factor DNA-binding preferences (Matrix ID: MA1484.1) and alignment of the variant site in several mammals. The risk allele (C) is the reference allele but has a lower frequency than the non-risk allele (T) in most populations. The risk allele improves the match to the ETS2 binding site which remains partial. b, c Percent input identified by ChIP-qPCR for anti-ETS2 and anti-H3K27Ac respectively in iOE cells heterozygous for rs661849 using primers specific to the IRF6 −22 kb enhancer site or, as a negative control, to a region 103.7 kb upstream IRF6 transcription start site lacking ATAC-Seq and H3K27Ac ChIP-Seq signals in HIOEC or NHEK. Error bars refer to three ChIP replicates and expressed as mean ± SD. Each dot represents three technical qPCR replicates. Statistical significance (P value, two-tailed) is determined by Student’s t-test and P value is indicated on the plot. NS non-significant. d Sequencing of anti-ETS2 and anti-H3K27Ac ChIP-PCR product of cells heterozygous for rs661849 using the indicated antibody. e Scattered dot plot of relative luciferase activity of the IRF6 −22 kb reporter construct (longest version) harboring the risk allele of rs661849 in control versus ETS2-depleted primary neonatal keratinocytes (HEKn). Data are represented as mean ± SD from three independent experiments. Statistical significance (P value, two-tailed) is determined by Student’s t-test and P value is indicated on the plot. f Scattered dot plot of relative levels of IRF6 mRNA in control versus ETS2-depleted HEKn assessed by qRT-PCR. Expression levels of IRF6 are normalized against ACTB. Data are represented as mean ± SD from three replicates. Each dot represents three technical qPCR replicates. Statistical significance (P value, two-tailed) is determined by Student’s t-test and P value is indicated on the plot. g Model showing binding of ETS2 to the IRF6 −22 kb enhancer, which is favored by the risk allele and which reduces IRF6 expression. Source data are provided as a Source Data file.
Fig. 6
Fig. 6. The haplotypes containing rs11119348 and rs661849 together explain much of the CL association, and most of the CLP association, in Europeans at the 1q32/IRF6 locus.
ah Locus Zoom plots of ad cleft lip only and eh cleft lip and palate; a, e unconditioned; b, f conditioned on rs661849 (IRF6 −22 kb); c, g conditioned on rs1119348 (IRF6 −10 kb); d, h conditioned on both SNPs simultaneously. Two-sided P values are calculated with a logistic regression with sex without adjustment for multiple testing and ten principal components of ancestry as covariates (and the genotype of the SNP being conditioned on). Points are color-coded based on linkage disequilibrium (r2) in Europeans for each SNP with rs661849.

Update of

References

    1. Stanier, P. & Moore, G. E. Genetics of cleft lip and palate: syndromic genes contribute to the incidence of non-syndromic clefts. Hum. Mol. Genet.13, R73–R81 (2004). - PubMed
    1. Klotz, C. M. et al. Revisiting the recurrence risk of nonsyndromic cleft lip with or without cleft palate. Am. J. Med. Genet. Part A152, 2697–2702 (2010). - PMC - PubMed
    1. Sivertsen, Å. et al. Familial risk of oral clefts by morphological type and severity: population based cohort study of first degree relatives. BMJ336, 432–434 (2008). - PMC - PubMed
    1. Grosen, D. et al. A cohort study of recurrence patterns among more than 54 000 relatives of oral cleft cases in Denmark: support for the multifactorial threshold model of inheritance. J. Med. Genet.47, 162–168 (2010). - PMC - PubMed
    1. Nasreddine, G., El Hajj, J. & Ghassibe-Sabbagh, M. Orofacial clefts embryology, classification, epidemiology, and genetics. Mutat. Res. Rev. Mutat. Res.787, 108373 (2021). - PubMed

MeSH terms