[Preprint]. 2026 Jan 7:rs.3.rs-8429365.
doi: 10.21203/rs.3.rs-8429365/v1.

Global Evaluation of Congenital Heart Disease-Associated Non-Coding Variants

Edwin G Peña-Martínez et al. Res Sq.

Abstract

Genome-wide association studies (GWAS) have mapped thousands of congenital heart disease (CHD)-associated variants within non-coding regions of the genome. Non-coding variants can alter regulatory mechanisms, such as transcription factor (TF) binding control of gene expression, potentially contributing to human diseases. However, with the increasing number of disease-associated variants, comprehensive functional validation remains a significant challenge. In this work, we developed a novel method called SNP Bind-n-Seq to evaluate >3,000 CHD-risk variants for allelic binding by the cardiac TFs NKX2-5, GATA4, and TBX5 in a high-throughput manner. These binding affinity data sets were coupled with a massively parallel reporter assay (MPRA) to screen CHD-risk variants for genotype-dependent regulatory activity. We identified 170 variants that exhibit allelic TF binding and 187 that modulate gene expression. Combining both approaches revealed three high-confidence variants with genotype-dependent TF binding, genotype-dependent transcriptional activity, and eQTL behavior in cardiac cells. Collectively, this study provides the first combined high-throughput biochemical and functional genomic evaluation of thousands of CHD-risk variants.

Keywords: DNA-binding; GWAS; MPRA; congenital heart disease; gene regulation; genotype-dependent biology; non-coding variants; transcription factors.

Conflict of interest statement

Declaration of interest: The authors declare no competing interests.

Figures

Figure 1:
High-throughput evaluation of TF binding through SNP Bind-n-Seq. A) Overview of the SNP Bind-n-Seq experimental approach and computational analysis. B) Sequence enrichment and binding affinity measurements for CHD-associated variants. Enrichment was calculated for all sequences at seven concentration points ranging from 0 nM to 3,000 nM for NKX2-5 (left), GATA4 (middle), and TBX5 (right). PWM logos were generated from the top 200 sequences with the highest binding affinity (KA). C) Correlation between mean enrichment scores and fitted KA values. Enrichment at 3,000 nM is displayed for NKX2-5 (top), GATA4 (middle), and TBX5 (bottom).
Figure 2:
CHD-associated variants exhibit allele-biased binding for cardiac TFs. A) Differential binding affinity analysis between reference and CHD-risk alleles. Variants with increased binding affinity (higher KA value for the alternate allele) are shown in red, and variants with decreased binding affinity (higher KA value for the reference allele) are shown in blue. The solid gray line is the Y = X line (slope of 1). The dashed line has a 15° angle from the solid line and represents a 2-fold change in binding affinity between the reference and alternate allele. B) Venn diagram of CHD-risk variants with differential allelic binding. Overlaps between diagrams represent variants that altered DNA-binding for multiple TFs. C) In vitro validation of rs2465147 through EMSA for TBX5 (top) and NKX2-5 (bottom). Binding sites are underlined in yellow for NKX2-5 in the reference sequence and in blue for TBX5 in the alternate sequence. D) Allelic enrichment curve of rs863392 for TBX5 (top) and GATA4 (bottom). Reference alleles (Ref) are represented in blue, and tag-SNP alleles from the GWAS catalog (Alt 1) are represented in green. Permuted alleles (alternate non-risk; Alt 2 and Alt 3) are represented in red and purple, respectively. E) Heatmaps illustrating allele-specific PrOBEX fitted K-values for SNPs predicted to alter transcription factor binding of NKX2-5 (left), GATA4 (center), and TBX5 (right). Each row corresponds to an individual SNP (rsID), with columns representing the reference allele and all possible alternative nucleotides. “Alt1” denotes the observed alternative allele reported in the GWAS catalog, while “Alt2” and “Alt3” correspond to the remaining permuted alleles. Cell color intensity reflects the magnitude of the fitted K-value, with warmer colors indicating stronger predicted binding affinity. The upper and lower panels display SNPs associated with decreased and increased binding affinity, respectively. F) Distribution of TF binding motifs relative to the position of the SNP with allelic binding. Dots represent the number of motifs created or disrupted for NKX2-5 (yellow circles), GATA4 (red squares), and TBX5 (blue triangles). The X-axis represents position within the 40 bp window used in the SNP Bind-n-Seq assay. The arrow represents the SNP location at X = 0. G) Nucleotide contribution of variants that directly create or disrupt TF binding motifs for NKX2-5 (left), GATA4 (middle), and TBX5 (right). The contribution of created motif counts is presented in red, and disrupted motifs are shown in blue. The motif used to scan variant contribution is displayed below the X-axis, where the value represents position within the motif. The bars in the plot are overlapping, not stacked.
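
The 2-fold comparison described for panel A can be sketched in a few lines of Python. This is only an illustration: the function name `classify_allelic_binding` and its parameters are hypothetical, and it assumes the 2-fold line is used as the cutoff for labeling a variant, which the caption does not state.

```python
def classify_allelic_binding(ka_ref: float, ka_alt: float, min_fold: float = 2.0) -> str:
    """Label a variant by the ratio of alternate to reference K_A.

    Assumption: a fold change of at least `min_fold` in either direction
    is treated as allelic binding (hypothetical cutoff for illustration).
    """
    if ka_ref <= 0 or ka_alt <= 0:
        raise ValueError("K_A values must be positive")
    ratio = ka_alt / ka_ref
    if ratio >= min_fold:
        return "increased"  # alternate allele binds more strongly (red in panel A)
    if ratio <= 1.0 / min_fold:
        return "decreased"  # reference allele binds more strongly (blue in panel A)
    return "unchanged"
```

For example, `classify_allelic_binding(1.0, 2.5)` labels the variant as "increased", while a pair of K_A values within 2-fold of each other is labeled "unchanged".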
Figure 3:
Computational prediction of the effect of GWAS catalog variants on cardiac TF-DNA binding. A) Schematic of model training using SNP Bind-n-Seq binding data. B) Performance parameters of three predictive models trained on SNP Bind-n-Seq binding data. C) Number of SNPs (yellow) with traits (blue) from the GWAS catalog predicted to alter NKX2-5, GATA4, and TBX5 binding. D) Number of disease-associated SNPs divided by trait parent term predicted to alter NKX2-5, GATA4, and TBX5 binding. Dots in the figure represent NKX2-5 (yellow), GATA4 (red), and TBX5 (blue).
Figure 4:
Regulatory activity of CHD-risk emVars. A) Distribution of MPRA regulatory activity. The normalized fold change of MPRA activity relative to plasmid control (X-axis) was calculated using DESeq2 (n = 3 biological replicates). Expression-modulating alleles (emAlleles; dark blue) were identified as those alleles with significant activity relative to control (pFDR < 0.05) and at least a 50% increase in activity. Full results are provided in Supplementary Data 3. B) Overlap between emVars and cardiac regulatory elements active during heart development and in the adult heart. C) Enrichment of regulatory protein and TF binding at emVars. Enrichments were calculated compared to non-emVars. p-values were estimated by a one-sided z-test with Bonferroni multiple testing correction using RELI. The top 15 regulatory proteins and TFs (based on RELI p-values) that overlap at least 10% of emVars are shown. Full results are provided in Supplementary Data 3. D) TF binding site motif enrichment for emVars compared to non-emVars. p-values were estimated by a one-sided hypergeometric test with Benjamini–Hochberg multiple testing correction by HOMER using the full oligo sequences of emVars and non-emVars. The top 15 enriched TF motif families are shown. Full results are provided in Supplementary Data 3.
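
The emAllele criterion in panel A (pFDR < 0.05 and at least a 50% increase in activity) can be written as a simple filter. The sketch below is a hedged illustration, not the authors' pipeline: the function `is_em_allele` and its parameter names are hypothetical, and it assumes the fold change is given on a linear scale relative to the plasmid control. DESeq2 reports log2 fold changes, so a value `lfc` would first be converted with `2 ** lfc`.

```python
def is_em_allele(fold_change: float, p_fdr: float,
                 min_increase: float = 0.5, alpha: float = 0.05) -> bool:
    """Return True when an allele meets both emAllele criteria.

    fold_change: linear-scale MPRA activity relative to plasmid control
                 (assumption: 1.0 means no change).
    p_fdr:       FDR-adjusted p-value for that comparison.
    """
    significant = p_fdr < alpha
    strong_enough = fold_change >= 1.0 + min_increase  # at least a 50% increase
    return significant and strong_enough
```

An allele with a 1.6-fold activity and pFDR of 0.01 passes, whereas the same fold change with pFDR of 0.2, or a 1.2-fold activity at any significance level, does not.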
Figure 5:
Regulatory activity and mechanisms of allelic emVars. A) Identification of variants with allelic CRE activity. Allelic CRE activity (Y-axis) is defined as the normalized fold change of MPRA activity between the non-reference and reference alleles (n = 3 biological replicates). MPRA activity (X-axis) is the normalized fold change of MPRA activity for any allele of the variant. Allelic emVars (red) were defined as variants with a significant difference in MPRA activity (pFDR < 0.05) between any pair of alleles and at least a 25% change in activity compared to the reference allele. Full results are provided in Supplementary Data 4. B) Overlap between allelic emVars and cardiac regulatory elements active during heart development and in the adult heart. C) Normalized MPRA CRE activity of each experimental replicate for rs559405101. D) Genome browser map of a 2 kb window centered on rs559405101. Binding sites for GATA4 and TBX5 are displayed as blue and red rectangles, respectively. rs559405101 is upstream of MYOM1, which is upregulated in the heart left ventricle and atrial appendage. E) Genotype-dependent TF binding events predicted for rs559405101. The X-axis indicates the preferred allele, along with a value indicating the strength of the allelic behavior (MARIO ARS value > 0.4), calculated as one minus the ratio of the weak to strong read counts (e.g., 0.5 indicates the strong allele has twice the reads of the weak allele). Significance (p-value < 0.05) was determined relative to binding events found in non-emVar sequences. Values in parentheses next to the TF name are the number of binding events created or disrupted by that specific TF. F) TF binding site location distribution for variant-overlapping (blue) and variant-adjacent (orange) TFs, relative to all allelic emVars. G-H) Motifs enriched for TFs categorized as G) variant-overlapping (odds ratio > 1.5, blue) and H) adjacent (odds ratio < 1.5, orange) to the allelic emVars. Full results for figures 5F–H are provided in Supplementary Data 4.
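
The allelic strength value described for panel E, one minus the ratio of weak to strong read counts, can be checked with a short calculation. The sketch below follows only the caption's wording and is not the MARIO implementation; the function name `allelic_strength` and its input checks are assumptions made for illustration.

```python
def allelic_strength(strong_reads: int, weak_reads: int) -> float:
    """One minus the ratio of weak-allele to strong-allele read counts.

    Returns 0.0 when both alleles have equal reads and approaches 1.0
    as the weak allele's reads approach zero.
    """
    if strong_reads <= 0:
        raise ValueError("strong_reads must be positive")
    if weak_reads < 0 or weak_reads > strong_reads:
        raise ValueError("weak_reads must be between 0 and strong_reads")
    return 1.0 - weak_reads / strong_reads
```

With 200 reads on the strong allele and 100 on the weak allele, the function returns 0.5, matching the caption's example of a strong allele with twice the reads of the weak allele.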
Figure 6:
CHD-risk variants with allelic binding and regulatory activity. A) Venn diagram of common variants with allelic TF binding (SNP Bind-n-Seq, blue) and transcriptional activity (MPRA, red). Five common variants are displayed with the TF that showed differential binding. Significance in the association between allelic binding and gene expression was determined by Fisher’s exact test (p-value < 0.005). B) Density plot correlating expression fold change of the allelic emVars with binding fold change of NKX2-5 (left), GATA4 (middle), and TBX5 (right). Variants with a 25% change in both binding and expression are labeled on the plot. C) Representative binding curve for each TF for a variant that altered the binding of NKX2-5 (top), GATA4 (middle), and TBX5 (bottom). Reference alleles (Ref) are represented in blue, and tag-SNP alleles from the GWAS catalog (Alt 1) are represented in green. Permuted alleles (alternate non-risk; Alt 2 and Alt 3) are represented in red and purple, respectively. D) MPRA activity for the variants used for binding curves in (C). E) Interaction networks of cardiac eQTL genes with variants that exhibit allelic binding and/or gene expression. Interactions highlighted in red indicate cardiac eQTL variants that exhibited allelic behavior in binding (SNP Bind-n-Seq) and expression (MPRA). Interactions highlighted in blue indicate eQTL variants for MGAT4C that altered binding for all three TFs. F) DNA-binding motif logos are shown for NKX2-5 in the context of the DNA sequence surrounding rs7303642. G) eQTL indicating MGAT4C expression dependent on rs7303642 genotype in the heart left ventricle (data from the GTEx portal).
