Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2024 Mar;56(3):420-430.
doi: 10.1038/s41588-024-01669-y. Epub 2024 Feb 20.

Functional dissection of human cardiac enhancers and noncoding de novo variants in congenital heart disease

Affiliations

Functional dissection of human cardiac enhancers and noncoding de novo variants in congenital heart disease

Feng Xiao et al. Nat Genet. 2024 Mar.

Abstract

Rare coding mutations cause ∼45% of congenital heart disease (CHD). Noncoding mutations that perturb cis-regulatory elements (CREs) likely contribute to the remaining cases, but their identification has been problematic. Using a lentiviral massively parallel reporter assay (lentiMPRA) in human induced pluripotent stem cell-derived cardiomyocytes (iPSC-CMs), we functionally evaluated 6,590 noncoding de novo variants (ncDNVs) prioritized from the whole-genome sequencing of 750 CHD trios. A total of 403 ncDNVs substantially affected cardiac CRE activity. A majority increased enhancer activity, often at regions with undetectable reference sequence activity. Of ten DNVs tested by introduction into their native genomic context, four altered the expression of neighboring genes and iPSC-CM transcriptional state. To prioritize future DNVs for functional testing, we used the MPRA data to develop a regression model, EpiCard. Analysis of an independent CHD cohort by EpiCard found enrichment of DNVs. Together, we developed a scalable system to measure the effect of ncDNVs on CRE activity and deployed it to systematically assess the contribution of ncDNVs to CHD.

PubMed Disclaimer

Conflict of interest statement

Competing interests

The authors declare no competing interests.

Figures

Extended Data Fig. 1 |
Extended Data Fig. 1 |. Establishment of the lentiMPRA platform to test cardiac enhancer activity in iPSC-CMs.
a. Strategy for pilot experiment to test lentiviral reporter assay in iPSC-CMs. b. Flow cytometry analysis of cTNT+ iPSC-CMs at differentiation day 12. Cells were gated with SSC and FSC to exclude debris and doublets. Flow cytometry plots displayed a biomodal distribution between fluorescent and non-fluorescent cells. Gates determining the percent of fluorescent cells were drawn at the local minimum between these distributions. c. Activities of PSC-specific enhancer (OCT4 PE) and cardiac enhancers (VISTA enhancer browser hs2330 and hs1670) in iPSCs and iPSC-CMs. Representative images from 4 independent experiments. Scale bar, 100 μm. d. Strategy for pilot experiment to measure enhancer activity by Amplicon-seq. e. Enhancer activities of PSC enhancers (Enh1–4) and cardiac enhancers (Enh 5–19). Activity of the empty vector (EV) was set 1. Enhancer activity was normalized to EV. Data are represented as mean ± SEM of 4 independent experiments (2-sided unpaired t test).
Extended Data Fig. 2 |
Extended Data Fig. 2 |. Assessment of human cardiac enhancer activity with hiPSC-CMs and lentiSTARR-seq.
a. Minimal read coverage of designed regions in DNA replicates. Red line shows minimum coverage for inclusion in analysis (FPM ≥ 20). b. Pearson correlation of MPRA activity between biological replicates at D17 and D24. There was excellent correlation both within group and across time points. c. Summary of MPRA results. Plot at the bottom shows a vertical line for each tested region with the indicated annotation. Enrichment score indicates enrichment of a set of regions of interest toward the ends of the ranked list of all regions. Enrichment p-value was determined by 1-sided permutation test (see Methods) with Bonferroni correction. Active enhancers were those enriched in RNA compared to DNA (DESeq2 Padj < 0.05). d. Violin plot with the log2(RNA/DNA) results of all candidates, active candidates, inactive candidates and negative controls. Kruskal-Wallis test p-values vs. neg control are shown. Center, box and whiskers indicate median, 25th and 75th percentiles and value closest to 25th percentile minus or 75th percentile plus 1.5 times the interquartile range. e. Twenty-four candidate cardiac enhancers of known cardiovascular disease genes with a range of MPRA enhancer activity were individually cloned into the lentiMPRA vector, in which a minimal promoter drives GFP expression. Red color indicates enhancers that were classified as active by MPRA. GFP expression was evaluated by epifluorescent imaging. Representative images from 4 independent experiments. Scale bar, 100 μm.
Extended Data Fig. 3 |
Extended Data Fig. 3 |. Functional dissection of active cardiac enhancers by tiling deletion mutagenesis.
a. Coverage of designed regions. Red line shows minimum coverage for inclusion in analysis (FPM ≥ 20). 97.6% of regions had coverage ≥20 FPM. b. Summary of activity of regions in the mutagenesis MPRA. Line plot at the bottom shows a vertical line for each tested region with the indicated annotation. Enrichment score indicates how the indicated annotations are distributed across the regions, ranked by activity. Enrichment p-value with Bonferroni correction was calculated using a 1-sided permutation test (see Methods). Active enhancers had barcodes that were overrepresented in RNA compared to DNA (DESeq Padj < 0.05). c. Validation of effects of mutations on transcription factor binding. Transcription factor binding was evaluated by electrophoretic mobility shift assay. The indicated wild-type and mutant oligonucleotide pairs were incubated with transcription factors with predicted altered motifs and analyzed by gel electrophoresis. Results are representative of at least 2 independent experiments.
Extended Data Fig. 4 |
Extended Data Fig. 4 |. CHD MPRA library characterization.
a. The CHD MPRA library included 6590 REF-ALT pairs. After pooled library synthesis of barcoded oligos, the oligos were PCR amplified and cloned into lentivirus genome backbone. A minimal promoter (miniP)-GFP cassette was then inserted into the cloned oligo library. b. Summary of activity of CHD MPRA library. Plot on bottom indicates the occurrence of the indicated annotation with a vertical line. Enrichment score represents enrichment of the indicated set of annotations at either end of the list of all regions, ranked by activity. Enrichment p-value was determined by 1-sided permutation test, with Bonferroni correction. Active enhancers had barcodes overrepresented in RNA compared to DNA (DESeq2 Padj < 0.05). c. Pearson correlation (PCC) between regions shared between the Mutagenesis MPRA and the CHD MPRA. The same genomic sequences had different barcodes in the two assays. d. Validation of the effect of variants on transcription factor binding. EMSA assay was used to test the binding of SRF or TBX20 to REF or ALT variant sequences. For the GLB1L3 CRE, ALT disrupted the SRF motif and reduced SRF binding in the EMSA assay. For the PIP4K2A CRE, ALT generated a TBX20 motif and increased TBX20 binding in the EMSA assay. Representative of three independent experiments. Two-tailed t-test. n = 3 per group. Graph shows mean ± SD.
Extended Data Fig. 5 |
Extended Data Fig. 5 |. Genomic loci of CHD-associated ncDNVs.
ad. WashU Epigenome Browser views of loci containing 4 ncDNVs. Promoter capture Hi-C and RNA-seq in iPSCs and iPSC-CMs from ref. 33, PMID 29988018. Genes dysregulated by DNVs are indicated in red. Green lines highlight 171 bp REF region with DNV in the center. eh. Sanger sequencing traces of genome edited iPSC lines.
Extended Data Fig. 6 |
Extended Data Fig. 6 |. Characterization of iPSC-CMs with knockin of CHD gene-associated noncoding DNVs.
a. BCOR downregulation in SMAD2 Het and KO iPSC-CMs. Gene expression was measured by RNA-seq. One-way ANOVA with Dunnett’s multiple comparison test versus WT. n = 3. b. Effect of ncDNVs on binding of transcription factors to CREs near CHD genes. 39 bp duplexes centered on ncDNVs neighboring 4 CHD genes were synthesized. Binding of purified, recombinant proteins to the REF or ALT sequence was measured by electrophoretic mobility shift assay (EMSA). SMAD2 and HIC2 bound CREs near BCOR and ACVRL1 more strongly for REF compared to ALT. In contrast, SRF and TBX20 bound CREs near ADAMTS6 and MYOCD more strongly for ALT compared to REF. Note lower free probe in MYOCD-ALT compared to REF. Results are representative of at least three independent experiments. Quantification of TBX20 EMSA: mean ± SD; n = 3; two-sided t-test. Graphs in a and b show mean ± SD.
Extended Data Fig. 7 |
Extended Data Fig. 7 |. snRNA-seq characterization of the impact of four ncDNVs that impact MPRA activity on iPSC differentiation to iPSC-CMs.
a. Expression of cardiac marker genes. Most nuclei contained cardiomyocyte marker genes. b. Two independent iPSC clones per ncDNV (ACVRL1, ADAMTS6, MYOCD) or knockin pools (BCOR) were separately differentiated into iPSC-CMs and then analyzed by multiplexed snRNA-seq. After clustering, UMAP plots of individual cells are shown separately for each independent differentiation. ce. Pseudo-bulk differential gene expression analysis. The number of differentially expressed genes for each independent replicate vs. wild type was analyzed from snRNA-seq data. Differentially expressed genes for the two replicates showed excellent overlap (c). Gene ontology terms enriched in differentially expressed genes shared between biological replicates for ACVRL1 ncDNV KI lines (d) or ADAMTS6 ncDNV KI lines (e). BH-corrected hypergeometric p-values. f. CHD genes differentially expressed in iPSC-CMs containing indicated ncDNV knockins compared to wild-type (WT). The selected CHD genes were mouse or human CHD genes (see Supplementary Data 5) that overlapped with genes differentially expressed in both replicates of any of the four introduced ncDNVs. BH-corrected P values were reported by Seurat FindMarkers function. g. Comparison of genes upregulated in BCOR ncDNV KI pool iPSC-CMs compared to BCOR genome occupancy in H1 hESCs (GSE104690). One-sided permutation test (10000 permutations).
Extended Data Fig. 8 |
Extended Data Fig. 8 |. snRNA-seq characterization of the impact of five ncDNVs that did not alter MPRA activity in iPSC-CMs.
Five ncDNVs that did not affect MPRA activity (MPRA-NC) and were knocked into WTC-11 iPSCs. a,b. Two independent knockin clones of ARMC4, DDX11, DTNA or PDE2A ncDNV, a SOX9 ncDNV knockin clone, a BCOR ncDNV knockin pool (positive control) and WTC-11 (two independent replicates) were differentiated into iPSC-CMs. On day 10, nuclei were analyzed by multiplexed snRNA-seq. Clustering identified 4 cell states (a) that express iPSC-CM markers (b). c. The distribution of iPSC-CMs among the 4 cell states was reproducible in biological replicate samples. d. Analysis of iPSC-CM state distribution by genotype. BCOR significantly expanded cluster 1 compared to WT (ANOVA with Dunnett’s test versus WT for each iPSC-CM state). The ncDNVs that did not affect MPRA activity had no significant effect on iPSC-CM state distribution.
Extended Data Fig. 9 |
Extended Data Fig. 9 |. Characterization of EpiCard scores.
a. Comparison of EpiCard, HeartENN and Enformer scores by MPRA region activity. Two-sided t-test. b. Correlation between EpiCard, HeartENN and Enformer scores expressed as Pearson coefficient (p-value) across 3745 ncDNVs with scores available. c,d. Comparison of functional scores for ncDNVs in an independent CHD cohort and non-CHD cohort, compared by 2-sided t-test with nominal p-values reported. c. All ncDNVs meeting prioritization criteria (see Fig. 3a). Right, subset of prioritized ncDNVs near HHE genes. ncDNVs (n = 6211 CHD and 10224 non-CHD). d. Subset of ncDNVs near HHE genes (n = 3120 CHD and 5195 non-CHD). DNVs.Center, box and whiskers indicate median, 25th and 75th percentiles and value closest to 25th percentile minus or 75th percentile plus 1.5 times the interquartile range.
Extended Data Fig. 10 |
Extended Data Fig. 10 |. Schematic of enrichment score calculation.
Given a ranked list L and a specific group of regions R that is a subset of L, the enrichment score at position i (ESi) is the difference between the cumulative probability of membership in R compared to L.
Fig. 1 |
Fig. 1 |. Assessment of human cardiac enhancer activity with hiPSC-CMs and lentiSTARR-seq.
a, Experimental design of lentiSTARR-seq of candidate cardiac enhancers in iPSC-CMs. b, Coverage of designed regions. Red line shows minimum coverage in amplicons from genomic DNA for inclusion in analysis (FPM ≥ 20). c, Summary of lentiSTARR-seq results. Top plot shows the enhancer activity of each region, as a function of activity rank. Active enhancers—enhancers enriched in RNA compared to DNA (DESeq2 (ref. 37) Padj < 0.05)—are colored red. Bottom line plot shows a vertical line, colored by count density, for each tested region with the indicated annotation. Enrichment significance was determined by one-way permutation test with Bonferroni correction (Methods). d,e, LentiSTARR-seq validation. Seventeen active and seven inactive enhancers neighboring cardiovascular disease genes were cloned individually into the lentiMPRA vector. iPSC-CMs were transduced on day 17 and assayed on day 24. d, GFP fluorescence of the empty vector control and enhancer-reporter lentiviruses was measured by flow cytometry. The numbers above bars show a number of independent biological replicates. Numbers at the top show one-sided t-test for activity above empty vector with Benjamini–Hochberg multiple testing correction. Blue indicates Padj < 0.05. e, Correlation of enhancer activity measured by GFP MFI (sample sizes shown in d) and by MPRA (n = 4 biological replicates). Black line shows the best fit linear regression line and 95% confidence interval. f, Functional validation of two enhancers neighboring COL5A1 and TGFBR1 using CRISPRi with KRAB and LSD1. NT, nontargeting control gRNA. n = 4. Two-sided t-test with Bonferroni correction. g,h, Motif analysis of active candidate enhancers using genomic background (g) or inactive enhancers as background (h). The active enhancers were the union of the candidate regions active in the day 17 and day 24 experiments (n = 1,185). Motif enrichment P value was calculated by Homer using a binomial distribution and Benjamini–Hochberg correction (Q value). For complete motif analysis results, see Supplementary Data 1. Data are shown as mean ± s.d. MFI, mean fluorescence intensity.
Fig. 2 |
Fig. 2 |. Tiling deletion analysis of human cardiac enhancers.
Systematic tiling mutagenesis was performed on 123 active cardiac enhancers using the lentiMPRA/iPSC-CM platform. a, Design of mutagenesis MPRA. Each original 400 bp enhancer was divided into three 171 bp subregions (F1–F3), and each subregion was tiled with 10 bp deletions. The barcoded oligos were inserted into a lentiMPRA vector so that the barcode was in the reporter gene’s 3′ UTR. b, Reproducibility of mutagenesis MPRA. Four independent replicates were obtained on iPSC-CM culture day 24. Replicate samples were highly correlated. Pearson correlation is shown. c, Summary of activity of wild-type enhancer subregions. Each line represents the three subregions of an active 400 bp enhancer. d, Summary of mutagenesis MPRA results. Dashed diagonal lines indicate 50% fold change (FC) thresholds. Each point represents one wild-type–mutant (WT–Mut) pair. Sequences in which the members of the pair had different activity (FC ≥ 1.5, Padj < 0.05, at least one member of pair active) are colored. MPRA-DA, MPRA-IA and MPRA-NS indicate that the mutation decreased, increased or did not change MPRA activity, respectively, compared to WT. Padj was calculated using two-way paired t-tests with Benjamini–Hochberg correction. e, Representative example of mutagenesis data for the GBE1 and COL5A1 enhancers. Activity of a wild-type sequence and the median of its mutant counterpart are shown by dashed blue and orange lines, respectively. Larger circles indicate sequences with detectable activity. Colors indicate a significant change of activity in the mutant sequence compared to the wild-type pair. Transcription factor motifs created or ablated by mutation are shown in magenta and blue, respectively. f, Summary motif analysis of tiling mutagenesis. Each point represents one motif family and one WT–Mut pair. Motif significance scores are nominal P values reported by FIMO. Colored points indicate that a Mut sequence lost or gained a motif compared to its wild-type counterpart. The size of each point represents the odds ratio that the motif was changed compared to MPRA-NS. Complete table of results can be found in Supplementary Data 2. g, Top transcription factor motifs, ranked by frequency.
Fig. 3 |
Fig. 3 |. Dissection of CHD ncDNV impact on cardiac enhancer activity.
a, CHD ncDNV prioritization. The 6,590 prioritized ncDNVs were each synthesized as a REF and ALT pair of 230 nt oligos, in which a 171 bp genomic region was centered on the ncDNV. Barcoded oligos were cloned into the lentiMPRA as depicted for the mutagenesis MPRA in Fig. 2a. b, Histogram showing coverage of designed regions. Red line shows minimum coverage in amplicons from genomic DNA for inclusion in analysis (FPM ≥ 20). c, Reproducibility of CHD lentiMPRA. Four independent replicates were obtained on iPSC-CM culture day 24. There was high correlation (Pearson r > 0.86) between replicates. d, Summary of CHD MPRA results. Dashed diagonal lines indicate 50% fold change thresholds. Each point represents a ncDNV’s REF–ALT pair. Colored points indicated differential activity between REF and ALT (two-way paired t-test with BH correction <0.05; |log2(FC)| >0.58; active in at least one replicate). e, Effect of MPRA-DA and MPRA-IA ncDNVs on enhancer activity. In MPRA-DA regions, REF exhibited enhancer activity and overall ALT had negligible activity. In MPRA-IA regions, REF had negligible activity and ALT had enhancer activity comparable to REF in MPRA-DA regions. Dotted lines indicate the 5th and 95th percentile values of active enhancers. Statistical comparison by ANOVA with the Tukey post hoc test is shown above the plot. Numbers at the bottom of the plot indicate number of regions in each group. Center, box and whiskers indicate median, 25th and 75th percentiles and value closest to 25th percentile minus or 75th percentile plus 1.5 times the interquartile range. f, Effect of ncDNVs on transcription factor motifs in MPRA-DA and MPRA-IA regions. See Fig. 2f for details. Complete table of results can be found in Supplementary Data 3. g, Top transcription factor motifs impacted by ncDNVs, ranked by frequency.
Fig. 4 |
Fig. 4 |. Characterization of CHD gene-associated ncDNVs in iPSC-CMs.
a, Schematic representation of characterization of CHD ncDNVs in iPSC-CMs. CRISPR–Cas9 was used to introduce ncDNVs from CHD lentiMPRA into their endogenous loci. After isolation of clonal lines and differentiation to iPSC-CMs, the expression of neighboring genes was measured by RT–qPCR. be, Validated ncDNVs and their impact on neighboring CHD-associated genes. Bar plots show RT–qPCR analysis of genes adjacent to DNVs near BCOR (b), MYOCD (c), ADAMTS6 (d), and ACVRL1 (e) in day 17 iPSC-CMs. Data are shown as mean ± s.d. of at least three independent experiments. ANOVA with Dunnett’s test compared to wild-type (WT) control. REF and ALT sequences are shown to the right, with the SNV highlighted in yellow, along with predicted transcription factor binding motifs impacted by SNVs. BCOR knock-in lines, n = 3. All others, n = 4. f,g, UMAP projection of wild-type and ncDNV knock-in nuclei from iPSC-CMs at day 10 of differentiation. f, iPSC-CM clusters are colored and numbered 0–8. g, Nuclei are colored by genotype. Merge (top-left) shows all genotypes with indicated colors and the remaining panels each show one genotype. h, Stacked bar graph of the percentages of nuclei in each cluster. The proportion of nuclei in each cluster was compared to wild-type nuclei; numbers indicate significant P values (one-way ANOVA with Dunnett’s multiple comparison test). i, Heatmap of genes that were differentially expressed in BCOR or MYOCD ncDNV knock-ins compared to wild type. Genes that were significantly different from wild-type in both replicates (Seurat FindMarkers Padj < 0.05; Methods) were selected. Heatmap displays the scaled average gene expression from each replicate. j,k. Gene Ontology analysis of the genes differentially expressed in both MYOCD ncDNV homozygous and heterozygous lines (j) or in both BCOR ncDNV lines (k) compared to wild-type iPSC-CMs. Hypergeometric test with Bonferroni correction for multiple testing. In be, ****P < 0.0001.
Fig. 5 |
Fig. 5 |. Development of ‘EpiCard’ noncoding functional score based on lentiMPRA enhancer activity measurements.
a,b, Univariate correlations of MPRA activity with genomic annotations. Each bar is labeled with nominal Pearson P values. We did not detect a significant correlation with fetal cardiac histone marks (a). The maximum Enformer score and Segway noncoding functional score trained on fetal heart data were weakly correlated, whereas the minimum Enformer score was weakly anticorrelated (b). c, The EpiCard score used activity data from reference MPRA regions to train a model using epigenetic annotations The EpiCard score was then applied to ncDNVs, and values were compared between a CHD and non-CHD cohort. d, LASSO regression models trained on all data (left) or only the active MPRA regions (right) generated modest correlations. e, A binary model generated EpiCard scores that separated active and inactive regions. Green dotted line indicates the 95th percentile cutoff for inactive enhancers; active MPRA regions above that region were enriched (OR = 11.1; Fisher’s test, P < 2.2 × 10−16). f, EpiCard scores of ncDNVs identified in 1,062 independent CHD trios. Left, all ncDNVs meeting prioritization criteria (Fig. 3a). Right, subset of prioritized ncDNVs near HHE genes. EpiCard scores were significantly higher in CHD participants compared to non-CHD participants (two-sided t-test). Numbers by violins indicate number of ncDNVs in each group. Center and box indicate the median, 25th and 75th percentiles, respectively.

References

    1. Van der Linde D et al. Birth prevalence of congenital heart disease worldwide: a systematic review and meta-analysis. J. Am. Coll. Cardiol. 58, 2241–2247 (2011). - PubMed
    1. Zaidi S et al. De novo mutations in histone-modifying genes in congenital heart disease. Nature 498, 220–223 (2013). - PMC - PubMed
    1. Homsy J et al. De novo mutations in congenital heart disease with neurodevelopmental and other congenital anomalies. Science 350, 1262–1266 (2015). - PMC - PubMed
    1. Jin SC et al. Contribution of rare inherited and de novo variants in 2,871 congenital heart disease probands. Nat. Genet. 49, 1593–1601 (2017). - PMC - PubMed
    1. ENCODE Project Consortium An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012). - PMC - PubMed

LinkOut - more resources