[Preprint]. 2025 Oct 15:2025.10.15.680997.
doi: 10.1101/2025.10.15.680997.

Deep learning the dynamic regulatory sequence code of cardiac organoid differentiation

Eyal Metzl-Raz et al. bioRxiv.

Abstract

Defining the temporal gene regulatory programs that drive human organogenesis is essential for understanding the origins of congenital disease. We combined a time-resolved, single-cell multi-omic atlas of human iPSC-derived cardiac organoids with deep learning models that predict chromatin accessibility from DNA sequence, enabling the discovery of the regulatory syntax underlying early heart development. This framework uncovered cell-state-specific rules of cardiogenesis, including context-dependent activities of TEAD, HAND, and TBX transcription factor families, and linked these motifs to their target genes. We identified distinct programs guiding lineage divergence, such as ventricular versus pacemaker cardiomyocytes, and validated predictions by perturbing Myocardin (MYOCD), establishing its essential role in ventricular specification. Integration of chromatin, transcriptional, and genetic data further highlighted regulatory regions and disease-associated variants that perturb differentiation state transitions, supporting evidence that suggests congenital heart disease emerges early in development. This work bridges developmental gene regulation with disease genetics, providing a foundation for mechanistic and therapeutic insights into congenital diseases.

Figures

Figure 1.
(A) Left: Experimental workflow. Right: Immunofluorescence staining at multiple differentiation time points. (B) scRNA and scATAC modalities from SHARE-seq multiome processing of ~60,000 cells throughout cardioid differentiation. (C) Cell type compositions on each sampled day; colors correspond to (B). (D) Manually curated gene markers verify numerous heart-relevant cell types. Colors correspond to (B); expression is z-score normalized per gene. (E) k-means clustering of consensus peaks across all cell types and time points in the differentiating cardioid (~630K, left) and k-means clustering of all expressed TFs merged with the top 50 variable genes in each cluster (~2,000 genes, right). Filled color boxes on top of heatmaps designate trajectories; clusters under the dashed lines are not part of the respective trajectory. (F) Known and predicted function of 502 ZNFs with cell-state specificity as defined in (E), right. (G) Z-score normalized expression for 13 ZNFs with validated repressor domains.
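The per-gene z-score normalization mentioned in panels (D) and (G) can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the function name `zscore_per_gene`, the toy matrix, and the choice to map constant genes to zero are all assumptions made here.

```python
import numpy as np

def zscore_per_gene(expr):
    """Z-score each gene (row) across cell states (columns).

    Hypothetical helper: rows are genes, columns are cell states.
    """
    expr = np.asarray(expr, dtype=float)
    mean = expr.mean(axis=1, keepdims=True)
    std = expr.std(axis=1, keepdims=True)
    # ASSUMPTION: genes with constant expression get z-scores of 0, not NaN.
    safe_std = np.where(std == 0, 1.0, std)
    return (expr - mean) / safe_std

# Hypothetical toy matrix: 2 genes x 3 cell states.
toy = np.array([[1.0, 2.0, 3.0],
                [5.0, 5.0, 5.0]])
z = zscore_per_gene(toy)
```

After this step each variable gene has mean 0 and unit variance across cell states, so heatmap colors compare relative rather than absolute expression.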
Figure 2.
(A) Azimuth projection of the cardioid scRNA data to a human gastrula dataset. Left: mapping of the cardioid cells to the human gastrula; the UMAP shows overlapping cell states and labels from the human gastrula. Right: cells from earlier cardioid time points map to earlier human gastrula developmental stages. (B) The projection breakdown shows the mapping of individual cells to the reference. Less mature cell states in the cardioid map have higher Mapping and lower Prediction scores, while more mature cardioid cell states have higher Prediction scores and lower Mapping scores. (C) Optimal Transport Trajectory analysis identifies distinct cells predicted to differentiate to a more mature state (black arrow). (D) Trajectory scores obtained through OT analysis within predefined trajectories (blue), as shown in Fig. 1D, are significantly higher than those outside these trajectories (grey). (E) The Optimal Transport Fate by differentiation day displays the fate probabilities for each cell, highlighting the early commitment of cells toward one of the final states (Late AE, vCMs, Hepatocytes). (F) Optimal Transport Fate. Expression of cell state markers; shown are the top 90% of cells expressing these markers.
Figure 3.
(A) chromBPNet is a basepair-resolution deep learning model that predicts cell state-specific TF activity. (B) Observed (black) and predicted (before and after bias correction, top and bottom blue tracks, respectively) accessibility in the NKX2-5 locus for several cell types (left panel) shows increased accessibility and a high correlation with predicted accessibility in mature vCMs. The TSS and proximal promoter regions (highlighted, right panel) are predicted to have high contributions from KLF/SP, NFY, MEIS/TBX and SRF motifs. (C) Motif abundance enrichment in peak sets (Fig. 1F) using chromBPNet predictions identifies specific cell-state profiles of accessible motifs. (D) An ostensibly canonical peak set (accessibility increases with differentiation in all lineages) in fact has cell-type-specific compositions of different motifs. Whereas TEAD has the same activity dynamics in all lineages in these peaks, other TFs show cell-state-specific activity. (E) Cell-state-specific motifs differentially regulate a DHRS3-related peak. Left: TEAD is active in both cell types, but TFAP and NKX2 motifs are cell-state-specific. Right: DHRS3 log-norm expression. (F) Marginal footprinting for the MEF2 motif (blue) in two lineages (upper versus lower panels) and corresponding expression dynamics (color lines). In the vCM lineage, MEF2 motif activity is explained by sequential expression patterns of the MEF2 genes. Although MEF2 is lowly expressed in the AE lineage, it is not predicted to have binding activity. (G) In silico, row-normalized cell-type-specific motif contribution (marginal footprinting) shows various predicted motif activities throughout lineages. (H) In silico, row-normalized contributions of the TEAD motif family reveal differential predicted activity of TEAD variant motifs and TEAD composite motifs. Purple bar: TEAD motif; black bar: partner motif.
Figure 4.
(A) The gene-centric approach links active motifs in a cell-state-specific manner. Motifs in called peaks are filtered by consensus scE2G peak-gene linkages and are down-weighted by a distance function before their contributions are summed. (B) Average in-genome contribution for the SRF regulome in the vCM lineage. (C) Average in-genome contribution for the entire motif compendium across all cell states. The sums of contributions (intensity) for each motif family were averaged and scaled across all expressed genes (approximately 12,000 genes) in each cell state. The major cell-state specificity is indicated at the bottom, alongside motif examples (Fig. S4B for all annotations). (D) Comparison of differential genes (y-axis) versus differential accessibility (x-axis) between vCM and pCM progenitors. Marked genes represent the top differentials in either trajectory. (E) Differences in average contributions for each motif between vCM and pCM progenitors. Positive values indicate higher contributions in pCM progenitors, while negative values indicate higher contributions in vCM progenitors. Error bars represent standard deviation (SD). (F-G) Motif syntaxes for LHX2 (F) and BMP2 (G), which are differentially expressed genes in pCMs and vCMs, respectively. Dashed boxes highlight vCM progenitors (top) and pCM progenitors (bottom), and the blue/orange annotations indicate the key regulators of the lineages shown in (E). Panels at the bottom display example peaks and motif contributions for each gene.
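The gene-centric scoring in panel (A) (keep motifs in linked peaks, down-weight by distance, sum) can be illustrated with a short sketch. The caption does not specify the distance function, so the exponential decay, the `decay_bp` scale, and the function name `gene_motif_score` are assumptions chosen only for illustration, not the authors' method.

```python
import numpy as np

def gene_motif_score(contributions, distances_bp, linked, decay_bp=50_000.0):
    """Sum motif contributions for one gene, down-weighted by distance.

    contributions: per-motif-instance contribution scores (hypothetical values).
    distances_bp:  distance of each motif's peak to the gene, in base pairs.
    linked:        True where the peak has a peak-gene link to this gene.
    """
    contributions = np.asarray(contributions, dtype=float)
    distances_bp = np.asarray(distances_bp, dtype=float)
    linked = np.asarray(linked, dtype=bool)
    # ASSUMPTION: exponential decay; the caption only says "a distance function".
    weights = np.exp(-np.abs(distances_bp) / decay_bp)
    # Unlinked motif instances are filtered out before summing.
    return float(np.sum(contributions[linked] * weights[linked]))

# Hypothetical example: three motif instances near one gene, one unlinked.
score = gene_motif_score(
    contributions=[0.8, 0.5, 1.2],
    distances_bp=[0, 50_000, 10_000],
    linked=[True, True, False],
)
```

With these toy inputs the unlinked motif is ignored entirely, and the motif 50 kb away counts for less than the one at the gene itself.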
Figure 5.
(A) MYOCD KD experimental scheme. (B) MYOCD KD images taken 12 hours after aggregation and on Day 10 of differentiation. On Day 0.5, both the sgRNA (BFP) and the dCas9 (mCherry) are still visible. By Day 10 of differentiation, the control cardioid displays robust beating vCMs (MYL7-GFP), while the MYOCD KD predominantly lacks them. (C) RT-qPCR relative expression of MYOCD in control and MYOCD KD cardioids on days 5 and 8 indicates ~90% KD efficiency. (D) scRNA UMAPs show the depletion of specific cell types (vCMs and AE lineages), whereas other clusters are shared across conditions. (E) Differential prediction of cell types, achieved by projecting the cardioid scRNA data onto a mouse developmental atlas, reveals that alternative vCMs are more closely aligned with an SHF state than with the mature states of CMs. (F) Differential gene expression at Day 3 of cardioid differentiation between MYOCD KD and control (left panel) and the associated GO term enrichments (right panel). (G) Disease association of the differential genes determined by DISEASES enrichment analysis. (H) Motif enrichment in differential peaks following MYOCD KD. Motifs enriched in peaks with decreased accessibility show positive log(OR), and those enriched in peaks with increased accessibility show negative log(OR). Grey vertical bars highlight motif examples with high cell-state-specificity. (I) The raw sum of the contribution difference between MYOCD KD and control illustrates how the MYH7 regulatory syntax is impacted following MYOCD KD. The dashed box indicates no significant expression and no linked peaks via scE2G in specific cell states. Expression (dark/light green for control/MYOCD KD, respectively) was normalized to the iPSCs' basal expression levels. Red motif annotations correspond to motifs identified in the peak example (J). (J) An upstream MYH7-linked peak shows altered motif dynamics upon MYOCD KD.
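The sign convention for log(OR) in panel (H) can be made concrete with a small sketch. It assumes a 2x2 table of peak counts and a 0.5 pseudocount (Haldane-Anscombe correction); both the function name `motif_log_odds_ratio` and the pseudocount are illustrative assumptions, since the caption does not describe how the odds ratio was computed.

```python
import math

def motif_log_odds_ratio(dec_with, dec_without, inc_with, inc_without,
                         pseudocount=0.5):
    """Natural-log odds ratio of a motif in decreased vs increased peaks.

    dec_with / dec_without: peaks with decreased accessibility that
        contain / lack the motif.
    inc_with / inc_without: peaks with increased accessibility that
        contain / lack the motif.
    Positive values mean the motif is enriched in decreased peaks,
    matching the sign convention stated in the caption.
    """
    # ASSUMPTION: a 0.5 pseudocount avoids division by zero for empty cells.
    a = dec_with + pseudocount
    b = dec_without + pseudocount
    c = inc_with + pseudocount
    d = inc_without + pseudocount
    return math.log((a * d) / (b * c))

# Hypothetical counts: motif more common in peaks that lose accessibility.
lost_enriched = motif_log_odds_ratio(30, 10, 10, 30)
gained_enriched = motif_log_odds_ratio(10, 30, 30, 10)
balanced = motif_log_odds_ratio(10, 10, 10, 10)
```

Here `lost_enriched` is positive, `gained_enriched` is its negative, and `balanced` is zero.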
Figure 6.
CHD gene mapping. (A) Normalized CHD gene expression, k-means clustered (n=17). Blue bars show the number of genes in each cluster; orange bars show the number of unique linked peaks for each CHD gene cluster. (B) NOTCH1-linked peaks in selected cell states. For each cell state, linked peaks are shown in purple alongside the corresponding ATAC tracks. Orange highlights a cdCRE element shown enlarged in (D). (C) NOTCH1 motif contributions, column normalized. (D) NOTCH1-linked cdCRE peak (highlighted in (B)) with highlighted accessibility-contributing motifs. (E) Effect of single-basepair polymorphisms on accessibility across cell states for the seven highlighted motifs in (D). (F) Top: TBX5 locus and known variants. Bottom: logFC effect on accessibility and basepair contributions of the rs377307764 TBX5 G>C variant in Early Epiblast (no effect) and in Mesoderm (false discovery rate < 0.01, aaq > 0.05). Red arrow shows the variant location. (G) Accessibility logFC change of the rs377307764 TBX5 G>C variant in the vCM lineage compared to the control sequence (left axis). TBX5 expression (dashed line, right axis). rs377307764 variant in red.
