Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2024 Feb 23;15(1):1664.
doi: 10.1038/s41467-024-45506-6.

Complex regulatory networks influence pluripotent cell state transitions in human iPSCs

Collaborators, Affiliations

Complex regulatory networks influence pluripotent cell state transitions in human iPSCs

Timothy D Arthur et al. Nat Commun. .

Abstract

Stem cells exist in vitro in a spectrum of interconvertible pluripotent states. Analyzing hundreds of hiPSCs derived from different individuals, we show the proportions of these pluripotent states vary considerably across lines. We discover 13 gene network modules (GNMs) and 13 regulatory network modules (RNMs), which are highly correlated with each other suggesting that the coordinated co-accessibility of regulatory elements in the RNMs likely underlie the coordinated expression of genes in the GNMs. Epigenetic analyses reveal that regulatory networks underlying self-renewal and pluripotency are more complex than previously realized. Genetic analyses identify thousands of regulatory variants that overlapped predicted transcription factor binding sites and are associated with chromatin accessibility in the hiPSCs. We show that the master regulator of pluripotency, the NANOG-OCT4 Complex, and its associated network are significantly enriched for regulatory variants with large effects, suggesting that they play a role in the varying cellular proportions of pluripotency states between hiPSCs. Our work bins tens of thousands of regulatory elements in hiPSCs into discrete regulatory networks, shows that pluripotency and self-renewal processes have a surprising level of regulatory complexity, and suggests that genetic factors may contribute to cell state transitions in human iPSC lines.

PubMed Disclaimer

Conflict of interest statement

W.W.Y.G. and K.A.F. are co-founders of Synthalogy Therapeutics. The remaining authors declare no competing interests.

Figures

Fig. 1
Fig. 1. Community detection algorithm identifies hiPSC gene networks.
A Cartoon depicting our hypothesis that hiPSCs are composed of varying proportions of pluripotent cell state subpopulations. Created with BioRender.com. BE Boxplots of gene expression in the 15 hiPSCs with the highest (>80%) estimated proportions versus the 69 hiPSCs with the lowest (0%) estimated proportions of formative-state cells (Supplementary Fig. 2A). B DUSP5, (C) LEFTY1, (D) FZD5, and (E) FST. The boxplot maxima and minima are set by the samples with the highest and lowest expression of the corresponding gene, the center line represents the median gene expression, the upper and lower hinges correspond to the 25th and 75th percentiles, and the upper and lower whiskers extend to the highest and lowest value within 1.5 times the interquartile range (IQR). Source data are provided as a Source Data file. F UMAP plot displaying the expression of the 2946 Pareto genes (colored by GNM membership) across 213 hiPSC RNA-seq samples. Source data are provided as a Source Data file. G Heatmap showing that genes within a GNM are enriched for co-expression. Pairwise Fisher’s Exact tests were performed to validate that genes within the same GNM were more co-expressed with each other than with genes in other GNMs. Each cell is filled with the log2(Odds Ratio). For plot legibility, the enrichment range was set to −3.5 to 3.5. Source data are provided as a Source Data file. H Scatter plot showing the association between GNM 5 and the estimated proportion of formative-state cells across the 213 RNA-seq samples. To calculate the nominal P-value, a linear model was used to examine the association between the GNM 5 Score and the estimated relative formative population. Source data are provided as a Source Data file. I Scatter plot showing the association between GNM 10 and the estimated proportion of formative-state cells across the 213 RNA-seq samples. To calculate the nominal P-value, a linear model was used to examine the association between the GNM 10 Score and the estimated relative formative population. Source data are provided as a Source Data file. J Heatmap showing the enrichment of pluripotency cell state-associated genes in the 13 GNMs. A Fisher’s Exact Test was performed on seven published gene sets (Supplementary Data 5) associated with hiPSC cell states on each GNM to calculate the Odds Ratio. Published gene set labels on the y-axis were colored to indicate whether they were curated from in vivo (red) or in vitro (blue) experiments. The GNMs on the x-axis are ordered the same as in the above panel (G). Each cell is filled with the log2(Odds Ratio) for significant GNM cell state associations (red = enrichment, blue = depletion, white = non-significant). For plot legibility, the enrichment range was set to −3.5 to 3.5 and the “Primitive Endoderm” label represents the primitive endoderm-primed founder cells. Source data are provided as a Source Data file. K Network graph showing a subset of the co-expressed genes in GNM 5. L Network graph showing a subset of the co-expressed genes in GNM 10.
Fig. 2
Fig. 2. Community detection algorithm identifies hiPSC regulatory networks.
A Diagram of proposed molecular mechanisms underlying gene co-expression networks. Co-expression of genes located on different chromosomes but within the same GNM (A or B) is mediated via regulatory elements that are co-accessible because they bind the same transcription factors (TF1 or TF2). Created with BioRender.com. B Histogram displaying the number of peaks in each of the 13 major RNMs. C UMAP plot of 4772 ATAC-seq peaks in the 13 major RNMs. We plotted ATAC-seq peaks with the top 10% intramodular co-accessibility in each RNM, as opposed to the Pareto peaks (n = 9545) for plot legibility. Each point represents a peak colored by its corresponding RNM. Source data are provided as a Source Data file. D Heatmap showing the pairwise associations between 13 RNMs based on the co-accessibility enrichment. Each cell is filled with the log2(Odds Ratio) for the corresponding RNM pair. Source data are provided as a Source Data file. E Barplot showing the estimated cell state proportions across 150 ATAC-seq samples. Previously published ATAC-seq peaks from FACs sorted cells representing formative and primed pluripotency states were used to perform cellular deconvolution on the 150 iPSCORE hiPSC ATAC-seq samples. Each stacked barplot corresponds to an ATAC-seq sample and the colors correspond to the estimated formative and primed cell states. Source data are provided as a Source Data file. FH Boxplots demonstrating RNM 2 (F), 3 (G), and 8 (H) associations with the estimated proportion of cells in the formative state. A linear model was used to calculate the association between RNM scores (summed inversed normal transformed accessibility of RNM Pareto peaks) and the estimated formative proportion of 123 ATAC-seq samples (<100% estimated formative high proportion). Each point represents an ATAC-seq sample. Samples were binned by estimated proportion for boxplot legibility. The boxplot maxima and minima are set by the samples with the highest and lowest RNM score, the center line represents the RNM score, the upper and lower hinges correspond to the 25th and 75th percentiles, and the upper and lower whiskers extend to the highest and lowest value within 1.5 times the interquartile range (IQR). Source data are provided as a Source Data file. I Heatmap showing GNM-RNM associations. 32,327 ATAC-seq peaks in the 13 major RNMs were annotated with a putative target gene (see “Methods”). We performed pairwise Fisher’s Exact test to calculate enrichments (log2(Odds Ratio)) of RNM peaks in GNMs. Non-significant associations are filled in white. Source data are provided as a Source Data file.
Fig. 3
Fig. 3. RNM functional characterization.
A Heatmap displaying the enrichment (Odds Ratio) of formative and primed-state associated peaks in each RNM calculated using Fisher’s Exact tests. For Fig. 3A–C, the following features are consistent; (1) the RNM order on the x-axis, (2) each cell is filled with the log2(Odds Ratio) for the corresponding RNM (red = enrichment, blue = depletion, white = non-significant), and (3) the enrichment range was set from −3 to 3 for plot legibility. Source data are provided as a Source Data file. B Heatmap showing enrichment (Odds Ratio) of the annotations for 5 collapsed iPSC chromatin states in each RNM calculated using Fisher’s Exact tests (Supplementary Fig. 6B). Source data are provided as a Source Data file. C Heatmap displaying enrichment (Odds Ratio) of 41 selected TF groups in each RNM calculated using Fisher’s Exact tests. A heatmap reporting the RNM enrichment for all 92 TF groups and “Not Bound” is shown in Supplementary Fig. 9. Source data are provided as a Source Data file. D Genome browser plot exhibiting the distribution of the three epigenetic annotations in the chr4:144,200,000:146,700,000 locus. The top track is the read depth of the merged BAM file of 34 reference samples (see “Methods”), the track below is the five collapsed chromatin states (colored by chromatin state membership) (Supplementary Fig. 6B), the following four tracks respectively show co-accessibility of hiPSC ATAC-seq peaks (colored by RNM membership), the formative-state ATAC-seq peaks reanalyzed from Lau et al, predicted binding sites for SMAD Family and SOX-LEF1 Complex, and transcript hg19 coordinates. E Detailed genome browser view highlighting the SMAD1 region of the chr4:146,400,000:146,415,000 locus. As indicated by the black box, the plot is a subset of the locus displayed in Fig. 3D, and has concordant track ordering.
Fig. 4
Fig. 4. RNM enrichments for fetal cell type-specific ATAC-seq peaks.
A Genome browser visualization showing (from the top): (1) hiPSC ATAC-seq peaks (colored by RNM); fetal cell type-specific peaks from the Descartes Atlas for (2) pancreatic islets, (3) trophoblast giant cells, (4) astrocytes, (5) vascular endothelial cells; TF footprints for (6) TEAD Family, and (7) NANOG-OCT4 Complex; and (8) transcript hg19 coordinates. B As indicated by the black box, the view is a section of Fig. 4A at higher resolution and demonstrates that an RNM 3 hiPSC ATAC-seq peak bound by NANOG-OCT4 and TEAD Family TFs overlaps a trophoblast giant cell-specific peak near an LIMD1 exon. C Heatmap displaying enrichment of the 13 RNMs in fetal cell type-specific peaks. Fisher’s Exact tests were used to calculate the enrichment (Odds Ratio) of the cell type-specific peaks for 54 fetal cell types in each RNM. 5 fetal cell type-specific peaks were omitted from the plot because they were not enriched an RNM. Each cell is filled with the log2(Odds Ratio) for the corresponding fetal cell types and RNM (red = enrichment, blue = depletion, white = non-significant). Source data are provided as a Source Data file. D Heatmap showing TF enrichment in the ATAC-seq peaks underlying enrichments in RNM 10. Fisher’s Exact tests were used to calculate the enrichment (Odds Ratio) of the 92 TF groups in the hiPSC peak underlying the fetal cell type associations. Each cell is filled with the log2(Odds Ratio) for the indicated fetal cell type and TF group (red = enrichment, blue = depletion, white = non-significant). Source data are provided as a Source Data file.
Fig. 5
Fig. 5. RNMs and TFBSs exhibit differential allelic imbalance.
A Density plot showing the allelic imbalance fraction (AIF) of ASCA SNPs in the 13 RNMs. The AIF densities (x-axis) for the 13 RNMs (y-axis) demonstrate that 4 RNMs (indicated by red asterisks) contain ASCA SNPs that have significantly greater allelic imbalance compared to ASCA SNPs in other RNMs (one-sided Mann-Whitney U Test, Benjamini-Hochberg adjusted P-value < 0.05). Source data are provided as a Source Data file. B Histogram showing the number of ASCA SNPs (y-axis) that overlapped predicted TFBSs (x-axis). 6323 SNPs in 4241 ATAC-seq peaks exhibited allele-specific chromatin accessibility (ASCA). ASCA SNPs were intersected with predicted TBFSs for all 92 TF groups. 4299 ASCA SNPs did not overlap a TFBS and 1933 overlapped one or more TFBSs. Source data are provided as a Source Data file. C Barplot demonstrating enrichment for ASCA SNPs in 23 TF groups. A two-tailed Fisher’s Exact test was used to calculate enrichment (Odds Ratio) for ASCA SNPs in the 92 TF groups (Supplementary Data 12). The barplot displays the enrichment (log2(Odds Ratio)) for the 23 TF groups that were significant (adjusted P-value < 0.05). The TF groups are ordered based on their AIF densities in Fig. 5D. Source data are provided as a Source Data file. D Density plot showing the allelic imbalance fraction (AIF) of ASCA SNPs in the 23 TF groups. The AIF densities (x-axis) for the 23 TF groups (y-axis) demonstrate that NANOG-OCT4 Complex binding sites (indicated by red asterisk) contain ASCA SNPs that have greater allelic imbalance compared to ASCA SNPs in the other 22 TF groups (one-sided Mann-Whitney U Test, Benjamini-Hochberg adjusted P-value < 0.05). Source data are provided as a Source Data file.

Update of

References

    1. Lau KX, et al. Unique properties of a subset of human pluripotent stem cells with high capacity for self-renewal. Nat. Commun. 2020;11:2420. doi: 10.1038/s41467-020-16214-8. - DOI - PMC - PubMed
    1. Hough SR, et al. Single-cell gene expression profiles define self-renewing, pluripotent, and lineage primed states of human pluripotent stem cells. Stem Cell Rep. 2014;2:881–895. doi: 10.1016/j.stemcr.2014.04.014. - DOI - PMC - PubMed
    1. Mazid MA, et al. Rolling back human pluripotent stem cells to an eight-cell embryo-like stage. Nature. 2022;605:315–324. doi: 10.1038/s41586-022-04625-0. - DOI - PubMed
    1. Cornacchia D, et al. Lipid deprivation induces a stable, naive-to-primed intermediate state of pluripotency in human PSCs. Cell Stem Cell. 2019;25:120–136.e10. doi: 10.1016/j.stem.2019.05.001. - DOI - PMC - PubMed
    1. Theunissen TW, et al. Systematic identification of culture conditions for induction and maintenance of naive human pluripotency. Cell Stem Cell. 2014;15:524–526. doi: 10.1016/j.stem.2014.09.003. - DOI - PMC - PubMed

MeSH terms