PLoS Comput Biol. 2025 Apr 21;21(4):e1012962.
doi: 10.1371/journal.pcbi.1012962. eCollection 2025 Apr.

Identifying reproducible transcription regulator coexpression patterns with single cell transcriptomics

Alexander Morin et al. PLoS Comput Biol.

Abstract

The proliferation of single cell transcriptomics has potentiated our ability to unveil patterns that reflect dynamic cellular processes such as the regulation of gene transcription. In this study, we leverage a broad collection of single cell RNA-seq data to identify the gene partners whose expression is most coordinated with each human and mouse transcription regulator (TR). We assembled 120 human and 103 mouse scRNA-seq datasets from the literature (>28 million cells), constructing a single cell coexpression network for each. We aimed to understand the consistency of TR coexpression profiles across a broad sampling of biological contexts, rather than examine the preservation of context-specific signals. Our workflow therefore explicitly prioritizes the patterns that are most reproducible across cell types. Towards this goal, we characterize the similarity of each TR's coexpression within and across species. We create single cell coexpression rankings for each TR, demonstrating that this aggregated information recovers literature curated targets on par with ChIP-seq data. We then combine the coexpression and ChIP-seq information to identify candidate regulatory interactions supported across methods and species. Finally, we highlight interactions for the important neural TR ASCL1 to demonstrate how our compiled information can be adopted for community use.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. Overview of study design.
(A) Counts of datasets by source, technology, and species. (B) Top panel: Counts of cells across the dataset corpus. Bottom panel: Counts of cell types. (C) Schematic of the single cell coexpression aggregation framework and the interpretation of an individual gene coexpression profile. (D, E) Examples of the most reproducible positively coexpressed gene pairs. Each bar represents a dataset/network, and each point represents the gene pair’s correlation in a cell type within the dataset. (F) Example of one of the most reproducible negatively coexpressed gene pairs.
Fig 2. Similarity of TR profiles.
(A) Inset: distribution of the mean Top200 overlaps for the null background, 82 ribosomal genes, and 1,605 human TRs. The null was generated through 1000 iterations of sampling one TR profile from each of 120 human datasets and calculating the average size of the Top200 overlap between every pair of sampled profiles. The ribosomal genes represent a “base case” scenario. Main: The average Top200 overlap of all human TRs, with the red line indicating the best null overlap. (B) Same as in A, save for 103 mouse experiments and 1,484 TRs. (C,D) Saturation analysis of global TR profiles for human (C) E2F8 and (D) PAX6. Left panels show the spread of Top200 overlaps between individual dataset profiles and the global E2F8 and PAX6 profiles. Right panels show the spread of overlaps when iteratively subsampling and aggregating datasets at increasing steps. Dotted lines indicate the average number of sampled datasets required to reach 80% of the global profile. E2F8 recovers its global profile with relatively fewer datasets than does PAX6.
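The Top200 overlap and its null background described in this caption can be illustrated with a minimal sketch. It assumes each coexpression profile is a dictionary mapping gene symbols to a coexpression score, and each dataset is a dictionary mapping TR names to such profiles. The function names (`top_k`, `mean_top_k_overlap`, `null_overlaps`) and the data layout are hypothetical and are not taken from the authors' code.

```python
import random
from itertools import combinations


def top_k(profile, k=200):
    """Return the set of the k genes with the highest coexpression score."""
    ranked = sorted(profile, key=profile.get, reverse=True)
    return set(ranked[:k])


def mean_top_k_overlap(profiles, k=200):
    """Average size of the top-k intersection over every pair of profiles."""
    tops = [top_k(p, k) for p in profiles]
    pairs = list(combinations(tops, 2))
    return sum(len(a & b) for a, b in pairs) / len(pairs)


def null_overlaps(datasets, k=200, iterations=1000, seed=0):
    """Build a null distribution of mean top-k overlaps.

    Each iteration samples one randomly chosen TR profile from every
    dataset and records the mean pairwise overlap of the sampled profiles.
    """
    rng = random.Random(seed)
    results = []
    for _ in range(iterations):
        sampled = [d[rng.choice(sorted(d))] for d in datasets]
        results.append(mean_top_k_overlap(sampled, k))
    return results
```

Under these assumptions, the observed value for a TR would be `mean_top_k_overlap` applied to that TR's profile from every dataset, and it could then be compared against the largest value returned by `null_overlaps`.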
Fig 3. Recovery of literature curated targets by aggregate rankings.
(A) Schematic of literature curation evaluation. (B) Distributions of the observed AUROCs for 451 human and 434 mouse aggregate TR coexpression profiles, along with the distribution of the median null AUROCs generated for each profile. (C) Histograms of the AUROC and AUPRC coexpression quantiles for human and mouse. (D) Scatter plot of the AUROC quantiles for the coexpression and binding profiles of 253 human TRs that had binding data and at least five curated targets. Green box indicates TRs for which both genomic methods were effective in the benchmark, grey box for only one method, and red box for neither method being effective.
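As an illustration of the benchmark summarized in this caption, the sketch below scores a best-first gene ranking against a set of curated targets with the AUROC. The caption does not state how the null AUROCs were generated, so the null here simply draws random target sets of the same size. That choice, along with the function names `auroc` and `median_null_auroc`, is an assumption made for this example.

```python
import random
from statistics import median


def auroc(ranked_genes, targets):
    """AUROC of a best-first ranking with no ties.

    Equals the probability that a randomly chosen target is ranked
    above a randomly chosen non-target.
    """
    targets = set(targets) & set(ranked_genes)
    n_pos = len(targets)
    n_neg = len(ranked_genes) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one target and one non-target")
    negatives_seen = 0
    wins = 0  # (target, non-target) pairs in which the target ranks higher
    for gene in ranked_genes:
        if gene in targets:
            wins += n_neg - negatives_seen
        else:
            negatives_seen += 1
    return wins / (n_pos * n_neg)


def median_null_auroc(ranked_genes, n_targets, iterations=1000, seed=0):
    """Median AUROC over random target sets of size n_targets (assumed null)."""
    rng = random.Random(seed)
    scores = [
        auroc(ranked_genes, rng.sample(ranked_genes, n_targets))
        for _ in range(iterations)
    ]
    return median(scores)
```

An observed AUROC well above the median null value would indicate that the aggregate ranking places curated targets near the top more often than expected by chance.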
Fig 4. Preservation of mouse and human single cell coexpression profiles.
(A) Distribution of coexpression agreement between the aggregate single cell coexpression profiles of 1,246 orthologous TRs. Black lines indicate the median value for the TRs, grey lines indicate the median of null values generated by shuffling pairs of orthologous TRs. (B) Top: Schematic of the ortholog retrieval workflow, adapted from Suresh et al., 2023 [17]. Bottom: Scatter plot of the resulting ortholog retrieval scores. (C) Scatter plot of the ASCL1 Top200 overlaps. (D) The top 15 GO terms when combining the human and mouse top ASCL1 coexpressed gene partners.
Fig 5. Count of interactions supported across methods and species.
(A) Inset: criteria used to group interactions into tiers. Bar chart: Count of unique interactions gained in each orthologous tier (Stringent, Elevated, and Mixed-Species) for the 216 TRs with binding data in both species. (B) Count of Species-Specific interactions for 317 TRs in human (top) and 305 TRs in mouse (bottom). TRs are split by those with ChIP-seq data in one species only (left) and thus are ineligible for consideration in the orthologous interactions, and those with ChIP-seq data in both species (right). Grey bars indicate the count of interactions already found in the Stringent and Elevated sets, coloured bars indicate the count of Species-Specific interactions that were gained due to lacking orthologs or because they had elevated ChIP-seq signal in one species and not the other.
Fig 6. Reproducible ASCL1 interactions.
(A) Heatmap representing the tiered evidence for ASCL1 candidate targets. (B, C) Distribution of Pearson’s correlations for ASCL1-DLL3 in (B) human and (C) mouse, as in Fig 1D–F. (D, E) Scatter plot of the CPM values for ASCL1 and DLL3 for the cells belonging to the cell type that had the highest correlation in the entire corpus for (D) human and (E) mouse. (F, G) Genome track plots centered on DLL3 (yellow boxes) in (F) human and (G) mouse, where the base of the red bars indicates ASCL1 binding regions, and the height indicates the count of ASCL1 ChIP-seq datasets with a peak in the region.

References

    1. Crow M, Paul A, Ballouz S, Huang ZJ, Gillis J. Exploiting single-cell expression to characterize co-expression replicability. Genome Biol. 2016;17:101. doi: 10.1186/s13059-016-0964-6
    2. Heumos L, Schaar AC, Lance C, Litinetskaya A, Drost F, Zappia L, et al. Best practices for single-cell analysis across modalities. Nat Rev Genet. 2023;24(8):550–72. doi: 10.1038/s41576-023-00586-w
    3. Lambert SA, Jolma A, Campitelli LF, Das PK, Yin Y, Albu M, et al. The human transcription factors. Cell. 2018;172(4):650–65.
    4. Rothenberg EV. Causal gene regulatory network modeling and genomics: second-generation challenges. J Comput Biol. 2019;26(7):703–18. doi: 10.1089/cmb.2019.0098
    5. Sonawane AR, Weiss ST, Glass K, Sharma A. Network medicine in the age of biomedical big data. Front Genetics. 2019;10:294. doi: 10.3389/fgene.2019.00294
