Identification of genomic enhancers through spatial integration of single-cell transcriptomics and epigenomics
- PMID: 32431014
- PMCID: PMC7237818
- DOI: 10.15252/msb.20209438
Abstract
Single-cell technologies allow measuring chromatin accessibility and gene expression in each cell, but jointly utilizing both layers to map bona fide gene regulatory networks and enhancers remains challenging. Here, we generate independent single-cell RNA-seq and single-cell ATAC-seq atlases of the Drosophila eye-antennal disc and spatially integrate the data into a virtual latent space that mimics the organization of the 2D tissue using ScoMAP (Single-Cell Omics Mapping into spatial Axes using Pseudotime ordering). To validate spatially predicted enhancers, we use a large collection of enhancer-reporter lines and identify ~ 85% of enhancers in which chromatin accessibility and enhancer activity are coupled. Next, we infer enhancer-to-gene relationships in the virtual space, finding that genes are mostly regulated by multiple, often redundant, enhancers. Exploiting cell type-specific enhancers, we deconvolute cell type-specific effects of bulk-derived chromatin accessibility QTLs. Finally, we discover that Prospero drives neuronal differentiation through the binding of a GGG motif. In summary, we provide a comprehensive spatial characterization of gene regulation in a 2D tissue.
Keywords: enhancer detection; eye-antennal disc; gene regulation; single-cell omics; spatial integration.
© 2020 The Authors. Published under the terms of the CC BY 4.0 license.
Conflict of interest statement
The authors declare that they have no conflict of interest.
Figures
Experimental approach. scRNA‐seq was performed in eye‐antennal discs using 10x Genomics, resulting in a data set with 3,531 high‐quality cells. Main spatial compartments in the eye‐antennal disc are annotated.
tSNE representation of the scRNA‐seq data (with 3,531 cells).
tSNE colored by the standardized gene expression of known cell type markers in the eye‐antennal disc. In each plot, three marker genes are shown, using RGB encoding.
tSNE annotated by label transfer with Seurat v3 (Stuart et al, 2019) using the scRNA‐seq eye disc data set from Ariss et al (2018).
Cell‐to‐regulon heatmap showing the standardized enrichment or area under the curve (AUC) from SCENIC (Aibar et al, 2017) for each selected regulon based on RSS in each cell. Top enriched motifs for representative regulons are shown below. Regulons marked with * are based on ChIP‐seq track enrichment.
Experimental set up. 15,387 nuclei were profiled using 10x scATAC‐seq.
Correlation between the accessibility of regions in the bulk ATAC‐seq and the aggregated single‐cell profiles.
Comparison of bulk ATAC profiles, scATAC‐seq aggregate with 10x and 400 individual cells, where each row represents a cell and fragments are shown in black. The number of cells in the aggregate is indicated between brackets.
cisTopic cell tSNE (15,387 nuclei) colored by annotated cell type.
Topic‐cell enrichment heatmap with selected topics.
Aggregate profiles per cell type in the top region of the indicated topic.
Topic modeling recapitulates the dynamic chromatin changes during differentiation in the eye disc. Top: cisTopic cell tSNE colored by topic enrichment. Middle: cisTopic region tSNE colored by topic enrichment. Bottom: Selected enriched motifs in each topic.
Bulk ATAC was performed on Optix‐GFP+ and Optix‐GFP− FACS sorted cells (based on the activity of the Optix2/3 enhancer). cisTopic cell tSNE and region tSNE are colored based on the enrichment of regions that are differentially accessible between Optix‐GFP+ and Optix‐GFP−. Motifs enriched in the regions differentially accessible in Optix‐GFP+ cells are shown. Scale bar, 100 μm.
Computational approach for mapping single‐cell RNA‐seq or single‐cell ATAC‐seq data into the virtual eye‐antennal disc. Briefly, cells are ordered by pseudotime, corresponding to the proximal‐distal axis in the antennal disc and the anterior–posterior axis in the eye disc. For each cluster, real and virtual cells are divided into the same number of bins based on pseudotime and axis position, respectively. Finally, real cells are mapped onto the virtual cells in the matching bin.
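The binning and matching step in this caption can be illustrated with a minimal sketch. It is not the ScoMAP implementation: the function names, the equal-count (rank-based) binning, and the random draw of a real cell for each virtual cell are all assumptions made for illustration, and the sketch handles a single cluster only.

```python
import random


def equal_count_bins(values, n_bins):
    """Assign each value a bin index 0..n_bins-1 by rank (equal-count bins)."""
    n = len(values)
    if n < n_bins:
        raise ValueError("need at least as many cells as bins")
    order = sorted(range(n), key=lambda i: values[i])
    bins = [0] * n
    for rank, idx in enumerate(order):
        bins[idx] = rank * n_bins // n
    return bins


def map_real_to_virtual(pseudotime, virtual_axis, n_bins, seed=0):
    """For each virtual cell, return the index of a real cell from the matching bin.

    pseudotime   -- pseudotime value of each real cell (one cluster)
    virtual_axis -- axis position of each virtual cell (same cluster)
    The random choice within a bin is an assumption of this sketch.
    """
    rng = random.Random(seed)
    real_bins = equal_count_bins(pseudotime, n_bins)
    virtual_bins = equal_count_bins(virtual_axis, n_bins)
    # Group real cells by their pseudotime bin.
    real_by_bin = {}
    for cell, b in enumerate(real_bins):
        real_by_bin.setdefault(b, []).append(cell)
    # Each virtual cell receives one real cell from the bin with the same index.
    return [rng.choice(real_by_bin[b]) for b in virtual_bins]
```

In a full workflow this would presumably be repeated per cluster, with the pseudotime axis chosen to match the tissue axis named in the caption.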
Gene expression correspondence between the Seurat tSNE and the virtual eye. The expression of three genes is shown per plot, using RGB encoding.
Correspondence between region accessibility and activity for 12 Janelia‐Gal4 enhancers. Top row: cisTopic cell tSNE colored by the accessibility probability of each region in each cell. Middle row: Virtual eye colored by the accessibility probability of each region in each cell. Bottom row: Confocal images showing the activity (GFP, green) of each region in eye‐antennal discs. Scale bar, 100 μm.
Discordance between region accessibility and activity for 2 Janelia‐Gal4 enhancers. Top row: cisTopic cell tSNE colored by the probability of each region in each cell. Middle row: Virtual eye colored by the probability of each region in each cell. Bottom row: Confocal images showing the activity (GFP, green) of each region in eye‐antennal discs. Scale bar, 100 μm.
Relationship between the accessibility‐activity correlation of the regions and their distribution (as Gini score). Below, representative motifs enriched in generally and specifically accessible regions, with low (< 0.2) and high (> 0.4) Gini score, respectively, are shown.
Model describing the two classes of enhancers found. On the one hand, some enhancers (such as Grh targets) are generally accessible, but only become functional when specific co‐factor(s) bind; on the other hand, for other enhancers, accessibility is more specific and is coupled with activity (based on the binding of one or more TFs). Histograms show the average topic score for enhancers of both classes.
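As a rough illustration of how a Gini score separates generally from specifically accessible regions, the sketch below computes the standard Gini coefficient over a vector of non-negative accessibility values and applies the < 0.2 and > 0.4 cut-offs quoted in the caption. What the values are measured over (for example, cell types or topics) is not stated here, and the function names and the "intermediate" label are assumptions.

```python
def gini(values):
    """Gini coefficient of non-negative values (0 = uniform, near 1 = concentrated)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Standard rank-weighted formula on ascending-sorted values.
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return 2.0 * weighted / (n * total) - (n + 1) / n


def accessibility_class(values, low=0.2, high=0.4):
    """Label a region using the thresholds from the caption (labels are hypothetical)."""
    g = gini(values)
    if g < low:
        return "general"
    if g > high:
        return "specific"
    return "intermediate"
```

A region that is equally accessible everywhere scores 0, while a region accessible in only one of four groups scores 0.75.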
Computational approach for linking enhancer to target genes.
Inferred links for senseless. The promoter of the gene is highlighted in grey, and the validated enhancer sens‐F2 is highlighted in blue.
Inferred links for dachshund. The promoter of the gene is highlighted in grey, and the validated enhancers 3EE and 5EE are highlighted in blue.
Inferred links for glass. The promoter of the gene is highlighted in grey, the validated enhancers by Fritsch et al (2019) are highlighted in blue, and the glass gene is highlighted in red.
Number of enhancer‐to‐gene links per gene.
Number of links to genes at each rank position, with genes ranked by distance from the enhancer.
Number of positive and negative links, with representative enriched motifs in each category with Normalized Enrichment Score (NES).
Link‐based regulon for Atonal built using GRNBoost co‐expression modules and motif enrichment on the regions linked to each potential target gene. Left: Cytoscape view of the link‐based regulons. Color scale indicates the average importance of the regions enriched in the transcription factor motif for each gene. Known targets of Atonal (Aerts et al, 2010) are highlighted in black and grey and with a bigger font. Middle: Examples of target genes, showing the enhancer‐to‐region links (top), cisTarget regions (middle), and gene annotation. cisTarget regions in which the motif for the transcription factor is enriched are shown in red. The area highlighted in yellow corresponds to the motif enrichment search space used in SCENIC (Aibar et al, 2017). Right: GSEA (Gene Set Enrichment Analysis) plots comparing the link‐based regulons with differentially expressed genes in both gain and loss of function mutants described in Aerts et al (2010).
Approach for the identification of genome‐wide caQTLs using bulk ATAC‐seq profiles of 50 inbred Drosophila melanogaster lines. Briefly, after identifying the SNPs among the lines, a generalized linear model (GLM) is used to assess whether the presence of the SNP has an effect on chromatin accessibility. Once these caQTLs are identified, we estimate the effect they have on transcription factor binding sites by comparing the motif score with the reference and alternative SNP (i.e., delta score). A positive delta score indicates that the presence of the motif is related to chromatin opening, while a negative delta score reflects that the motif causes chromatin closing.
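The motif-score comparison behind a delta score can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the toy position weight matrix, the best-window scoring, and the choice of "alternative minus reference" are hypothetical, and the caption's sign convention (which ties the score to the direction of the accessibility change) is not reproduced here.

```python
def best_motif_score(sequence, pwm):
    """Highest sum of per-position weights over all windows of the motif length."""
    width = len(pwm)
    if len(sequence) < width:
        raise ValueError("sequence shorter than motif")
    scores = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        scores.append(sum(pwm[pos][base] for pos, base in enumerate(window)))
    return max(scores)


def raw_delta_score(ref_seq, alt_seq, pwm):
    """Difference in best motif score between alternative and reference alleles.

    The direction (alt minus ref) is an assumption of this sketch.
    """
    return best_motif_score(alt_seq, pwm) - best_motif_score(ref_seq, pwm)


# Toy 3-position matrix that rewards G at every position (hypothetical values).
TOY_PWM = [{"A": -1, "C": -1, "G": 1, "T": -1} for _ in range(3)]

# A SNP that changes the middle G to A weakens the best match from 3 to 1.
delta = raw_delta_score("AGGGA", "AGAGA", TOY_PWM)
```

In practice the sequences would be the windows around each caQTL, and the matrix would come from a motif collection rather than a hand-written toy.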
Bulk chromatin profiles of the 50 inbred lines. While 21 of these ATAC‐seq experiments were performed by Jacobs et al (2018), we generated 29 additional profiles. Peak calling defined regions are shown in black on the top.
Examples of caQTLs linked to chromatin opening (left) and closing (right) compared to the reference genome.
Adjusted P‐value by Fisher's exact test comparing the proportion of caQTLs versus random SNPs affecting each motif and aggregated delta score per topic and bulk regions.
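For a single motif, the comparison of caQTLs against random SNPs reduces to a 2×2 table, which Fisher's exact test evaluates with the hypergeometric distribution. The sketch below is a standard-library version of a one-sided (enrichment) test; whether the original analysis was one- or two-sided, and which multiple-testing correction produced the adjusted P-values, is not stated here, so both are left as assumptions. The counts in the example are invented.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for enrichment in the top-left cell.

    a -- caQTLs that affect the motif
    b -- caQTLs that do not
    c -- random SNPs that affect the motif
    d -- random SNPs that do not
    Returns P(X >= a) under the hypergeometric null.
    """
    row1 = a + b          # all caQTLs
    col1 = a + c          # all SNPs affecting the motif
    n = a + b + c + d     # all SNPs
    denom = comb(n, row1)
    upper = min(row1, col1)
    # Sum the probabilities of tables at least as extreme as the observed one.
    tail = sum(comb(col1, k) * comb(n - col1, row1 - k) for k in range(a, upper + 1))
    return tail / denom


# Invented counts: 3 of 4 caQTLs hit the motif versus 1 of 4 random SNPs.
p_value = fisher_exact_greater(3, 1, 1, 3)
```

With one test per motif, the resulting P-values would then need adjusting for multiple comparisons, for example with a Benjamini-Hochberg procedure.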
cisTopic cell tSNE colored by the enrichment of regions whose accessibility is affected by caQTLs that alter the highlighted binding sites.
Examples of caQTLs in regions that belong to different topics and affect a certain binding site. Top: Motif with delta score. Middle: Representative bulk ATAC‐seq profile on lines with the reference and the alternative allele. Bottom: cisTopic cell tSNE colored by the accessibility of the region affected by the caQTL. The caQTL coordinates are, from left to right: chr3L:17392596, chr3R:14076593, chr2R:18674001 and chr2R:18674002, and chr3R:29376820.
cisTopic topic‐cell heatmap, based on a model with 21 topics. For running cisTopic, 50 single‐cell profiles were bootstrapped from the 15 bulk ATAC‐seq profiles of the GMR‐GAL4 UAS‐TF and wild‐type lines included in the screen.
Highlighted topics showing a representative topic region (top) and representative enriched motifs with their Normalized Enrichment Score (NES).
Heatmaps showing the normalized coverage of the early photoreceptor GGG enriched regions and late GGG enriched photoreceptor regions on the selected GMR‐GAL4 UAS‐TF lines.
Seurat cell tSNE colored by the expression of l(3)neo38, Nerfin‐1, and Prospero.
Venn diagram showing the overlap between the GGG enriched binding sites of Prospero, Nerfin‐1, and l(3)neo38. The differentially enriched motif in each class is shown with its adjusted P‐value.
Comment in
- Closing the gap: a roadmap to single-cell regulatory genomics. Mol Syst Biol. 2020 May;16(5):e9497. doi: 10.15252/msb.20209497. PMID: 32430985. Free PMC article.
