Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2025 Jul;57(7):1742-1753.
doi: 10.1038/s41588-025-02225-y. Epub 2025 Jun 25.

Single-cell and spatial transcriptomics of stricturing Crohn's disease highlights a fibrosis-associated network

Affiliations

Single-cell and spatial transcriptomics of stricturing Crohn's disease highlights a fibrosis-associated network

Lingjia Kong et al. Nat Genet. 2025 Jul.

Abstract

Fibrosis is a major complication of Crohn's disease (CD) marked by excess deposition of extracellular matrix, leading to stricturing and functional impairment. As mechanistic characterization and therapeutic options are lacking, we paired single-cell and spatial transcriptomics in 61 samples from 21 patients with CD and 10 patients without inflammatory bowel disease (IBD). Intestinal strictures were characterized by increased immune cells, including IgG+ plasma cells, CCR7-hi CD4+ T cells and inflammatory fibroblasts. Spatial transcriptomics showed that key subsets colocalize within diseased tissues and identified additional populations such as interstitial cells of Cajal and enteric neurons. Furthermore, we mapped gene expression onto intestinal biogeography, finding that known genetic risk loci are enriched within discrete spatial modules, defined by the presence of inflammatory fibroblasts and lymphoid follicles. Altogether, our datasets chart the key transcriptomic and cellular networks in stricturing CD and highlight the spatial organization of multicellular genetic risk factors.

PubMed Disclaimer

Conflict of interest statement

Competing interests: R.J.X. is cofounder of Jnana Therapeutics, Scientific Advisory Board member at Nestlé, Magnet BioMedicine and Arena BioWorks, and Board Director at MoonLake Immunotherapeutics; these organizations had no roles in this study. S.S. is currently an employee of Vertex Pharmaceuticals, which also had no role in this study. The other authors declare no competing interests.

Figures

Extended Data Fig. 1
Extended Data Fig. 1. Annotation and subclustering of scRNA-seq data.
(a-d) UMAP and subcluster annotations of epithelial cells (a), myeloid cells (b), T cells (c, including a subcluster of plasmacytoid dendritic cells that initially clustered with T cells) and stromal cells (d). The epithelial UMAP uses the original embeddings while the others use subcluster specific embeddings. (e-h) Dot plots showing relevant markers gene (columns) expression in the indicated subclusters (rows) of epithelial cells (e), myeloid cells (f), T cells (g) and stromal cells (h). Dot color indicates normalized expression, dot size the fraction of cells expressing the gene within the subcluster. (i-l) UMAP visualization of all cells from scRNA-seq data, colored by tissue state (i, indicating stricture, non-stricture, inflamed, or control), fraction (j, epithelial fraction or underlying tissue), location (k, small intestine, ascending colon, or colon) and collection procedure (l, biopsy or resection).
Extended Data Fig. 2
Extended Data Fig. 2. Cell types and samples in scRNA-seq data.
(a) Heatmaps show the cell type correlation (Pearson correlation) between this paper and our previous CD paper. Left: terminal ileum; right: colon. (b) Separate PCoAs of Bray-Curtis dissimilarities per fraction. Left: samples from immune fraction; right: samples from epithelial fraction. (c) Barplots show significant differences in composition for resection (blue) relative to biopsy (yellow) samples in the epithelial compartment. *padj. < 0.05, **padj. < 0.01 (Wald test from Dirichlet regression with FDR correction; Methods). Blue asterisks indicate overrepresentation in resection vs. biopsy samples, while red asterisks indicate underrepresentation. Error bars are SEM. The total number of cells contributing to each bar is also shown. (d,e) Similar to (c), but from stromal and immune compartments. (f) Barplots contrast the number of cells from resection and biopsy samples.
Extended Data Fig. 3
Extended Data Fig. 3. Cell type distribution by stricture status in scRNA-seq data.
(a) Barplots show significant differences in cell type frequency for non-IBD (green) and non-stricture (blue) samples relative to stricture (red) samples in immune, stromal epithelial, and epithelial fraction immune compartments. *padj. < 0.05, **padj. < 0.01 (Wald test from Dirichlet regression with FDR correction; Methods). Blue asterisks indicate overrepresentation in non-stricture or non-IBD vs. stricture samples, while red asterisks indicate underrepresentation. Error bars are SEM. The total number of cells contributing to each bar is also shown. (b) Similar to (a), only using paired stricture and non-stricture samples (Methods).
Extended Data Fig. 4
Extended Data Fig. 4. Differential gene expression across non-stricture, stricture, and high IAF samples.
(a) Ranked ratio between the total number of DEGs (as shown in Fig. 2c) and number of analyzed cells per cell type. (b) Volcano plots for inflammatory fibroblasts, with up to 15 DEGs (FDR < 0.05) per direction labeled. (c) Number of DEGs between paired high IAF and non-stricture samples, displayed by cell type. Compare with number of DEGs between all stricture and non-stricture samples in Fig. 2c. (d) Fractions of cell types in which KEGG pathways are significantly enriched (FDR < 0.05) within each compartment and by each comparison, split by enrichment direction.
Extended Data Fig. 5
Extended Data Fig. 5. Spatial distribution of cell types across all tissue sections.
For non-stricturing (top) and stricturing (bottom) tissue sections (columns), the spatial distributions of: spot clusters, radial axis scores, and the cell lineage proportions for B cells, epithelial cells, fibroblasts, myeloid cells, pericytes, and T cells across all profiled spots. Cell type proportions in each spot were estimated from the deconvolution with Bayes Prism. Muscle cells were absent from the single-cell atlas and not included in the deconvolution, which substituted them with the most transcriptionally similar cell types, pericytes and myofibroblasts. Radial axis scores were congruent with the proportions of epithelial cells and pericytes (muscle). Scale bars: 2 x 2.5mm (all images).
Extended Data Fig. 6
Extended Data Fig. 6. Spatial clustering and gene expression of inflammatory and collagen-hi fibroblasts.
(a) For select cell types (x axis), the distribution of the radial axis scores of spots that contain high proportions of the cell type (> 10% frequency estimated by the BayesPrism deconvolution). For clarity, only cell types with statistically significant differences from other cells are shown. Wilcoxon test, p-values: epithelium (n = 21,408, p = 0), IgG+ plasma (n = 3,027, p = 1x10−95), CD63+CD81+ macrophage (n = 3,226, p = 2x10−64), tissue fibroblast (n = 15,640, p = 3x10−113), inflammatory fibroblast (n = 4,341, p = 1x10−82), venous endothelium (n = 4,252, p = 1x10−9), arterial endothelium (n = 1,384, p = 8x10−17), muscle (n = 46,040), p < 1x10−300), tissue macrophages (n = 1,671, p = 2x10−92), and collagen-hi fibroblasts (n = 1,798, p = 3x10−113). Boxplot quantiles: 25%, 50%, 75%; whiskers: 1.5 IQR. Adjusted p-values: ** < 1x10−3, *** < 1x10−10. (b,c) DEGs for fibroblast subsets. For DEGs in inflammatory (n = 1,368) (b) and collagen-hi (n = 111) (c) fibroblasts relative to all other fibroblasts (n = 9,084 total), volcano plots showing the significance (y axis) and fold change in mean gene expression (x axis), computed from the single-cell reference atlas. The labels of select genes with statistically significant differential expression are provided (statistical significance assessed using a regularized logistic regression model; Methods). (d) For select cell types (x axis), the distribution of the number of UMIs per single cell (y axis). For clarity, the 19 cell types with the highest number of UMIs per cell are shown, with all other cells combined into the “Other” group. Boxplot quantiles: 25%, 50%, 75%; whiskers: 1.5 IQR. Total number of cells (left to right): 43; 8; 95; 14; 239; 24,147; 21,380; 551; 285; 6,465; 3; 111; 854; 2,993; 1,882; 1,140; 74; 662; 1,368; and 284,703. (e) Interactions between inflammatory fibroblasts and T cells (top) and epithelial cells (bottom). Zoomed H&E images for samples V10A14-143_D (top row) and V11Y24-011_C (bottom row), overlaid with the gene expression levels of the induced target gene (IL11 or IL24, left column), the proportions of the “expressing” cell across spots (inflammatory fibroblast, middle column), and the proportions of the “context” cell across spots (T cells or epithelial cells, right column). Scale bar: 0.5mm (all images).
Figure 1.
Figure 1.. Overview of single-cell RNA-sequencing (scRNA-seq) measurements.
(a) scRNA-seq data collection: 61 samples were collected from 21 Crohn’s disease (CD) patients, including 24 stricture samples (red, S), 22 non-stricture samples (blue, NS), and 2 inflamed samples (yellow, I), and 11 control samples from 10 non-inflammatory bowel disease (IBD) patients (green, C). (b) Uniform manifold approximation and projection (UMAP) visualization of all cells from scRNA-seq data, colored by clusters. Subcluster annotation of epithelial, myeloid, T, and stromal cells is presented in Extended Data Fig. 1. (c) Principal coordinates analysis (PCoA) of Bray-Curtis dissimilarities from sample-level cell type composition profiles. (d) PCoA of Bray-Curtis dissimilarities from patient level cell type composition profiles (Methods). (e) Top 25 most abundant cell types in patient level PCoA (from wascores in the R package vegan v2.6-6.1). (f) Barplots show significant differences in composition for resection (blue) relative to biopsy (yellow) samples in immune, stromal, and epithelial compartments. *padj. < 0.05, **padj. < 0.01 (Wald test from Dirichlet regression with false discovery rate (FDR) correction; Methods). Blue asterisks indicate overrepresentation in resection vs. biopsy, while red asterisks indicate underrepresentation. Error bars are standard error of the mean (SEM). The total number of cells contributing to each bar is also shown.
Figure 2.
Figure 2.. Compositional differences and cell type-specific differential expression in scRNA-seq data.
(a) Barplots show significant differences in cell type frequency for non-IBD control (green) and non-stricture (blue) samples relative to stricture (red) samples in immune, stromal and epithelial compartments. *padj. < 0.05, **padj. < 0.01 (Wald test from Dirichlet regression with FDR correction; Methods). Blue asterisks indicate overrepresentation in stricture or non-stricture samples vs. non-IBD controls, while red asterisks indicate underrepresentation. Error bars are SEM. The total number of cells contributing to each bar is also shown. (b) Similar to (a), only using paired stricture and non-stricture samples (Methods). (c) Number of differentially expressed genes (DEGs) between stricture and non-IBD control samples (left) and between stricture and non-stricture samples (right), displayed by cell type (discrete component of the MAST model; FDR < 0.05; Methods, Supplementary Table 2). (d) Relationship between stricture vs. non-IBD and stricture vs. non-stricture samples within each cell compartment. The 1:1 line is represented in orange. DE coefficient is the discrete coefficient from the MAST model (p-value from Spearman ρ statistic. Methods). (e) Fractions of cell types in which KEGG pathways are significantly enriched (FDR < 0.05; Methods) within each compartment and by each comparison, split by enrichment direction. The number of significant pathways per cell type for each comparison is also shown. All pathway enrichment results are provided in Supplementary Table 3.
Figure 3.
Figure 3.. Inflammation-associated fibroblast (IAF) scores and stricture-related expression in scRNA-seq data.
(a) IAF scores per patient for stricture (red), non-stricture (blue) and non-IBD control (green) channels, built from inflammatory fibroblast expression (Methods). (b) Three cell types that show significant correlation (padj. < 0.1, Pearson correlation) between IAF score and cell type abundance with linear fit (blue line) and 95% confidence level interval (grey shading) shown. (c) Number of genes that are significantly correlated (padj. < 0.05, Pearson correlation) with IAF score per cell type (Methods). (d) Heatmap of log fold change (FC) values of genes that are differentially expressed between high IAF and non-stricture samples and that are also significantly correlated (padj. < 0.05) with IAF score in IgG+ plasma cells. Asterisks indicate FDR < 0.05. ‘IAF’ row shows differential expression comparison between high IAF and non-stricture samples; ‘All’ row shows comparison between stricture and non-stricture samples. (e) Pearson correlation between gene expression and IAF score for the same genes in (d). Linear fit (blue line) and 95% confidence level interval (grey shading) are shown. (f) Volcano plot of genes that are significantly correlated (padj. < 0.05, Pearson correlation using t-test with FDR correction) with IAF score in arterial endothelium (34 colored dots), with 15 selected genes of interest labeled. Grey dots are non-significant genes.
Figure 4.
Figure 4.. Spatial atlas of the human small intestine to study stricture formation in CD.
(a) Representative spatial mapping of adjacent tissue resections from regions with and without stricturing. Left: Hematoxylin and eosin (H&E) staining for adjacent non-stricturing (top) and stricturing (bottom) samples. Right: Inferred tissue regions and cell type proportions for all spots profiled via spatial transcriptomics, showing defined epithelial, lamina propria (fibroblast and immune), muscle, and neuron regions. Muscle and neurons, which were not included in the deconvolution, are represented by pericytes and glia. Scale bar: 1mm (all images). (b) Unsupervised clustering of spots reveals intestinal cell types and spatial structures. Top: H&E staining for adjacent non-stricturing (green) and stricturing (orange) samples. Bottom: Spatial organization of 58,815 spots profiled from these samples, belonging to 15 spatial clusters (color legend). Scale bar: 1mm (all images). (c,d) Cell type composition and marker gene expression in spatial clusters. (c) Heatmap of the proportion (color) of each cell type (x axis) in each spatial cluster (y axis), estimated from the deconvolution. For clarity, only cell types comprising >1% of a spatial cluster are shown. (d) Heatmap of the mean expression (color) of curated marker genes (x axis) in each spatial cluster (y axis), annotated by known cell types (x axis). Only clusters not found in the scRNA-seq data are shown. (e) Changes in the abundances of spatial clusters between tissue resections with and without stricturing. Barplot shows the mean proportions (x axis) of each spatial cluster (y axis), showing the amount present in both conditions (grey) as well as non-stricturing (green) and stricturing (orange) conditions.
Figure 5.
Figure 5.. Spatial atlas enables the discovery of cell type positions and clustering in the human gut.
(a) Spatial distributions of cell types. For each cell type inferred from the scRNA-seq data (colored dots), we inferred its mean position along the radial axis (x axis) and degree of spatial clustering (y axis; Moran’s I). Labels and colors denote cell types positioned 1.5 standard deviations (σ) from the mean; all other cell types are colored grey. (b) Cell types enriched along the radial axis. For representative cell types enriched near epithelium or muscle, spatial organization of cell types (x and y axes) showing their proportion within each profiled spot (top), along with the radial axis projection of all spots (bottom), with epithelium and muscle regions labeled. Scale bar: 1mm. (c) Radial axis distributions for cell types. For epithelium- (top) and muscle-associated (bottom) cell types (colored as in a), density of cell type proportions (y axis) over radial axis projection values (x axis) is shown. (d-f) Enteric neuron cluster subtypes have distinct spatial distributions. (d) t-distributed stochastic neighbor embedding (t-SNE) projection (x, y axes) of all spots with high neuron signature score clustered into 8 groups (see shared color legend for panels d-f), with putative neurons labeled. (e) Representative tissue sample showing the spatial organization (x, y axes) of spots corresponding to all putative neuron subsets (colors) across distinct tissue regions (greys), with epithelium and muscle labeled. Scale bar: 1mm. (f) Enteric neuron subsets have distinct spatial distributions. Distribution of radial axis projections (y axis) for spots corresponding to enteric neuron subsets (x axis), showing significant differences for each neuron subset compared to all other subsets. Wilcoxon test p-values: 8x10−26, 9x10−10, 4x10−4, and 4x10−4, respectively. Boxplot quantiles: 25%, 50%, 75%; error bars: standard deviation (SD). (g) Enteric neuron subsets express distinct genes. Heatmap shows the mean expression levels (color) of marker genes (x axis) across clusters of spots with high neuron signature scores (y axis), clustered (left).
Figure 6.
Figure 6.. Mapping multicellular circuits in the human gut that underlie stricture formation.
(a) Interaction network for cell types (nodes) belonging to different lineages (node colors). Edges connect cells with correlated frequencies across spots (q<0.05; SPARCC), for cell pairs enriched in non-stricturing or stricturing samples (edge colors). Select annotations provided for parts of this network (labels 1-6) and their respective tissue structures. (b) Representative images of interacting cells. For parts of the network (labels 1-6) significantly enriched in non-stricturing (top) or stricturing (bottom) samples, multichannel plots show the spatial organization (x, y axes) and cell type proportions across spots (color legend). (c) For select multicellular interactions (x axis; labels 1-6), the mutual information of cell proportions (y axis) across all 20 samples (black) and permuted nulls (grey). Interacting cells co-localized in samples (all q<0.05; *=0.05; ***=0.001). Boxplot quantiles: 25%, 50%, 75%; error bars: SD. (d) Interaction network for cell types (nodes) belonging to different lineages (node colors). Edges connect cells enriched for the expression of cognate ligands and receptors (top and bottom edge labels), colored by the lineage of the expressing cell. (e) Representative images of ligand–receptor expression. Top: for cell types belonging to distinct lineages (label color), the spatial organization (x, y axes) and co-localization frequencies (color legend) of spots, measured as the product of cell proportions. Bottom: for ligand–receptor pairs (labels, colored by lineage of expressing cell), the spatial organization (x, y axes) and co-expression patterns (color legend) of spots, measured as the product of their expression levels. Spatial patterns of top and bottom panels are congruent. (f) Heatmap showing interaction strength (i.e., expression fold-change) (color legend) for inflammatory fibroblast genes (y axis) across different cellular contexts (x axis), highlighting epithelial repair and immune activation programs. (g) Representative images of inflammatory fibroblast gene expression across different cellular contexts. For putative interactions between inflammatory fibroblasts and epithelial (top) and T (bottom) cells, the spatial organization (x, y axes) of spots colored by proportions of expressing inflammatory fibroblasts, proportions of context cells, their co-localization frequencies, and mean expression of induced genes. Scale bar (all images in b, e, g): 1mm.
Figure 7.
Figure 7.. CD risk loci map to gene expression patterns that are enriched in tissue regions.
(a) For genes associated with CD risk variants (columns), mean expression levels (colors) across spatial clusters (rows). Dendrogram: hierarchical clustering of genes into spatially distinct profiles. The dotted line designates branches not unique to any of the clusters. Genes are annotated by the cell lineage with the highest expression of the gene in the single-cell dataset. (b) Enrichment of gene clusters in tissue regions. Heatmap of statistically significant enrichments (colors) of the gene clusters (columns; labeled according to a) within spatially distinct spatial clusters (rows). (c) Top: for select CD risk gene clusters, mean expression level (y axis; scaled log2(TP10K+1)) in spots belonging to epithelial (n = 11,725), inflammatory (n = 1,707), and follicle (n = 2,120) clusters vs. all other spots (n = 58,815 total). Bottom: mean residual expression level (y axis) after accounting for the underlying composition of cell types using the spatial deconvolution model. Wilcoxon test p-values shown. N.S.: not significant. Boxplot quantiles: 25%, 50%, 75%; whiskers: 1.5 interquartile range (IQR). (d) Representative images of risk gene expression. For CD risk gene clusters and associated tissue regions, the spatial organization (x, y axes) of the mean expression of the gene cluster (bottom) and the annotated tissue regions (top) across all spots. Gene clusters 4, 5, and 6 are significantly enriched in epithelial, inflammatory fibroblast, and follicle areas. Scale bar: 1mm (all images). (e,f) CD risk genes are expressed in defined spatial contexts. (e) Interaction network showing upregulated CD risk genes (edges) within expressing cell types (blue nodes) next to context lineages (green nodes). (f) Representative cell–cell interactions. For genes induced in ANPEP-hi enterocytes adjacent to T cells, the spatial organization (x, y axes) of their cell proportions (top) and gene expression (bottom). Top: proportions of ANPEP-hi enterocytes (left), T cells (middle), and the interaction between them (right). Bottom: observed (left), predicted (middle), and residual (right) mean gene expression of induced genes, showing congruence between epithelial–T cell interaction zones and areas with high error in the model. Scale bar: 1mm (all images).

References

    1. Lewis JD et al. Incidence, Prevalence, and Racial and Ethnic Distribution of Inflammatory Bowel Disease in the United States. Gastroenterology 165, 1197–1205 e2 (2023). - PMC - PubMed
    1. Graham DB & Xavier RJ Pathway paradigms revealed from the genetics of inflammatory bowel disease. Nature 578, 527–539 (2020). - PMC - PubMed
    1. Kugathasan S et al. Prediction of complicated disease course for children newly diagnosed with Crohn’s disease: a multicentre inception cohort study. Lancet 389, 1710–1718 (2017). - PMC - PubMed
    1. D’Alessio S et al. Revisiting fibrosis in inflammatory bowel disease: the gut thickens. Nat. Rev. Gastroenterol. Hepatol 19, 169–184 (2022). - PubMed
    1. Smillie CS et al. Intra- and Inter-cellular Rewiring of the Human Colon during Ulcerative Colitis. Cell 178, 714–730.e22 (2019). - PMC - PubMed

METHODS-ONLY REFERENCES

    1. Li B et al. Cumulus provides cloud-based data analysis for large-scale single-cell and single-nucleus RNA-seq. Nat. Methods 17, 793–798 (2020). - PMC - PubMed
    1. Fleming SJ et al. Unsupervised removal of systematic background noise from droplet-based single-cell experiments using CellBender. Nat. Methods 20, 1323–1335 (2023). - PubMed
    1. Wolock SL, Lopez R & Klein AM Scrublet: Computational Identification of Cell Doublets in Single-Cell Transcriptomic Data. Cell Syst 8, 281–291.e9 (2019). - PMC - PubMed
    1. Wolf FA, Angerer P & Theis FJ SCANPY: large-scale single-cell gene expression data analysis. Genome Biol. 19, 15 (2018). - PMC - PubMed
    1. Cham LB et al. Single cell analysis reveals a subset of cytotoxic-like plasmacytoid dendritic cells in people with HIV-1. iScience 26, 107628 (2023). - PMC - PubMed