Nature. 2023 Aug;620(7972):181-191.
doi: 10.1038/s41586-023-06252-9. Epub 2023 Jun 28.

A spatially resolved single-cell genomic atlas of the adult human breast

Tapsi Kumar et al. Nature. 2023 Aug.

Abstract

The adult human breast is comprised of an intricate network of epithelial ducts and lobules that are embedded in connective and adipose tissue [1-3]. Although most previous studies have focused on the breast epithelial system [4-6], many of the non-epithelial cell types remain understudied. Here we constructed the comprehensive Human Breast Cell Atlas (HBCA) at single-cell and spatial resolution. Our single-cell transcriptomics study profiled 714,331 cells from 126 women, and 117,346 nuclei from 20 women, identifying 12 major cell types and 58 biological cell states. These data reveal abundant perivascular, endothelial and immune cell populations, and highly diverse luminal epithelial cell states. Spatial mapping using four different technologies revealed an unexpectedly rich ecosystem of tissue-resident immune cells, as well as distinct molecular differences between ductal and lobular regions. Collectively, these data provide a reference of the adult normal breast tissue for studying mammary biology and diseases such as breast cancer.

Figures

Extended Data Fig. 1 |. Frequency of Major Breast Cell Types Across Women and Sample Types.
a, Experimental workflow for breast tissue processing for scRNA-seq, showing different conditions used for digestion times and trypsin treatments. b, Pie chart showing ethnic backgrounds of women who provided tissue samples for the breast atlas. c, Cell type frequencies for different tissue sources (reduction mammoplasties - RM, prophylactic mastectomies - PM and contralateral mastectomies - CM), cells vs nuclei single cell RNA-seq protocols, and experimental dissociation protocol. d, Major cell type frequencies of matched left and right breasts from 22 women (left) and averages across all left and all right breast tissues (right). e, Stacked barplot showing the variation of cell type frequencies across the 126 women in scRNA-seq data. The top annotation bar shows the experimental workflow used (short/medium/long). f, Top regulons identified with SCENIC for each cell type cluster from the snRNA-seq data. g, Top regulons identified with SCENIC for each cell type cluster from the scRNA-seq data. h, Multi-dimensional scaling and Procrustes analysis to determine the concordance of left and right breast cell type frequencies and Pearson correlations for 22 women with matched breast tissue samples. P-value was calculated based on a two-sided test.
Extended Data Fig. 2 |. Ligand-receptor Interaction Analysis Between Cell Types.
Ligand-receptor interaction plots predicted from scRNA-seq data using CellPhoneDB between the major breast cell types. a, Interaction plot between the epithelial (Basal, LumHR and LumSec) cell types. b, Interaction plot between the epithelial and immune (B-cells, T-cells, Myeloid cells) cell types. c, Interaction plot between the epithelial and stromal (Fibroblasts, Perivascular and Vascular endothelial cells) cell types. d, Interaction plot within the stromal (Fibroblasts, Myeloid, Lymphatic, Vascular and Perivascular cells) cell types. e, Interaction plot between adipocytes and stromal cell types.
Extended Data Fig. 3 |. Spatial Transcriptomic Analysis of Breast Cell Types.
a, Integrated UMAP and unbiased clustering of ST data from 10 breast samples, showing 9 ST clusters. b, Histopathological images, and spatial distribution of ST clusters in the ST data from the breast tissues. c, Concordance of ST clusters and the scRNA-seq clusters of the major cell types using Fisher’s exact test. d, Pearson correlation analysis of marker gene expression levels between the ST clusters and the scRNA-seq data for different cell types. All p-values were calculated based on two-sided tests.
Extended Data Fig. 4 |. Spatial Analysis of Breast Cell Types with CODEX and smFISH.
a, Cell segmentation results of smFISH (Resolve) data across 12 tissue samples profiled from 5 different women. Cells were annotated based on combinations of markers for each cell type as described in Supplementary Table 6. b, Densities of cell types across three topographic areas using 12 tissues profiled by smFISH (Resolve). c, Heatmap of the top 5 targeted marker genes for each cell type in the smFISH (Resolve) data from 12 combined tissue samples. d, Cell segmentation results of CODEX data from 8 different women. Cells were annotated based on single protein markers or combinations of markers to identify different cell types. e, Densities of cell types across three topographic areas from 8 different women by CODEX. f, Heatmap showing protein levels for markers that were used to identify different cell types in the CODEX data. (D: ducts, L: lobules and C: connective regions).
Extended Data Fig. 5 |. Analysis of Single Cell and Spatial Epithelial Data.
a, UMAPs of snRNA-seq data showing the expression of hormone receptor genes. b, Epithelial cell state frequencies across the 126 women in scRNA-seq data, where the top annotation bar represents the dissociation protocol. c, UMAP feature plots showing the expression of previously reported stem cell marker genes in the scRNA-seq epithelial dataset. d, Ligand-receptor interactions within the epithelial cell states predicted with CellPhoneDB. e, Cell cycle scoring of S-phase for different epithelial cell states detected in the scRNA-seq data. f, Cell cycle scoring for S-phase in the epithelial cell type clusters detected in the snRNA-seq data. g, smFISH (Resolve) data showing the expression of the MKI67 proliferation marker in the epithelial cells of the ducts and lobules from 4 different breast tissues. h, UMAP of different LumSec cell states and ELF5, LTF signature scores, respectively. i, Histopathological image of adjacent H&E section showing the anatomic annotation of ducts and lobules (left) and smFISH MERFISH (right panel) from P101 showing the spatial distribution of different LumSec cell states across different regions. j, Stacked barplot showing the distribution of different LumSec cell proportions in ducts and lobules across 3 MERFISH samples. k, Histopathological image (left panel) and smFISH MERFISH (right panel) from P101 showing the spatial distribution of the LumHR-SCGB population in a specific region of epithelium.
Extended Data Fig. 6 |. Spatial analysis of epithelial cells in ductal and lobular structures.
a, Spatial transcriptomic analysis showing clusters labelled as duct or lobule/TDLU from 3 breast tissues (P10, P35 and P47). b, smFISH (Resolve) data (P46-S1 and P46-S4) showing a subset of Keratin markers (left) and hormone receptor genes (right) and their localization to different breast tissue regions annotated as either duct or lobule/TDLU. c, CODEX data from P131 showing KRT5 in ducts and KRT19 in lobules/TDLU regions, with enlarged panels on the right. d, CODEX analysis from P130 of ductal and lobular/TDLU regions, showing differences for KRT14 levels in ducts and lobules. e, CODEX data from P131 showing protein levels of KRT8 and progesterone receptor (PR) in epithelial cells in the ducts and lobular/TDLU regions.
Extended Data Fig. 7 |. Immune cell subtypes in the breast and their variation in women.
a, H&E staining of plasma B-cells, T-cells, mast cells and macrophages (arrows) in human breast tissues. b, Stacked barplot showing the cell type frequencies of T, B and myeloid cells across 126 women in scRNA-seq data. Top annotation bar represents different tissue dissociation protocols that were utilized. c-e, Stacked barplots showing the cell state frequencies of T, B and myeloid cells across 126 women in scRNA-seq data respectively. f, Dot plot showing expression of checkpoint/exhaustion markers in NK and T cell states from the scRNA-seq data of 126 women. g, Ligand-receptor interaction analysis predicted with CellPhoneDB between the fibroblast cell states and macrophage cell states.
Extended Data Fig. 8 |. Spatial analysis of immune cells in human breast tissues.
a, CODEX data from patient P130 and P131 showing localization of different immune cells with epithelial marker KRT19 and vascular marker CD31. Yellow and white arrows indicate CD4 Tregs and DCs, respectively. b, Frequency of T cells with the RUNX3 tissue residency marker in CODEX data. c, CODEX data (P130) showing immune cells in ductal, lobular and connective regions. d, Stacked barplots of CODEX data showing the density of immune cell types in each spatial region in 8 women. e, smFISH (Resolve) data (P46-S1) showing RNA localization of T, B and myeloid cells. f, Segmented smFISH (Resolve) data (P46-S1) showing cell localization of T, B and myeloid cells. g, smFISH (Resolve) data (P46-S1 and P47-S1) showing immune cell localization of B, T and myeloid cells across ducts, lobules and connective regions. h, Stacked barplots of smFISH (Resolve) data showing the density and proportion of immune cell types in different spatial regions. i, Adjacent histopathological tissue section (left) and segmented smFISH MERFISH data (right) from patient P91 showing the spatial distribution of m1, m2 macrophages and cDC2 populations in different regions of human breast tissue. j, Stacked barplot showing the density of m1, m2 macrophages and cDC2 populations in different regions across 3 smFISH MERFISH samples. k, Adjacent histopathological tissue section (left) and segmented smFISH MERFISH data (right) from patient P96 showing the spatial distribution of different B-cell states in different regions of human breast tissue. l, Stacked barplot showing the density of different B-cell states in different regions across three smFISH MERFISH samples.
Extended Data Fig. 9 |. Fibroblast cell states in the human breast.
a, Stacked barplot showing the fibroblast cell state frequencies across 126 women in scRNA-seq data with top annotation bar representing the tissue dissociation protocol. b, Gene ontology enrichment analysis showing top enriched biological process gene sets associated with each cell state (Pos: positive; Neg: negative; Reg: regulation; RSTK: receptor protein serine/threonine kinase; TGF: transforming growth factor; IGF: insulin-like growth factor). c, CODEX data from P132 showing fibroblasts marked by VIM in the connective tissue (I) and interlobular (II) regions. d, smFISH (Resolve) data showing fibroblast markers in areas of connective tissue regions (I) and epithelial regions (II) from two women (P47-S1 and P69-S3). e, smFISH (Resolve) data (P35-S1) indicating spatial proximity regions with epithelial-proximal (Epi-prox), epithelial-middle (Epi-mid) and epithelial-distant (Epi-Dist) regions for 4 marker genes. f, Percentages of 4 markers that are proximal, middle or distant to the epithelial cells, quantified from the smFISH (Resolve) data. g, RNAscope in situ hybridization of breast tissues using an MMP3 probe in combination with anti-Vimentin and anti-PanCK immunofluorescent staining, with enlarged panel (right). h, Ligand-receptor interactions between fibroblasts, adipocytes and myeloid cell states predicted using CellPhoneDB.
Extended Data Fig. 10 |. Endothelial cell diversity in the Human Breast.
a, Stacked barplot showing the endothelial cell state frequencies across the 126 women in scRNA-seq data, with top annotation bar showing the tissue dissociation protocol. b, Dot plot of gene ontology enrichment results for 4 lymphatic cell states. c, Heatmap showing top gene expression for vascular and lymphatic endothelial clusters detected in the ST data. d, smFISH (Resolve) data showing veins (ACKR1) and capillaries (RBP7), as well as a canonical vascular marker (VWF) in two different HBCA samples (P46-S3 and P69-S3). e, Adjacent H&E tissue section with pathological annotations (left panel) and segmented smFISH MERFISH data (right panel) from P101 showing the spatial distribution of vascular endothelial cell states in different regions of human breast tissue. f, Stacked barplot showing the density of vascular endothelial states in different regions across 3 smFISH MERFISH samples.
Extended Data Fig. 11 |. Perivascular cells in Human Breast Tissues.
a, Stacked barplot showing the perivascular cell state frequencies across the 126 women in scRNA-seq data with top annotation bars indicating the tissue dissociation protocol. b, UMAPs of pericytes and vascular smooth muscle cells (VSMCs) and feature plots of the VSMCs marker genes (SYNM and ACTG2). c-e, smFISH (Resolve) data showing expression of pericyte marker RGS5, together with vascular marker VWF and fibroblast marker COL1A1 in lobular and ductal regions from 2 different breast tissue samples (P47-S1 and P46-S3). f, CODEX results from P131 showing vascular cells (anti-CD31) and pericytes (anti-LIF) in a TDLU region. g, smFISH MERFISH from P96 showing the spatial distribution of vascular endothelial cell states (left panel) and perivascular cell states (right panel) in different regions of human breast tissue. h, smFISH (MERFISH) data showing arteries (SOX17) and VSMCs (ACTG2 and SYNM) in breast tissue.
Extended Data Fig. 12 |. Metadata correlations with breast cell types and states.
a, Boxplots showing the major cell type frequencies across ethnicity status in the n = 69 women using Wilcoxon rank sum test (top). Significant associations of cell states with ethnicity status using Fisher’s exact test (bottom). b, Boxplots showing the major cell type frequencies across pre- and post-menopause status in the n = 71 women using Wilcoxon rank sum test (top). Significant associations of cell states with menopause status using Fisher’s exact test (bottom). c, Boxplots showing the major cell type frequencies across different age groups using Wilcoxon rank sum test, young (<50 years) and old (>50 years) for n = 76 women (top). Significant associations of cell states with age groups using Fisher’s exact test (bottom). d, Boxplots showing the major cell type frequencies across different breast density (high, low) groups in the n = 16 women using Wilcoxon rank sum test (top). Significant associations of cell states with breast density using Fisher’s exact test (bottom). e, Boxplots showing the major cell type frequencies across different BMI status in the n = 73 women using Wilcoxon rank sum test, overweight (BMI >= 25 and < 30) and obese (BMI >= 30). f, Boxplots showing the major cell type frequencies across parity status (nulliparous, parous) in the n = 64 women using Wilcoxon rank sum test. All p-values were calculated based on two-sided tests. Boxplots show the median with interquartile ranges (25–75%), while whiskers extend to 1.5× the interquartile range from the box.
Extended Data Fig. 13 |. Summary of the Major Cell Types and States in Breast Tissues.
This illustration summarizes all of the breast cell types and cell states that were identified in the HBCA study. a, Summary of cell lineages from cell types to cell states. b, Mapping of cell types and cell states to the four major spatial regions (Adipose, Connective, Ductal, Lobular) that were supported by the spatial technologies. Not all cell states were assigned to specific spatial regions, in cases where the data did not support their assignment. Individual figures were created with BioRender.com.
Fig. 1 |. Major cell types of the adult human breast.
a, Anatomy of the adult human breast and a pathological haematoxylin and eosin (H&E) section, with illustrations of the major breast cell types. b, The workflow of the HBCA project. c, Uniform manifold approximation and projection (UMAP) projection of scRNA-seq data from 714,331 cells integrated across 167 tissues from 126 women, showing 10 clusters that correspond to the major cell types. d, Consensus heat map of the top 7 genes expressed in each cell type cluster from averaged scRNA-seq data. e, UMAP representation of snRNA-seq data from 117,346 nuclei integrated across 24 tissues from 20 women, showing 11 cell type clusters. f, Consensus heat map of the top 7 genes expressed in each cell cluster from averaged snRNA-seq data. Adipo., adipocytes; perivasc., perivascular cells.
Fig. 2 |. Spatial analysis of major breast cell types.
a, ST experiment from patient P35 showing the H&E image with histopathological regions annotated (left) and clustering results (right). A, adipose tissue; C, connective tissue; D, ductal tissue; L, lobule. b, Consensus heat map of the top four marker genes in each ST cluster from ten integrated tissue samples. Exp., expression. c, The frequencies of the ST clusters from ten tissue samples across the four topographic tissue regions. d, smFISH experiments (Resolve) using a custom 100-gene panel, showing a subset of 10 genes that mark different cell types in sample 1 of P46 (P46-S1) (left) and cell segmentation using combinations of markers to identify cell types, with topographic areas annotated (right). e, Spatial colocalization graph of the cell types in smFISH (Resolve) data from 12 tissue samples. The node size represents the cell number and the edge width represents the probability of colocalization. f, Cell type frequencies across 3 topographic regions from 12 smFISH (Resolve) tissue samples. g, CODEX data from P130 showing ductal–lobular structure with five protein markers (left) and cell segmentation using combinations of markers to identify cell types, with topographic areas annotated (right). h, Spatial colocalization graph of the cell types in the CODEX data from eight tissue samples. The node size represents the cell number and the edge width represents the probability of colocalization. i, Cell type frequencies across three topographic regions from eight CODEX tissue samples. Scale bars, 1 mm (a) and 500 μm (d and g).
Fig. 3 |. Epithelial cells of the human breast.
a, H&E section of breast tissue showing the epithelial bilayer of two ducts. b, UMAP representation of scRNA-seq data from 240,804 epithelial cells, showing three major epithelial types. c, UMAP representation of snRNA-seq data from 55,557 epithelial nuclei, showing three major epithelial types and two proliferating clusters. d, The keratin genes expressed across the three major epithelial cell types. e, UMAP representation of 102,228 basal epithelial cells. f, UMAP representation of 75,247 LumHR epithelial cells showing 3 cell states. g, UMAP representation of 63,329 LumSec epithelial cells showing 7 cell states. h, Expression of secretoglobin genes across the epithelial cell states. i, Expression of HLA class I and HLA class II genes for the epithelial cell states. j, The top genes expressed for each epithelial cell state averaged across the scRNA-seq data. k, Lactation gene signature scores for the epithelial cell states. l, G2/M cell cycle scores across different epithelial cell states. m, The fraction of proliferating epithelial cells in the scRNA-seq and snRNA-seq data. n, CODEX data from patient P130 showing proliferating cells in ducts and lobules labelled with PCNA. o, The top ST differentially expressed genes between ducts versus lobules from ten integrated tissue samples. Avg., average. p, smFISH (Resolve) data from patient P46 showing genes that are expressed specifically in ductal and lobular regions. q, smFISH (Resolve) data in the ducts and lobules. Scale bars, 100 μm (a), 200 μm (n) and 500 μm (p).
Fig. 4 |. Immune cell ecosystem in human breast tissues.
a, Immune and non-immune cell type frequencies in the scRNA-seq and snRNA-seq data. b, Immune and non-immune cell type frequencies by tissue source in the scRNA-seq data. CM, contralateral mastectomies; PM, prophylactic mastectomies; RM, reduction mammoplasties. c, Immune cell type frequencies in the CODEX (n = 8) and smFISH (Resolve) (n = 12) data. d, CODEX data from patient P130 showing a TDLU region with localization of six immune cell types/states. Segmented cells are shown as coloured dots over immunofluorescence staining of SMA (myoepithelial) and CD31 (vessel) for spatial reference. e, smFISH (Resolve) data (P46-S1) showing a TDLU region with localization of three immune cell types. Segmented cells are shown as coloured dots over immunofluorescence staining of KRT5 (basal epithelial) and VWF (endothelial) for spatial reference. f, UMAP representation of 76,567 NK and T cells from scRNA-seq data showing 14 cell states. g, The top genes expressed for each NK and T cell cluster using average values across single cells. h, UMAP representation of 12,510 B cells from scRNA-seq data showing five cell states. Bmem, memory B. i, The top genes expressed for each B cell state using average values across single cells. j, UMAP representation of 30,789 myeloid cells from scRNA-seq data showing 15 cell types and states. k, The top genes expressed for each myeloid cell cluster using averaged scRNA-seq values. l, CODEX data from patient P130 showing localization of immune cells and a vascular marker (CD31). m, smFISH (Resolve) segmented data (P46-S1) showing localization of immune cells and a vascular marker (VWF). n, The frequency of immune cells that are in the proximity of vascular endothelial cells versus other cell types as determined by neighbourhood analysis of the CODEX and smFISH (Resolve) data.
Fig. 5 |. Breast fibroblasts and adipocytes.
a, Histopathological sections showing regions with intralobular and interlobular fibroblasts (arrowheads) in the breast. b, UMAP representation of 208,390 fibroblast cells, showing 4 cell states. c, The top genes expressed for each fibroblast cell state, averaged from the scRNA-seq data. d, The collagen gene signature scores (left) and expression of FAP (right) across different fibroblast cell states in the scRNA-seq data. e, smFISH data (Resolve) from patient P69 (P69-S3) showing a subset of four fibroblast genes and their distribution in the connective tissue (i) and intralobular (ii) areas. f, Histopathological section of breast adipose tissue. g, UMAP representation of 6,637 adipocytes from snRNA-seq data. h, ST data showing an adipocyte cluster in P46. i, Expression of top adipocyte genes, white adipocyte markers and beige adipocyte markers in the ST data and snRNA data. For a, e and f, scale bars, 100 μm.
Fig. 6 |. Vascular, perivascular and lymphatic cells in the human breast.
a, Histopathological section showing an artery, vein and capillary structure in normal breast tissue. b, UMAP representation of 83,651 vascular endothelial cells showing 3 major cell states. c, Canonical and top genes expressed for each vascular endothelial cell state, using averaged values from the scRNA-seq data. d, Histopathological section showing a lymphatic duct in the breast tissue. e, UMAP representation of 8,982 lymphatic endothelial cells, showing 4 major cell states. f, Expression of canonical and top genes for each lymphatic cell state, averaged from the scRNA-seq data. g, smFISH (Resolve) data from patient P47 (P47-S1) showing a subset of vascular gene markers (VWF, ACKR1, RBP7 and GJA4) and lymphatic markers (PROX1), with two enlarged regions (R1 and R2). h, CODEX data from patient P130 showing a TDLU region with vascular cells (anti-CD31), lymphatic cells (anti-PDPN) and basal cells (anti-SMA) labelled, with two enlarged regions. i, Histopathological sections showing a pericyte and capillary structure, as well as an artery and VSMCs in normal breast tissue. j, UMAP projection and clustering of 52,638 perivascular cells, showing 2 cell states. k, Canonical markers and the top genes expressed for each perivascular cell state from averaged scRNA-seq data. Scale bars, 50 μm (a, d and i) and 500 μm (g and h).
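The box plot convention stated in the caption of Extended Data Fig. 12 (median, interquartile range, whiskers extending to 1.5× the interquartile range from the box) can be illustrated with a minimal sketch. This is not the authors' code: the function name `boxplot_stats` and the sample frequencies are hypothetical, the quartile interpolation method is an assumption, and the whiskers are placed at the most extreme data points inside the 1.5× fences, which is a common plotting convention assumed here.

```python
from statistics import quantiles


def boxplot_stats(values):
    """Summarize values the way a standard box plot draws them."""
    data = sorted(values)
    # Quartiles; the "inclusive" interpolation method is an assumption.
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    # Fences lie 1.5 x IQR beyond the edges of the box.
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        # Whiskers end at the most extreme points within the fences.
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        # Points beyond the fences would be drawn individually.
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }


# Hypothetical cell type frequencies (fractions), for illustration only.
example = [0.10, 0.12, 0.15, 0.18, 0.20, 0.22, 0.25, 0.60]
stats = boxplot_stats(example)
print(stats)
```

In this made-up sample, the value 0.60 falls outside the upper fence and would be plotted as an individual point rather than covered by the whisker.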

References

    1. Hassiotou F. & Geddes D. Anatomy of the human mammary gland: current status of knowledge. Clin. Anat. 26, 29–48 (2013).
    2. Russo J., Rivera R. & Russo I. H. Influence of age and parity on the development of the human breast. Breast Cancer Res. Treat. 23, 211–218 (1992).
    3. Gusterson B. A. & Stein T. Human breast development. Semin. Cell Dev. Biol. 23, 567–573 (2012).
    4. Nguyen Q. H. et al. Profiling human breast epithelial cells using single cell RNA sequencing identifies cell diversity. Nat. Commun. 9, 2028 (2018).
    5. Bach K. et al. Differentiation dynamics of mammary epithelial cells revealed by single-cell RNA sequencing. Nat. Commun. 8, 2128 (2017).
