Science. 2024 Sep 6;385(6713):eadk9217.
doi: 10.1126/science.adk9217. Epub 2024 Sep 6.

Single-cell chromatin accessibility reveals malignant regulatory programs in primary human cancers

Laksshman Sundaram et al. Science. 2024.

Abstract

To identify cancer-associated gene regulatory changes, we generated single-cell chromatin accessibility landscapes across eight tumor types as part of The Cancer Genome Atlas. Tumor chromatin accessibility is strongly influenced by copy number alterations that can be used to identify subclones, yet underlying cis-regulatory landscapes retain cancer type-specific features. Using organ-matched healthy tissues, we identified the "nearest healthy" cell types in diverse cancers, demonstrating that the chromatin signature of basal-like-subtype breast cancer is most similar to secretory-type luminal epithelial cells. Neural network models trained to learn regulatory programs in cancer revealed enrichment of model-prioritized somatic noncoding mutations near cancer-associated genes, suggesting that dispersed, nonrecurrent, noncoding mutations in cancer are functional. Overall, these data and interpretable gene regulatory models for cancer and healthy tissue provide a framework for understanding cancer-specific gene regulation.

Conflict of interest statement

WJG and HYC are named as inventors on patents describing ATAC-seq methods. 10x Genomics has licensed intellectual property on which WJG and HYC are listed as inventors. WJG holds options in 10x Genomics and is a consultant for Ultima Genomics and Guardant Health. WJG is a scientific co-founder of Protillion Biosciences. AK and KF are employees of Illumina. LS, NGR and AS were employees of Illumina; LS is currently an employee of NVIDIA. AC has research funding from Bayer and is a consultant for BirdsEye Bio. PWL is a scientific advisory board (SAB) member for Tagomics, LLC, FOXO Technologies, and AnchorDX. The other authors declare no competing interests.

Figures

Figure 1. A pan-cancer, single-cell chromatin accessibility atlas identifies cell type-specific features of tumor and immune cells.
A) Diagram of the 8 cancer types profiled in this study. The number of samples from each cancer type is shown in parentheses. Colors are kept consistent throughout the manuscript. B) UMAP of all filtered cells based on accessible chromatin regions (scATAC-seq), colored by cancer type. C) UMAP of cells from the LUAD-55 sample (TCGA-86-A4D0) based on scATAC-seq accessible chromatin regions. D-F) Genome tracks of aggregate scATAC-seq data around the EPCAM, IKZF1, and CDH5 genes across the identified single-cell clusters, together with the corresponding bulk ATAC-seq data from LUAD sample TCGA-86-A4D0. G) UMAP of identified tumor cells (excluding immune and stromal cells) based on scATAC-seq accessible chromatin regions. H) Genome tracks of aggregate scATAC-seq data around the HER2 locus, colored by breast cancer subtype (luminal = purple, basal-like = blue, HER2 = green). The maximum CNA signal at this locus from 80x WGS of each sample is plotted on the right. ND = not determined. I) UMAP of all filtered cells based on scATAC-seq accessible chromatin regions after denoising autoencoder-based CNA correction. J) UMAP of identified immune and stromal cells based on scATAC-seq accessible chromatin regions. K-L) Comparison of immune and stromal cell type fractions across cancers. Colors indicate the immune/stromal cell type identified in Figure 1J (K) and the cancer type (L). M) Comparison of the number of significantly upregulated (red) and downregulated (blue) scATAC-seq peaks (|log2 fold-change| > 1, FDR < 0.05, two-sided t-test) in cancer immune cells compared with healthy tissue-resident immune cells. N-O) Statistical significance (−log10 adjusted p-value, BH-adjusted hypergeometric test) of TF motif overlap enrichment in downregulated (N) and upregulated (O) scATAC-seq peaks in cancer immune cells relative to the nearest healthy tissue-resident immune cells from panel (M).
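Panels N-O rank TF motifs by a BH-adjusted hypergeometric test of overlap enrichment. The Python sketch below shows one way to compute that statistic. It is not the authors' pipeline and rests on two assumptions:

- The background is the full peak set.
- Each set of differential peaks is treated as a draw from that background.

The function names, motif labels and counts are all hypothetical.

```python
from math import comb


def hypergeom_sf(k, M, n, N):
    """P(X >= k) for X ~ Hypergeometric: M peaks in total, n of them carry the
    motif, N peaks are drawn (the differential set), k drawn peaks carry the motif."""
    upper = min(n, N)
    if k > upper:
        return 0.0
    hits = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1))
    return hits / comb(M, N)


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone and <= 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical toy counts: (motif, peaks with motif among all peaks,
# peaks with motif among the differential peaks).
TOTAL_PEAKS, DIFF_PEAKS = 1000, 100
motifs = [("MOTIF_A", 100, 30), ("MOTIF_B", 100, 11), ("MOTIF_C", 200, 19)]
raw = [hypergeom_sf(k, TOTAL_PEAKS, n, DIFF_PEAKS) for _, n, k in motifs]
for (name, _, _), p_adj in zip(motifs, benjamini_hochberg(raw)):
    print(name, p_adj)
```

Plotting −log10 of the adjusted values per motif gives the kind of ranking shown in panels N-O. A real analysis would typically use a vectorized library routine rather than exact big-integer sums.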
Figure 2. Deep learning models of pseudobulk scATAC-seq profiles identify cell type-resolved predictive TF motif syntax.
A) Schematic of the convolutional neural network trained to predict, from the underlying DNA sequence, the cell type-resolved pseudobulk scATAC-seq probability profile over each 1,364-bp accessible peak region. Adapted from Ameen et al. B) Performance of the sample-specific neural network models, computed as the balanced AUROC on held-out chromosomes (higher is better) across all peaks in each BRCA sample. C) Differential enrichment (z-score of odds ratios; chi-square test, Bonferroni-corrected p-values) of neural network model-derived predictive motif instances of TFs (rows) in models trained on accessible peaks of different BRCA samples (columns). D) TF footprinting of the FOX/4 motif (neural network-cleaned Vierstra archetype motifs) in aggregated luminal, basal-like, and HER2 BRCA samples. The Tn5 insertion bias track of FOX/4 motifs is shown below. E) Comparison of the maximal footprint score (Tn5 bias-divided, normalized) of overlapping TF instances for neural network model-derived predictive motif instances (y-axis) vs. PWM-based motif instances (x-axis) in luminal, basal-like, and HER2 cancers. Predictive motif instances show stronger footprints. F) Genome tracks (top panel) of aggregate pseudobulk scATAC-seq signal from healthy breast tissue (from an unrelated donor) and from luminal, basal-like, and HER2 BRCA samples around the TOP2A locus. G) Per-base DeepLIFT contribution scores of FOXA1 and SP1 motifs in the TOP2A promoter region highlighted in Figure 2F, for healthy breast tissue and each BRCA sample. The rightmost column shows bulk RNA-seq expression of FOXA1 and SP1 in each sample.
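Panel B reports a balanced AUROC on held-out chromosomes. The code below is a hedged sketch of that evaluation:

- It scores only regions on chromosomes excluded from training.
- It computes AUROC as the probability that a positive outscores a negative.

Several pieces are assumptions, not the study's documented procedure:

- Reading "balanced" as class-balanced subsampling.
- Treating background regions as negatives.
- The function names and toy records.

```python
import random


def auroc(scores, labels):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            wins += 1.0 if p > q else 0.5 if p == q else 0.0
    return wins / (len(pos) * len(neg))


def balanced_heldout_auroc(records, test_chroms, seed=0):
    """records: (chrom, predicted_score, label) tuples, where label 1 = accessible
    peak and 0 = background region (an assumption). Only held-out chromosomes are
    scored, and the larger class is subsampled so both classes have equal size."""
    test = [r for r in records if r[0] in test_chroms]
    pos = [r for r in test if r[2] == 1]
    neg = [r for r in test if r[2] == 0]
    n = min(len(pos), len(neg))
    rng = random.Random(seed)
    sample = rng.sample(pos, n) + rng.sample(neg, n)
    return auroc([r[1] for r in sample], [r[2] for r in sample])


# Toy records: chr1 is held out; chr8 (used for training here) is ignored.
records = [
    ("chr1", 0.9, 1), ("chr1", 0.7, 1),
    ("chr1", 0.2, 0), ("chr1", 0.1, 0), ("chr1", 0.3, 0),
    ("chr8", 0.0, 1), ("chr8", 0.95, 0),
]
print(balanced_heldout_auroc(records, {"chr1"}))
```

Holding out whole chromosomes keeps evaluation regions from sharing nearby sequence with training regions, which is the usual motivation for this split.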
Figure 3. scATAC-seq identifies subclonal heterogeneity driven by copy number alterations in GBM.
A-B) UMAP of cells from GBM-45 (A) or GBM-39 (B) in the CNA subspace (normalized 10-Mb windows), colored by CNA subclone. C-D) Heatmap of CNA signal computed from normalized 10-Mb scATAC-seq bins on aggregated cells from CNA-derived scATAC-seq clusters for GBM-45 (TCGA-06-A5U0) and GBM-39 (TCGA-4W-AA9S). The top row shows the CNA signal measured from matched WGS for each sample, aggregated in 10-Mb bins. E-F) UMAP of cells from GBM-45 (E) and GBM-39 (F) in the ATAC-seq subspace (501-bp peak regions), colored by CNA subclone. G) UMAP of adult and fetal healthy brain cells, colored by cell type. H-K) Projection of the top 2 cancer subclones (by cell count) identified from the GBM samples in Figure 3A–B into the latent semantic indexing (LSI) subspace of the healthy adult and fetal brain scATAC-seq data. L) Comparison of the sample type annotations of the nearest-neighbor healthy adult and fetal brain cells for the major CNA-based subclones identified in Figure 3A–B. M-N) Enrichment of neural network model-derived motifs (odds ratio; chi-square test, Bonferroni-corrected p-values) in fetal-like subclones (Clone B in GBM-45 and Clone A in GBM-39; M) and adult-like subclones (Clone A in GBM-45 and Clone B in GBM-39; N). O) Genome tracks (top panel) of aggregate pseudobulk scATAC-seq signal of Clone A and Clone B from GBM-45 around the MYCL locus, with per-base DeepLIFT contribution scores for the highlighted region shown below. This region is not called as a peak in Clone A. The SOX4 gene body on chr6 is amplified only in Clone B. P) Enrichment of chromosome 6 TFs in Clone B predicted to bind near other TF genes compared with non-TF genes in Clone B (P = 8.522 × 10⁻³⁵, OR = 1.3191, Fisher's exact test).
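Figure 3 derives CNA signal from scATAC-seq reads aggregated in 10-Mb windows. The sketch below illustrates that idea under two assumptions:

- The input is fragment start coordinates.
- A copy-neutral reference profile (for example, non-tumor cells from the same sample) is used for normalization.

The reference choice, pseudocount and helper names are illustrative assumptions. Corrections such as GC or mappability adjustment are omitted.

```python
import math
from collections import Counter

BIN_SIZE = 10_000_000  # 10-Mb windows, matching the caption


def bin_counts(fragments, bin_size=BIN_SIZE):
    """Count fragment start positions per (chromosome, window index)."""
    return Counter((chrom, start // bin_size) for chrom, start in fragments)


def cna_log2_ratio(group_counts, reference_counts, pseudocount=1.0):
    """Depth-normalize a cell group's window counts against a reference profile
    (assumed copy-neutral) and return a log2 ratio per window. Values above 0
    suggest a gain and values below 0 suggest a loss."""
    bins = set(group_counts) | set(reference_counts)
    group_total = sum(group_counts.values()) + pseudocount * len(bins)
    ref_total = sum(reference_counts.values()) + pseudocount * len(bins)
    ratios = {}
    for b in sorted(bins):
        group_frac = (group_counts.get(b, 0) + pseudocount) / group_total
        ref_frac = (reference_counts.get(b, 0) + pseudocount) / ref_total
        ratios[b] = math.log2(group_frac / ref_frac)
    return ratios


# Hypothetical toy fragments for one subclone and a reference cell population.
clone = bin_counts([("chr7", 1_000_000), ("chr7", 2_000_000),
                    ("chr7", 3_000_000), ("chr10", 4_000_000)])
normal = bin_counts([("chr7", 1_500_000), ("chr10", 2_500_000)])
profile = cna_log2_ratio(clone, normal)
print(profile)
```

Computed per cell, vectors like `profile` define a CNA subspace in which cells can be clustered into subclones, analogous to panels A-B.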
Figure 4. Cancer-specific chromatin accessibility signals nominate malignant regulatory programs not observed in healthy tissues.
A) UMAP of adult healthy breast cells colored by the three epithelial cell types (top left), and projection of BRCA cancer samples into the LSI subspace of the healthy adult breast scATAC-seq data. B) Venn diagrams of the numbers of distal peaks upregulated (top) and downregulated (bottom) in each BRCA subtype (luminal, basal-like, and HER2) compared with healthy control samples. C) PCA of top differential peaks within breast cancer samples and their nearest-neighbor controls. Triangles indicate control samples and circles indicate cancer samples; colors indicate BRCA subtype. D) Browser tracks around the KRT17 (basal marker) and PI3 (secretory-type epithelial cell marker) genes, showing differential accessibility of basal-like cancers relative to the healthy basal (myoepithelial) cell type and a greater similarity to healthy secretory-type epithelial cell chromatin accessibility. E) K-means clustering of differential peaks in BRCA cancers compared with their nearest-neighbor healthy breast cells. F) Differential enrichment of neural network-derived predictive motif instances of TFs (z-score of odds ratios; chi-square test, Bonferroni-corrected p-values) in differentially accessible peaks of different BRCA samples (columns). Motifs mentioned in the text are highlighted in red. G-H) K-means clustering of differential enhancers in BLCA (G) and LUAD (H) samples compared with their nearest-neighbor healthy cells from the respective tissue.
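Figures 3 and 4 assign cancer cells a "nearest healthy" cell type after projecting them into a healthy LSI subspace. The sketch below covers only the nearest-neighbor step. It assumes:

- The LSI coordinates are already computed.
- Distance is cosine distance.
- Labels come from a k = 3 majority vote.

The distance, k and the toy labels are assumptions rather than the study's settings.

```python
import math
from collections import Counter


def cosine_distance(a, b):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (na * nb)


def nearest_healthy_label(query, healthy_coords, healthy_labels, k=3):
    """Majority cell-type label among the k healthy cells closest to a projected cancer cell."""
    ranked = sorted(range(len(healthy_coords)),
                    key=lambda i: cosine_distance(query, healthy_coords[i]))
    votes = Counter(healthy_labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


def nearest_healthy_fractions(queries, healthy_coords, healthy_labels, k=3):
    """Fraction of projected cancer cells assigned to each healthy cell type."""
    calls = Counter(nearest_healthy_label(q, healthy_coords, healthy_labels, k)
                    for q in queries)
    return {label: count / len(queries) for label, count in calls.items()}


# Hypothetical 2-D "LSI" coordinates for healthy cells of two types.
coords = [(1.0, 0.0), (0.9, 0.1), (0.95, 0.05), (0.0, 1.0), (0.1, 0.9), (0.05, 0.95)]
labels = ["basal"] * 3 + ["secretory"] * 3
print(nearest_healthy_fractions([(0.2, 0.8), (0.9, 0.2)], coords, labels))
```

Summarizing the assigned labels across all cells of a tumor or subclone gives its dominant nearest healthy cell type.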
Figure 5. Deep learning models of cell type-resolved chromatin accessibility prioritize non-coding germline and somatic mutations associated with cancer.
A) Enrichment of the proportion of heritability, from GWAS summary statistics of 3 cohorts (BCAC, FinnGen, and UKBB), attributed to scATAC-seq-derived cRE landscapes of healthy breast and brain tissue and of BRCA and GBM. B) Cumulative distribution of bulk ATAC-seq allelic imbalance for LD-expanded germline SNPs from the 3 BRCA cohorts. The red curve indicates SNPs predicted by the BRCA neural network model to cause loss of accessibility (LoA) above the 97.5th percentile. The blue curve indicates non-prioritized SNPs (p = 3.09 × 10⁻⁵, one-sided t-test). C) Schematic of the framework for prioritizing non-coding somatic mutations using cancer-specific neural network models and model-derived in silico mutagenesis (ISM) scores. D-E) Cumulative distribution of scATAC-seq allelic imbalance of non-coding somatic mutations identified by 80x WGS of the scATAC-seq samples. The red curve indicates mutations predicted to cause LoA (D; above the 97.5th percentile) or gain of accessibility (GoA; E; below the 2.5th percentile). The blue curve indicates mutations not prioritized by the BRCA neural network model (p = 1.60 × 10⁻¹² for LoA (D) and 5.82 × 10⁻⁹ for GoA (E), one-sided t-test). F) scATAC-seq and WGS allelic ratios for a candidate non-coding somatic variant with a high predicted effect size and high scATAC-seq allelic imbalance, identified in the COAD-29 sample. G) Genome tracks of aggregate pseudobulk scATAC-seq signal across all COAD samples around the prioritized candidate mutation from panel (F) in the COAD-29 sample. Changes in contribution scores highlight disruption of an active TF motif (ELF/ETV) by the non-coding somatic mutation. H) Enrichment of non-coding somatic mutations near cancer-relevant genes (tumor suppressors and oncogenes) in cases vs. matched controls, using PCAWG samples from the 7 cancer types of interest and mutations prioritized by different methods. LoA mutations prioritized by neural network models with ISM are enriched in cases vs. controls (OR = 1.5142, p = 0.0064, Fisher's exact test), as are LoA + GoA mutations prioritized the same way (OR = 1.4348, p = 0.0027, Fisher's exact test). I) Case study of a predicted LoA mutation from the PCAWG cohort (not from our ATAC-seq WGS cohort) near the TET2 tumor suppressor gene in BLCA. The panel shows genome tracks of aggregate pseudobulk scATAC-seq signal across all BLCA samples around the somatic mutation. Changes in contribution scores highlight disruption of an active AP-1 motif by the non-coding somatic mutation. J) Case study of a predicted GoA mutation from the PCAWG cohort (not from our ATAC-seq WGS cohort) near the MYCN oncogene in COAD. The panel shows genome tracks of aggregate pseudobulk scATAC-seq signal across all COAD samples around the somatic mutation. Changes in contribution scores highlight the gain of an active IRF motif due to the non-coding somatic mutation.
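Figure 5C-E prioritizes variants by ISM score and percentile cutoffs: LoA above the 97.5th percentile and GoA below the 2.5th. A hedged Python sketch of that logic follows. It rests on two assumptions:

- `toy_predict` is a hypothetical stand-in for a trained neural network. It simply counts exact TGACTCA (AP-1 consensus) sites.
- The cutoffs are computed over the scored variants themselves.

```python
def toy_predict(seq):
    """Hypothetical stand-in for a trained model: counts exact AP-1 consensus
    sites (TGACTCA). A real model would return predicted accessibility."""
    return float(sum(1 for i in range(len(seq) - 6) if seq[i:i + 7] == "TGACTCA"))


def ism_score(predict, seq, pos, alt):
    """In silico mutagenesis: reference prediction minus the prediction with `alt`
    substituted at `pos`. Positive values mean predicted loss of accessibility."""
    mutated = seq[:pos] + alt + seq[pos + 1:]
    return predict(seq) - predict(mutated)


def percentile(sorted_vals, q):
    """Linear-interpolation percentile (q in [0, 100]) of a sorted, non-empty list."""
    idx = (len(sorted_vals) - 1) * q / 100.0
    lo = int(idx)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (idx - lo)


def prioritize(variants, predict, goa_q=2.5, loa_q=97.5):
    """variants: (ref_seq, pos, alt) tuples. Returns (call, score) per variant:
    'LoA' above the loa_q percentile of all scores, 'GoA' below the goa_q
    percentile, and None otherwise."""
    scores = [ism_score(predict, s, p, a) for s, p, a in variants]
    ranked = sorted(scores)
    loa_cut, goa_cut = percentile(ranked, loa_q), percentile(ranked, goa_q)
    calls = []
    for score in scores:
        if score > loa_cut:
            calls.append(("LoA", score))
        elif score < goa_cut:
            calls.append(("GoA", score))
        else:
            calls.append((None, score))
    return calls


# Toy variants: one destroys a site, one creates a site, the rest are neutral.
variants = ([("AATGACTCAAA", 3, "T"), ("AATGACTGAAA", 7, "C")]
            + [("AAAAAAAAAAA", 0, "C")] * 38)
calls = prioritize(variants, toy_predict)
print(calls[:3])
```

Prioritized variants from such a step could then be tested for allelic imbalance (panels D-E) or for enrichment near cancer genes (panel H).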
