Nat Commun. 2024 Sep 12;15(1):7995.
doi: 10.1038/s41467-024-52356-9.

Context-aware single-cell multiomics approach identifies cell-type-specific lung cancer susceptibility genes


Erping Long et al. Nat Commun. 2024.

Erratum in

Abstract

Genome-wide association studies (GWAS) have identified over fifty loci associated with lung cancer risk. However, the underlying mechanisms and target genes are largely unknown, as most risk-associated variants might regulate gene expression in a context-specific manner. Here, we generate a barcode-shared transcriptome and chromatin accessibility map of 117,911 human lung cells from age- and sex-matched ever- and never-smokers to profile context-specific gene regulation. The identified candidate cis-regulatory elements (cCREs) are largely cell-type-specific, with 37% detected in only one cell type. Colocalization of lung cancer candidate causal variants (CCVs) with these cCREs, combined with transcription factor footprinting, prioritizes the variants for 68% of the GWAS loci. CCV colocalization and trait relevance scores indicate that epithelial and immune cell categories, including rare cell types, contribute the most to lung cancer susceptibility. A multi-level cCRE-gene linking system identifies candidate susceptibility genes at 57% of the loci, with most loci displaying cell-category-specific target genes, suggesting context-specific susceptibility gene function.
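As a rough illustration of what the "37% detected in only one cell type" figure measures, the Python sketch below counts the cCREs whose accessibility is called in exactly one cell type. The input layout (a mapping from a cCRE identifier to the set of cell types where it was called) and the toy values are assumptions for illustration; they are not the authors' data structures or results.

```python
from typing import Dict, Set


def single_cell_type_fraction(calls: Dict[str, Set[str]]) -> float:
    """Fraction of cCREs detected (called accessible) in exactly one cell type.

    `calls` is a hypothetical mapping: cCRE ID -> set of cell types in which
    that cCRE was called.
    """
    if not calls:
        raise ValueError("no cCREs supplied")
    n_specific = sum(1 for cell_types in calls.values() if len(cell_types) == 1)
    return n_specific / len(calls)


# Toy input (invented, not study data): 2 of the 4 cCREs are seen in one cell type.
toy_calls = {
    "cCRE_1": {"AT2"},
    "cCRE_2": {"AT2", "AT1", "Club"},
    "cCRE_3": {"Alveolar macrophage"},
    "cCRE_4": {"AT2", "Ciliated"},
}
print(single_cell_type_fraction(toy_calls))  # 0.5
```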


Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1. Graphic summary of study design, workflow, and enrichment strategy.
Overview of the study pipeline, including tissue collection (A), single-cell sequencing (B), multiome profiling and analyses of chromatin accessibility and gene expression (C), and post-GWAS analyses of functional variants, target genes, and disease-relevant cells underlying lung cancer susceptibility loci (D). E Antibody markers for EpCAM, CD31, and CD45 were used to sort live single cells: EpCAM+CD45−CD31− (“epithelial”), EpCAM−CD45+CD31− (“immune”), EpCAM−CD45−CD31+ (“endothelial”), and EpCAM−CD45−CD31− (“stromal”). Epithelial cells were enriched by balancing the ratios among the “epithelial”, “immune”, “endothelial”, and “stromal” populations. F Bar plot showing the nuclei fraction of “epithelial” (green), “immune” (blue), “endothelial” (red), and “stromal” (yellow) cells before and after enrichment. Each bar represents an individual sample, following the order of sample IDs listed in Supplementary Data 1. Source data are provided as a Source Data file. Panels A and B were created with BioRender.com and are released under a Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International license (https://creativecommons.org/licenses/by-nc-nd/4.0/deed.en).
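The sorting gates in panel E amount to a lookup from the sign pattern of the three antibodies to a population label. The Python sketch below encodes that table and adds a hypothetical equal-quota pooling step to show why balancing the populations raises the epithelial share of nuclei. The quota scheme, function names, and counts are illustrative assumptions, not the authors' sorting protocol.

```python
from typing import Dict, Tuple

# (EpCAM+, CD45+, CD31+) sign pattern -> population label, as listed in Fig. 1E.
GATES: Dict[Tuple[bool, bool, bool], str] = {
    (True, False, False): "epithelial",
    (False, True, False): "immune",
    (False, False, True): "endothelial",
    (False, False, False): "stromal",
}


def assign_population(epcam: bool, cd45: bool, cd31: bool) -> str:
    """Label a sorted event; patterns not listed in Fig. 1E return "unassigned"."""
    return GATES.get((epcam, cd45, cd31), "unassigned")


def fractions(counts: Dict[str, int]) -> Dict[str, float]:
    """Fraction of nuclei contributed by each population."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def equal_quota_pool(counts: Dict[str, int], quota: int) -> Dict[str, int]:
    """Hypothetical balancing: take at most `quota` cells from each population."""
    return {label: min(n, quota) for label, n in counts.items()}


# Invented counts for one sample, dominated by immune cells before balancing.
before = {"epithelial": 100, "immune": 700, "endothelial": 100, "stromal": 100}
after = equal_quota_pool(before, quota=100)
print(fractions(before)["epithelial"])  # 0.1
print(fractions(after)["epithelial"])   # 0.25
```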
Fig. 2. Identification of major cell types of the human lung via joint profile of snRNA-seq and snATAC-seq.
A Weighted Nearest Neighbor (WNN) clustering of the 117,911 single nuclei that passed quality control, using joint chromatin accessibility and gene expression profiles, with cell types annotated by canonical markers. This resulted in 23 cell types across endothelial (red shades), stromal (yellow shades), epithelial (green shades), and immune (blue shades) cells. B Fraction of nuclei of each cell type relative to the total number of nuclei. Numbers next to each bar denote absolute counts out of 117,911 nuclei. C Dot plot visualizing the normalized RNA expression of selected marker genes by cell type. The color and size of each dot correspond to the scaled average expression level and the fraction of expressing cells, respectively. Additional markers are shown in Fig. S1. D The dot plot on the left visualizes the normalized RNA expression of selected marker genes in AT2 and AT2-proliferating (AT2-pro) cells. The color and size of each dot correspond to the scaled average expression level and the fraction of expressing cells, respectively, using the same scale and fraction definitions as in (C). The sequencing tracks on the right visualize chromatin accessibility signals around the selected marker genes by cell type. Each track represents the aggregate snATAC signal of all three included cell types, normalized by the total number of reads in the transcription start site (TSS) region. Arrows show the transcriptional directions of the genes. Coordinates for each region are as follows: STMN1 (chr1:25904993-25908229), TYMS (chr18:657083-658911), CDK1 (chr10:60768638-60777926), MKI67 (chr10:126261833-126268756). Yellow squares highlight the main differences in chromatin accessibility between the two cell types. Source data are provided as a Source Data file.
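The dot-plot encoding used in panels C and D (color for scaled average expression, size for the fraction of expressing cells) can be sketched for a single gene as follows. Z-scoring the per-cell-type means across cell types, and counting a cell as "expressing" when its normalized value is above zero, are assumptions made here; plotting packages differ in how they average, scale, and clip these values, and the toy numbers are invented.

```python
from statistics import mean, pstdev
from typing import Dict, List, Tuple


def dot_plot_stats(expr_by_type: Dict[str, List[float]]) -> Dict[str, Tuple[float, float]]:
    """For one gene, return {cell type: (scaled mean expression, fraction expressing)}.

    `expr_by_type` maps each cell type to the normalized expression values of its
    cells. Means are z-scored across cell types (an assumed scaling choice).
    """
    type_means = {ct: mean(vals) for ct, vals in expr_by_type.items()}
    center = mean(list(type_means.values()))
    spread = pstdev(list(type_means.values()))
    stats = {}
    for ct, vals in expr_by_type.items():
        # Constant means across cell types give no contrast, so scale to 0.
        scaled = (type_means[ct] - center) / spread if spread > 0 else 0.0
        frac_expressing = sum(1 for v in vals if v > 0) / len(vals)
        stats[ct] = (scaled, frac_expressing)
    return stats


# Invented values for a proliferation marker in four cells of each type.
toy = {"AT2": [0.0, 0.0, 0.0, 0.5], "AT2-pro": [1.5, 2.0, 0.0, 2.5]}
for cell_type, (scaled, frac) in dot_plot_stats(toy).items():
    print(cell_type, round(scaled, 2), frac)
```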
Fig. 3. Characterization of the cell-type specificity underlying lung-cancer-associated functional variants.
A Illustration of the selection of candidate causal variants (CCVs) from GWAS loci. Each dot represents a variant, and its LD with the lead variant is color-coded. Dots with a dashed outline represent variants in high LD with the lead variant that were not covered in the original GWAS summary statistics. LLR: log likelihood ratio. B The confidence level of CCV functionality is established by colocalizing the CCVs with accessible chromatin regions, predicting allelic TF binding effects, assessing TF abundance, and TF footprinting. C Pie chart presenting the fraction of CCVs assigned to different cell-type categories (inner pie) and to a single cell type (outer pie). D Pie chart presenting the fraction of loci assigned to different categories.
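Panel A combines two criteria: association strength relative to the lead variant (LLR) for variants present in the summary statistics, and LD with the lead variant for those that are absent. The Python sketch below shows one way such a filter could work. The z-score approximation of the likelihood ratio, the thresholds (`max_log10_lr=3.0`, that is, within 1000-fold of the lead, and `min_r2=0.8`), and the variant names are hypothetical choices, not the study's published criteria.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Variant:
    name: str                  # hypothetical identifier
    r2_with_lead: float        # LD (r^2) with the lead variant
    z: Optional[float] = None  # GWAS z-score; None if absent from summary statistics


def log10_lr_vs_lead(z_lead: float, z_var: float) -> float:
    """Approximate log10 of L(lead) / L(variant).

    Assumes ln[L(alt) / L(null)] ~= z**2 / 2 for each single-variant test, so the
    natural-log ratio between two variants is (z_lead**2 - z_var**2) / 2.
    """
    return (z_lead ** 2 - z_var ** 2) / (2 * math.log(10))


def select_ccvs(variants: List[Variant], z_lead: float,
                max_log10_lr: float = 3.0, min_r2: float = 0.8) -> List[str]:
    """Keep tested variants within `max_log10_lr` of the lead, untested ones in high LD."""
    selected = []
    for v in variants:
        if v.z is not None:
            keep = log10_lr_vs_lead(z_lead, v.z) <= max_log10_lr
        else:
            keep = v.r2_with_lead >= min_r2
        if keep:
            selected.append(v.name)
    return selected


locus = [
    Variant("lead", 1.0, 8.0),
    Variant("var_a", 0.95, 7.8),  # about 0.69 log10 units weaker: kept
    Variant("var_b", 0.40, 5.0),  # about 8.5 log10 units weaker: dropped
    Variant("var_c", 0.90),       # untested, high LD: kept
    Variant("var_d", 0.30),       # untested, low LD: dropped
]
print(select_ccvs(locus, z_lead=8.0))  # ['lead', 'var_a', 'var_c']
```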
Fig. 4. Cell-type-specific variant function.
A Sequencing tracks representing chromatin accessibility near the CCVs, which are marked by rsIDs and vertical pink lines. Each track represents the aggregated snATAC signal of each cell type, normalized by the total number of reads in the regions (y-axis scale 0–71). The arrow depicts the transcriptional direction of TERT. B Normalized transcriptional activity of 145 bp sequences encompassing rs7726159 (upper) and rs7725218 (lower), tested by massively parallel reporter assays in A549 lung cells. TPM: tags per million; alt: alternative allele (lung cancer risk-associated); ref: reference allele; fwd: forward; rev: reverse. Center lines show the medians; box limits indicate the 25th and 75th percentiles; whiskers extend 1.5 times the interquartile range from the 25th and 75th percentiles; outliers are represented by dots. Density is reflected in the width of the shape. The numbers of tags, combined from five biological replicates, used for statistical testing are n = 125, 120, 120, and 110 for rs7726159 and n = 105, 110, 105, and 115 for rs7725218, from left to right. P values were calculated by the Wald test and corrected to FDR values by the Benjamini-Hochberg procedure. C Sequencing tracks of chromatin accessibility, CCVs, and gene transcriptional direction, in the same style as (A) (y-axis scale 0–490). D Position weight matrix of the IRF8 motif, represented by the heights of the letters in the motif logo, with the genomic location at the bottom. The variant position (rs3769823) within the motif is indicated with a red box. E Normalized mRNA expression of IRF8 in the 7 cell types with the highest IRF8 expression. The color and size of each dot correspond to the scaled average expression level and the fraction of cells expressing IRF8, respectively. F The upper part displays the average footprint profile of IRF8 across all detected peaks in each cell type. The three cell types with the highest average footprint profiles for the IRF8 motif are shown in shades of pink, and the remaining cell types in gray. Tn5 insertion bias was corrected by subtracting the Tn5 signals from the average footprint signals. The lower part shows the expected Tn5 enrichment based on distance from the motif. Source data are provided as a Source Data file.
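Two computational steps named in this legend can be sketched briefly: the Tn5 bias correction in panel F (average the insertion profiles over motif sites, then subtract the expected Tn5 signal position by position) and the Benjamini-Hochberg adjustment applied to the panel B P values. The Python below is a minimal standard-library sketch with invented toy inputs; it does not reproduce the authors' footprinting software or any additional normalization, and it does not reimplement the Wald test.

```python
from typing import List, Sequence


def corrected_footprint(site_profiles: Sequence[Sequence[float]],
                        tn5_expected: Sequence[float]) -> List[float]:
    """Average insertion signal across motif sites minus the expected Tn5 signal.

    Each profile holds per-position signal around one motif site; all profiles
    must have the same width as the expected Tn5 profile.
    """
    width = len(tn5_expected)
    if not site_profiles or any(len(p) != width for p in site_profiles):
        raise ValueError("need at least one profile, each matching the Tn5 width")
    n_sites = len(site_profiles)
    average = [sum(p[i] for p in site_profiles) / n_sites for i in range(width)]
    return [a - b for a, b in zip(average, tn5_expected)]


def benjamini_hochberg(pvals: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted P values (FDR), returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity with a running min.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


profiles = [[2.0, 1.0, 0.0, 1.0, 2.0], [4.0, 1.0, 0.0, 1.0, 4.0]]
expected_tn5 = [1.0, 0.5, 0.5, 0.5, 1.0]
print(corrected_footprint(profiles, expected_tn5))  # [2.0, 0.5, -0.5, 0.5, 2.0]
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.20]))
```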
Fig. 5. Linkage between cCREs and genes and representative loci showing context-specific genetic regulation mechanisms.
A Illustration of the rationale for linking cCREs to genes at different levels. cCRE modules are represented by straight blue lines. CCV-colocalizing cCREs are shown as pink cones. Highly co-accessible cCREs are connected by blue loops. cCRE-gene correlations are represented by green loops. B The left panel presents the proportion of genes at different levels across GWAS loci. The right panel presents the number of genes at different levels across GWAS loci; the numbers shown are level-6 gene counts. Asterisks indicate genes identified by TWAS or colocalization in previous studies. C, D The rsIDs of the CCVs are presented above the chromatin accessibility tracks, with their locations marked in the tracks by vertical lines. Each track represents the aggregate snATAC signal of each cell type, normalized by the total number of reads in the TSS regions (normalized values for the left three ATAC tracks: 0–380; for the right four ATAC tracks: 0–240). Arrows show the transcriptional directions of the genes. Source data are provided as a Source Data file.
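Panel A lists several kinds of linking evidence; the cCRE-gene correlations drawn as green loops are the most directly computable. The Python sketch below correlates each cCRE's accessibility with each gene's expression across matched groups (for example, cell-type pseudobulks) and keeps pairs above a threshold. Pearson correlation, the pairing scheme, the `min_r=0.3` cutoff, and the placeholder names are assumptions; the sketch applies no genomic distance restriction and does not attempt the co-accessibility, CCV colocalization, or level-assignment steps of the multi-level system.

```python
import math
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def link_ccres_to_genes(accessibility: Dict[str, List[float]],
                        expression: Dict[str, List[float]],
                        min_r: float = 0.3) -> List[Tuple[str, str, float]]:
    """Return (cCRE, gene, r) pairs with r >= min_r, strongest first.

    Both inputs are indexed by the same ordered groups (e.g. pseudobulk profiles).
    """
    links = []
    for ccre, acc in accessibility.items():
        for gene, expr in expression.items():
            if len(acc) != len(expr):
                raise ValueError("accessibility and expression must share groups")
            r = pearson(acc, expr)
            if r >= min_r:
                links.append((ccre, gene, r))
    return sorted(links, key=lambda link: link[2], reverse=True)


# Invented values across four groups; cCRE and gene names are placeholders.
acc = {"cCRE_A": [0.0, 1.0, 2.0, 3.0], "cCRE_B": [3.0, 2.0, 1.0, 0.0]}
expr = {"GENE1": [0.0, 2.0, 4.0, 6.0], "GENE2": [1.0, 1.0, 1.0, 2.0]}
for ccre, gene, r in link_ccres_to_genes(acc, expr):
    print(ccre, gene, round(r, 3))
```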

Update of

