[Preprint]. 2023 Sep 26:2023.09.25.559336.
doi: 10.1101/2023.09.25.559336.

Context-aware single-cell multiome approach identified cell-type specific lung cancer susceptibility genes

Erping Long et al. bioRxiv.

Abstract

Genome-wide association studies (GWAS) have identified over fifty loci associated with lung cancer risk. However, the genetic mechanisms and target genes underlying these loci are largely unknown, as most risk-associated variants might regulate gene expression in a context-specific manner. Here, we generated a barcode-shared transcriptome and chromatin accessibility map of 117,911 human lung cells from age- and sex-matched ever- and never-smokers to profile context-specific gene regulation. Accessible chromatin peak detection identified cell-type-specific candidate cis-regulatory elements (cCREs) for each lung cell type. Colocalization of lung cancer candidate causal variants (CCVs) with these cCREs prioritized variants at 68% of the GWAS loci, a subset of which were also supported by transcription factor abundance and footprinting. cCRE colocalization and single-cell-based trait relevance scores nominated epithelial and immune cells as the main cell groups contributing to lung cancer susceptibility. Notably, cCREs of rare proliferating epithelial cell types, such as AT2-proliferating (0.13%) and basal cells (1.8%), overlapped with CCVs, including those in TERT. A multi-level cCRE-gene linking system identified candidate susceptibility genes at 57% of lung cancer loci, including genes not detected by tissue- or cell-line-based approaches. cCRE-gene linkage revealed that adjacent genes expressed in different cell types are correlated with distinct subsets of coinherited CCVs, including JAML and MPZL3 at the 11q23.3 locus. Our data reveal the cell types and contexts in which lung cancer susceptibility genes are functional.
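At its simplest, colocalizing CCVs with cCREs means asking whether a variant's position falls inside a cell type's accessible-chromatin peak. The sketch below assumes BED-style (0-based, half-open) peak intervals, 1-based variant positions, and non-overlapping (merged) peaks within each cell type. The function name `colocalize` and the toy coordinates are hypothetical and are not taken from the study.

```python
from bisect import bisect_right


def colocalize(variants, peaks_by_cell_type):
    """Map each variant to the cell types whose cCRE peaks contain it.

    variants: list of (variant_id, chrom, pos) with 1-based positions.
    peaks_by_cell_type: {cell_type: [(chrom, start, end), ...]} as 0-based,
    half-open BED intervals, assumed merged (non-overlapping) per cell type.
    """
    # Index peaks per cell type and chromosome as sorted (starts, ends) lists.
    index = {}
    for cell_type, peaks in peaks_by_cell_type.items():
        per_chrom = {}
        for chrom, start, end in sorted(peaks):
            starts, ends = per_chrom.setdefault(chrom, ([], []))
            starts.append(start)
            ends.append(end)
        index[cell_type] = per_chrom

    hits = {}
    for variant_id, chrom, pos in variants:
        bed_pos = pos - 1  # convert the 1-based variant position to 0-based
        hits[variant_id] = []
        for cell_type, per_chrom in index.items():
            starts, ends = per_chrom.get(chrom, ([], []))
            # Rightmost peak starting at or before the variant.
            i = bisect_right(starts, bed_pos) - 1
            if i >= 0 and bed_pos < ends[i]:
                hits[variant_id].append(cell_type)
    return hits
```

A variant that maps to a single cell type corresponds to the "assigned to a single cell type" category in the Figure 3C legend. Binary search keeps each lookup logarithmic in the number of peaks, which matters at genome scale.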

Conflict of interest statement

Declaration of Interests: The authors declare no competing interests.

Figures

Figure 1. Graphic summary of study design, workflow, and enrichment strategy.
Overview of the study pipeline, including tissue collection (A), single-cell sequencing (B), multiome profiling and analyses of chromatin accessibility and gene expression (C), and post-GWAS analyses of functional variants, target genes, and disease-relevant cells underlying lung cancer susceptibility loci (D). (E) Antibodies against EPCAM, CD31, and CD45 were used to sort live single cells into EPCAM+CD45−CD31− (“epithelial”), EPCAM−CD45+CD31− (“immune”), EPCAM−CD45−CD31+ (“endothelial”), and EPCAM−CD45−CD31− (“stromal”) populations. Epithelial cells were enriched by balancing the ratios among the “epithelial”, “immune”, “endothelial”, and “stromal” populations. (F) Bar plot showing the nuclei fraction of “epithelial” (green), “immune” (blue), “endothelial” (red), and “stromal” (yellow) populations before and after enrichment. Each bar represents an individual sample, ordered by the sample IDs listed in Table S1.
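The gating in panel E amounts to a lookup from marker positivity to compartment, and balancing amounts to capping each compartment's contribution to a target share. In this minimal sketch, cells whose marker combination falls outside the four gates (e.g., EPCAM+CD45+) are assumed to be discarded. The equal target shares in the usage test are hypothetical, because the caption does not state the ratios used. All function names are illustrative.

```python
import math
from collections import Counter

# (EPCAM+, CD45+, CD31+) positivity pattern -> compartment, per panel E.
GATES = {
    (True, False, False): "epithelial",
    (False, True, False): "immune",
    (False, False, True): "endothelial",
    (False, False, False): "stromal",
}


def assign_compartment(epcam, cd45, cd31):
    """Return the compartment for one live cell, or None if its marker
    combination falls outside the four gates (assumed to be discarded)."""
    return GATES.get((epcam, cd45, cd31))


def compartment_fractions(cells):
    """Fraction of gated cells per compartment; cells are (epcam, cd45, cd31) tuples."""
    counts = Counter(assign_compartment(*cell) for cell in cells)
    counts.pop(None, None)  # drop ungated cells
    total = sum(counts.values())
    return {name: counts[name] / total for name in GATES.values()} if total else {}


def balanced_counts(available, target):
    """Largest per-compartment cell numbers that match the target shares
    without exceeding the cells available in any compartment."""
    pool = min(available[name] / share for name, share in target.items() if share > 0)
    return {name: math.floor(pool * share) for name, share in target.items()}
```

With equal target shares, the scarcest compartment caps how many cells can be drawn from every other compartment.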
Figure 2. Identification of major cell types of the human lung via joint profiling of snRNA-seq and snATAC-seq.
(A) Weighted Nearest Neighbor (WNN) clustering of 117,911 single nuclei using chromatin accessibility and gene expression profiles after quality control, with cell-type annotation based on canonical markers. This yielded 23 cell types across the endothelial (red shades), stromal (yellow shades), epithelial (green shades), and immune (blue shades) compartments. (B) Fraction of nuclei in each cell type relative to the total number of nuclei. Numbers next to each bar denote absolute counts out of 117,911 nuclei. (C) Dot plot of normalized RNA expression of selected marker genes by cell type. The color and size of each dot correspond to the scaled average expression level and the fraction of expressing cells, respectively. Additional markers are shown in Figure S1. (D) The dot plot on the left shows normalized RNA expression in AT2, club, and AT2-proliferating (AT2-pro) cells, in the same style as panel C. The sequencing tracks show chromatin accessibility signals around selected marker genes by cell type. Each track represents the aggregate snATAC signal of all three included cell types, normalized by the total number of reads in transcription start site (TSS) regions. Arrows show the transcriptional directions of the genes. Coordinates for each region are as follows: STMN1 (chr1:25904993–25908229), TYMS (chr18:657083–658911), CDK1 (chr10:60768638–60777926), MKI67 (chr10:126261833–126268756). Yellow squares highlight the main differences in chromatin accessibility among the three cell types.
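Normalizing a track by reads in TSS regions, as described for panel D, can be sketched as follows: per-cell insertion counts in a genomic window are summed per cell type, then divided by that cell type's total TSS reads. The per-cell inputs, the bin layout, and the `scale` constant are assumptions for illustration only. The study's own pipeline and scaling factor are not specified in this caption, and `normalized_tracks` is a hypothetical name.

```python
def normalized_tracks(insertions, reads_in_tss, groups, scale=10_000):
    """Aggregate per-cell insertion counts into per-group tracks normalized
    by the group's total reads in TSS regions.

    insertions: list of per-cell lists of Tn5 insertion counts per bin
        (all cells share the same bins).
    reads_in_tss: list of each cell's read count overlapping TSS regions.
    groups: list of each cell's cell-type label.
    scale: hypothetical constant that puts values on a readable range.
    """
    n_bins = len(insertions[0])
    summed = {}
    tss_total = {}
    for counts, tss, group in zip(insertions, reads_in_tss, groups):
        acc = summed.setdefault(group, [0] * n_bins)
        for b, c in enumerate(counts):
            acc[b] += c  # aggregate signal across the group's cells
        tss_total[group] = tss_total.get(group, 0) + tss
    # Divide by total TSS reads so groups with deeper sequencing are comparable.
    return {
        group: [v * scale / tss_total[group] for v in acc]
        for group, acc in summed.items()
    }
```

Dividing by TSS reads rather than all reads makes tracks comparable across cell types that differ in depth and in the fraction of reads falling in accessible regions.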
Figure 3. Characterization of the cell-type specificity underlying lung-cancer-associated functional variants.
(A) Illustration of candidate causal variant (CCV) selection from GWAS loci. Each dot represents a variant, color-coded by its linkage disequilibrium (LD) with the lead variant. Dots with a dashed outline represent variants in high LD with the lead variant that were not covered by the original GWAS summary statistics. (B) Confidence in CCV functionality is established by colocalizing CCVs with accessible chromatin regions, predicting allelic TF binding effects, assessing TF abundance, and TF footprinting. (C) Pie chart presenting the fraction of CCVs assigned to different cell-type categories (inner pie) and to a single cell type (outer pie). (D) Pie chart presenting the fraction of loci assigned to different categories. (E) Sequencing tracks of chromatin accessibility. The rsIDs above the tracks are marked with vertical pink lines to indicate their positions. Each track represents the aggregated snATAC signal of all cell types, normalized by the total number of reads in the regions (y-axis scale 0–380). The arrow depicts the transcriptional direction of TERT. (F) Transcriptional activity of 145 bp sequences encompassing rs7726159 (upper) and rs7725218 (lower) was tested in A549 lung cells. Activity is presented as RNA TPM/DNA TPM from massively parallel reporter assays (MPRA). Both alleles (alt: alternative, lung cancer risk-associated; ref: reference) are shown in forward (fwd) and reverse (rev) orientations. Center lines show the medians; box limits indicate the 25th and 75th percentiles; whiskers extend 1.5 times the interquartile range from the 25th and 75th percentiles; outliers are represented by dots. The width of each shape reflects density. P values were calculated by the Wald test and corrected by the Benjamini-Hochberg procedure to obtain FDR values.
(G) Sequencing tracks of chromatin accessibility, CCVs, and gene transcriptional directions in the same style as the left part of panel E (y-axis scale 0–490). (H) The upper part displays the position weight matrix (PWM) as motif logos, with letter heights reflecting the PWM. The position of the variant (rs3769823) within the motif is indicated with a red box. The lower part shows the genomic region containing the IRF8 motif and the allelic binding site of rs3769823. (I) Dot plot of normalized IRF8 mRNA expression across the 7 cell types with the highest IRF8 expression. The color and size of each dot correspond to the scaled average expression level and the fraction of cells expressing IRF8, respectively. (J) The upper part displays IRF8 footprint analysis across cell types. The top three cell types showing an IRF8 footprint (color-coded) are dendritic cells, B cells, and monocytes; the remaining cell types are shown in gray. Footprints were corrected for Tn5 insertion bias by subtracting the Tn5 insertion signal from the footprinting signal. The lower part shows the expected Tn5 enrichment as a function of distance from the motif.
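Panel F reports MPRA activity as RNA TPM/DNA TPM and FDR values obtained by Benjamini-Hochberg correction of Wald-test P values. The sketch below computes the activity ratio and applies the standard Benjamini-Hochberg step-up adjustment to a list of P values. It assumes the Wald-test P values were already obtained from a separate model fit, which is not reproduced here. The function names are hypothetical.

```python
def mpra_activity(rna_tpm, dna_tpm):
    """Activity of one oligo (one allele in one orientation) as RNA TPM / DNA TPM."""
    if dna_tpm <= 0:
        raise ValueError("DNA TPM must be positive")
    return rna_tpm / dna_tpm


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted P values (FDR), returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by ascending P
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity of p * m / rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

The running minimum guarantees that a smaller raw P value never receives a larger adjusted value than a bigger one, which is the defining step of the BH procedure.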
Figure 4. Linkage between cCREs and genes, and representative loci showing context-specific genetic regulation mechanisms.
(A) Illustration of the rationale for linking cCREs to genes at different levels. cCRE modules are represented by straight blue lines. CCV-colocalizing cCREs are shown as pink cones. Highly co-accessible cCREs are represented by blue loops. cCRE-gene correlations are represented by green loops. (B) The left panel presents the proportion of genes at different levels across GWAS loci. The right panel presents the number of genes at different levels across GWAS loci; the numbers refer to level-6 gene counts. * indicates genes identified by TWAS or colocalization in previous studies. (C-D) The rsIDs of the CCVs are shown above the sequencing tracks of chromatin accessibility, with their locations marked by vertical lines. Each track represents the aggregate snATAC signal of all cell types normalized by the total number of reads in TSS regions (normalized values for the left three ATAC tracks: 0–380; for the right four ATAC tracks: 0–240). Arrows show the transcriptional directions of the genes.
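One ingredient of the linking scheme in panel A, the cCRE-gene correlation, can be illustrated as a Pearson correlation between cCRE accessibility and gene expression across matched groups of cells. This sketch rests on several assumptions. Inputs are already aggregated per group (e.g., per cell type or metacell) and restricted to a window around the locus. The `min_r` cutoff is hypothetical. The study's level definitions are not modeled. The gene names in the test are placeholders.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def link_ccres_to_genes(accessibility, expression, min_r=0.3):
    """Return (ccre_id, gene, r) for every pair whose correlation reaches min_r.

    accessibility: {ccre_id: [accessibility per cell group]}
    expression: {gene: [expression per cell group]}, same group order.
    min_r: hypothetical correlation cutoff.
    """
    links = []
    for ccre_id, acc in accessibility.items():
        for gene, expr in expression.items():
            r = pearson(acc, expr)
            if r >= min_r:
                links.append((ccre_id, gene, r))
    return links
```

Only positive correlations pass the cutoff here, matching the usual reading of an accessible enhancer co-varying with its target's expression. A real analysis would also need a background or permutation model to judge significance.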

