Am J Hum Genet. 2024 Jul 11;111(7):1405-1419.
doi: 10.1016/j.ajhg.2024.05.021. Epub 2024 Jun 20.

High-throughput characterization of functional variants highlights heterogeneity and polygenicity underlying lung cancer susceptibility

Erping Long et al. Am J Hum Genet.

Abstract

Genome-wide association studies (GWASs) have identified numerous lung cancer risk-associated loci. However, decoding molecular mechanisms of these associations is challenging since most of these genetic variants are non-protein-coding with unknown function. Here, we implemented massively parallel reporter assays (MPRAs) to simultaneously measure the allelic transcriptional activity of risk-associated variants. We tested 2,245 variants at 42 loci from 3 recent GWASs in East Asian and European populations in the context of two major lung cancer histological types and exposure to benzo(a)pyrene. This MPRA approach identified one or more variants (median 11 variants) with significant effects on transcriptional activity at 88% of GWAS loci. Multimodal integration of lung-specific epigenomic data demonstrated that 63% of the loci harbored multiple potentially functional variants in linkage disequilibrium. While 22% of the significant variants showed allelic effects in both A549 (adenocarcinoma) and H520 (squamous cell carcinoma) cell lines, a subset of the functional variants displayed a significant cell-type interaction. Transcription factor analyses nominated potential regulators of the functional variants, including those with cell-type-specific expression and those predicted to bind multiple potentially functional variants across the GWAS loci. Linking functional variants to target genes based on four complementary approaches identified candidate susceptibility genes, including those affecting lung cancer cell growth. CRISPR interference of the top functional variant at 20q13.33 validated variant-to-gene connections, including RTEL1, SOX18, and ARFRP1. Our data provide a comprehensive functional analysis of lung cancer GWAS loci and help elucidate the molecular basis of heterogeneity and polygenicity underlying lung cancer susceptibility.

Keywords: CRISPRi; GWAS follow up; context-specific regulation; functional variants; lung cancer; lung cancer susceptibility genes; massively parallel reporter assays; variant annotation.

Conflict of interest statement

Declaration of interests: The authors declare no competing interests.

Figures

Figure 1
MPRA identified functional variants from most of the lung cancer GWAS loci
(A) Schematic illustration of the overall workflow for MPRA design, experiments, and analyses. Three GWASs and 42 lung cancer risk-associated loci were included. A total of 2,245 candidate variants were selected based on log likelihood ratio (LLR) of GWAS p value and R² (color-coded) relative to the lead variant. The oligo library was synthesized with 145-base sequences encompassing each variant (±72 bases) with reference (ref) and alternative (alt) alleles in both forward (for) and reverse (rev) directions and 12-bp barcodes (25 tags per unique sequence). Test sequences were cloned in front of the minimal promoter (TATA) of the luciferase (luc2) construct and barcodes in front of the polyA signal. The library was transfected into A549 and H520 cells with or without BaP exposure to generate expressed RNA tags. Both input DNA and RNA libraries were sequenced to assess the tag counts. A regression model was used to assess the allelic effects on transcriptional activity, adjusting for strand and transfection as covariates. (B and C) FDR values for allelic transcriptional activity of each variant measured by MPRA are displayed in Manhattan plots for A549 (B) and H520 (C) cells with (bottom) or without (top) BaP exposure. Horizontal lines represent an FDR cutoff of 0.01 (-log10(log10(FDR)) = 1). Variants displaying significant allelic transcriptional activity are colored in yellow shades for A549 and blue shades for H520 cells, with darker shades for BaP exposure in each cell type. (D) The percentage of loci with (red) or without (purple) significant functional variants is shown. (E) The number of MPRA-tested variants (gray) and significant variants (blue) per tested locus is presented. Loci with no significant variants are marked with purple triangles. Lung cancer GWAS loci are ordered by chromosome (defined in Table S1).
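The regression described in Figure 1A can be sketched as an ordinary least-squares fit of per-tag activity on allele, with strand and transfection as covariates. This is a minimal illustration only, not the authors' pipeline: the function name `allelic_effect`, the 0/1 coding of allele and strand, the use of log2(RNA/DNA) as the response, and the synthetic noise-free data are all assumptions made for the example.

```python
from itertools import product

import numpy as np


def allelic_effect(log_ratio, allele, strand, transfection):
    """Fit log_ratio ~ allele + strand + transfection by least squares.

    Returns the allele coefficient, i.e. the alt-vs-ref difference in
    activity after adjusting for strand and transfection replicate.
    (Hypothetical helper; variable coding is an assumption.)
    """
    y = np.asarray(log_ratio, dtype=float)
    transfection = np.asarray(transfection)
    levels = np.unique(transfection)
    # Dummy-code the transfection replicate, dropping the first level.
    dummies = [(transfection == lv).astype(float) for lv in levels[1:]]
    X = np.column_stack([
        np.ones_like(y),                    # intercept
        np.asarray(allele, dtype=float),    # 0 = ref, 1 = alt
        np.asarray(strand, dtype=float),    # 0 = forward, 1 = reverse
        *dummies,
    ])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(coef[1])


# Synthetic, noise-free tags: 5 barcodes per allele/strand/transfection cell.
rows = [(a, s, t) for a, s, t in product([0, 1], [0, 1], [0, 1, 2])
        for _ in range(5)]
allele = [r[0] for r in rows]
strand = [r[1] for r in rows]
transfection = [r[2] for r in rows]
log_ratio = [1.0 + 0.5 * a - 0.2 * s + 0.1 * t for a, s, t in rows]

effect = allelic_effect(log_ratio, allele, strand, transfection)
print(round(effect, 3))
```

Because the synthetic data are built with an allele effect of 0.5 and no noise, the fit should recover that value; with real tag counts one would also need a standard error and multiple-testing correction, which this sketch omits.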
Figure 2
A functional scoring system prioritized single or multiple variants across GWAS loci
(A) The features from MPRA (allelic effects and activator, red shades), epigenome (chromatin accessibility and histone marks, blue shades), and allelic TF binding prediction (purple shades) were incorporated to score the 844 significant variants from MPRA. Variants with a total score of 9 or higher were defined as high-confidence variants. (B) The upper left part shows the number of high-confidence variants (green) and the rest of the variants, with total scores of 1–8. The lower left part shows the number of loci including at least one high-confidence variant (green). The right part shows the percentage of loci with a single high-confidence variant (gray) or multiple variants (green). (C and D) Two representative loci with similarly large numbers of tested variants are showcased: one with a single high-confidence variant (C) and the other with multiple high-confidence variants (D). Each dot represents a variant with genomic coordinates on the x axis and log-transformed GWAS p values on the y axis. The lower panels show zoomed-in views of the red-boxed areas from the corresponding upper panels. Individual scores and LD R² with the GWAS lead variants are shown for the high-confidence variants (green diamond) in green boxes.
Figure 3
Cell-type-specific allelic function of lung-cancer-associated variants
(A) A regression model was used to assess cell-type-specific allelic effects on transcriptional activity (allele × cell type interaction), while adjusting for strand and transfection as covariates. A variant (rs4573350) with significant allele-cell-type interaction (FDR = 7.58 × 10⁻¹⁴¹) is presented. Each dot represents the normalized expression level of a unique tag measured in A549 (yellow) and H520 cells (blue). (B) FDR values for cell type specificity of each variant are displayed in Manhattan plots. Horizontal lines represent an FDR cutoff of 0.01 (-log10(log10(FDR)) = 1). For the significant variants, cell types showing a larger MPRA allelic effect (absolute log2FC) are color coded (A549: yellow and H520: blue). (C) Correlation of MPRA allelic effect sizes between A549 and H520 cell lines without exposures. Each dot represents the allelic effect size (log2FC) of each variant. The Pearson correlation coefficients r, t statistics, and p values are shown. (D and E) Similar Pearson correlation plots of allelic effect sizes between DMSO and BaP exposure conditions in A549 (D) and H520 (E) cell lines.
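The allele-by-cell-type test in Figure 3A can be illustrated by adding a product term to a linear model: its coefficient is the difference in allelic effect between the two cell lines. The sketch below is hedged accordingly. The function `interaction_effect`, the coding (cell type 0 = A549, 1 = H520), and the synthetic values are assumptions for illustration and are not taken from the paper.

```python
from itertools import product

import numpy as np


def interaction_effect(y, allele, cell, covariates=None):
    """Fit y ~ allele + cell + allele*cell (+ covariates) by least squares.

    Returns the coefficient of the allele*cell product term, i.e. how much
    the alt-vs-ref effect differs in cell type 1 relative to cell type 0.
    (Hypothetical helper; coding of the inputs is an assumption.)
    """
    y = np.asarray(y, dtype=float)
    allele = np.asarray(allele, dtype=float)
    cell = np.asarray(cell, dtype=float)
    cols = [np.ones_like(y), allele, cell, allele * cell]
    if covariates is not None:
        # Accept one covariate (1-D) or several (2-D, one column each).
        cols.append(np.asarray(covariates, dtype=float).reshape(len(y), -1))
    X = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(coef[3])


# Synthetic, noise-free tags: allelic effect 0.8 in cell type 0, 0.1 in cell type 1.
rows = [(a, c, s) for a, c, s in product([0, 1], [0, 1], [0, 1])
        for _ in range(4)]
allele = [r[0] for r in rows]
cell = [r[1] for r in rows]
strand = [r[2] for r in rows]
y = [2.0 + 0.8 * a + 0.3 * c - 0.7 * a * c + 0.05 * s for a, c, s in rows]

interaction = interaction_effect(y, allele, cell, covariates=strand)
print(round(interaction, 3))
```

In this toy design the interaction coefficient equals 0.1 − 0.8 = −0.7, the gap between the two per-cell-type allelic effects; a real analysis would test whether that coefficient differs from zero and correct across variants.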
Figure 4
Functional variants are linked to target genes, including those affecting cell growth
(A) A schematic overview of linking the significant functional variants to candidate target genes. Four complementary datasets based on lung cells or tissue were integrated (eQTL of GTEx v8 normal lung, meQTL of normal lung, ABC model enhancer-promoter interaction of lung cell lines/tissues, and cell-type-specific enhancer-gene expression correlation in lung single-cell ATAC/RNA-seq data (ref. 14)) to identify 206 genes. (B) The overall distribution of the CRISPR KO gene effect scores (y axis) from DepMap, presented separately for A549 and H520 cells. Each dot represents a unique gene ordered by effect score (x axis). The lower outlier limit is defined as Q1 − 3 × IQ, where Q1 is the 25th percentile and the interquartile (IQ) range is the 75th percentile − Q1. MPRA-prioritized genes are shown in blue and those with effect scores lower than the outlier limit in red. The rest are shown in gray. Gene names are shown for those passing the lower outlier limit (red dots), ordered by their effect score from lowest to highest.
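The lower outlier limit in Figure 4B, Q1 − 3 × IQ, is simple to compute. The sketch below assumes NumPy's default (linear-interpolation) percentile definition, which the source does not specify, and uses made-up gene names and effect scores purely for illustration.

```python
import numpy as np


def lower_outlier_limit(scores, k=3.0):
    """Return Q1 - k * (Q3 - Q1) for a collection of gene effect scores."""
    q1, q3 = np.percentile(np.asarray(scores, dtype=float), [25, 75])
    return float(q1 - k * (q3 - q1))


# Hypothetical gene effect scores (not DepMap values).
effect_scores = {
    "GENE_A": -2.5, "GENE_B": -0.4, "GENE_C": -0.3,
    "GENE_D": -0.2, "GENE_E": -0.1, "GENE_F": 0.0,
    "GENE_G": 0.1, "GENE_H": 0.2, "GENE_I": 0.3,
}

limit = lower_outlier_limit(list(effect_scores.values()))
# Genes below the limit, ordered from lowest to highest effect score.
flagged = sorted((g for g, s in effect_scores.items() if s < limit),
                 key=effect_scores.get)
print(round(limit, 3), flagged)
```

Here Q1 = −0.3 and Q3 = 0.1, so the limit is −0.3 − 3 × 0.4 = −1.5 and only the hypothetical `GENE_A` falls below it.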
Figure 5
Targeting high-confidence variants through CRISPRi in A549 cells
(A) Variant prioritization at locus 20q13.33 nominated a single high-confidence variant, rs4809324. Each dot represents a variant with genomic coordinates shown on the x axis and log-transformed GWAS p values on the y axis. Individual scores and LD R² with the GWAS lead variant are shown for the high-confidence variant (green diamond). (B) UCSC protein-coding genes at 20q13.33 are displayed based on GENCODE v44. Gene symbols are presented if they were prioritized by variant-to-gene approaches (blue for eQTL/meQTL, green for single-cell ATAC-RNA, and orange for the ABC model). (C) CRISPRi-mediated targeting of the genomic region encompassing rs4809324 with three unique gRNAs (G1, G2, and G3). AAVS1: a control gRNA targeting the AAVS1 human safe harbor region. (D–G) Each plot shows GAPDH-normalized mRNA levels of RTEL1 (D), SOX18 (E), ZBTB46 (F), and ARFRP1 (G) from six biological replicates from three independent experiments (total n = 18, except in SOX18: AAVS1 n = 17, G1 n = 15, G3 n = 17). Boxes and whiskers denote mean ± SEM. Each dot represents fold change over the level from the non-targeting gRNA control, averaged from qPCR triplicates. p values were calculated using a two-tailed Mann-Whitney U test for the difference from the baseline set by the non-targeting control (dotted red lines).

References

    1. Sung H., Ferlay J., Siegel R.L., Laversanne M., Soerjomataram I., Jemal A., Bray F. Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA. Cancer J. Clin. 2021;71:209–249.
    2. Samet J.M., Avila-Tang E., Boffetta P., Hannan L.M., Olivo-Marston S., Thun M.J., Rudin C.M. Lung cancer in never smokers: clinical epidemiology and environmental risk factors. Clin. Cancer Res. 2009;15:5626–5645.
    3. Mucci L.A., Hjelmborg J.B., Harris J.R., Czene K., Havelick D.J., Scheike T., Graff R.E., Holst K., Möller S., Unger R.H., et al. Familial Risk and Heritability of Cancer Among Twins in Nordic Countries. JAMA. 2016;315:68–76.
    4. Dai J., Shen W., Wen W., Chang J., Wang T., Chen H., Jin G., Ma H., Wu C., Li L., et al. Estimation of heritability for nine common cancers using data from genome-wide association studies in Chinese population. Int. J. Cancer. 2017;140:329–336.
    5. McKay J.D., Hung R.J., Han Y., Zong X., Carreras-Torres R., Christiani D.C., Caporaso N.E., Johansson M., Xiao X., Li Y., et al. Large-scale association analysis identifies new lung cancer susceptibility loci and heterogeneity in genetic susceptibility across histological subtypes. Nat. Genet. 2017;49:1126–1132.