Am J Hum Genet. 2023 Sep 7;110(9):1574-1589.
doi: 10.1016/j.ajhg.2023.07.008. Epub 2023 Aug 9.

Integrative splicing-quantitative-trait-locus analysis reveals risk loci for non-small-cell lung cancer


Yuzhuo Wang et al. Am J Hum Genet.

Abstract

Splicing quantitative trait loci (sQTLs) have been shown to contribute to disease etiology by affecting alternative splicing. However, the role of sQTLs in the development of non-small-cell lung cancer (NSCLC) remains unknown. We therefore performed a genome-wide sQTL study to identify genetic variants that affect alternative splicing in lung tissues from 116 individuals of Chinese ancestry, which identified 1,385 sQTL-harboring genes (sGenes) containing 378,210 significant variant-intron pairs. A comprehensive characterization of these sQTLs showed that they were enriched in actively transcribed regions, genetic regulatory elements, and splicing-factor-binding sites. Moreover, sQTLs were largely distinct from expression quantitative trait loci (eQTLs) and were significantly enriched in potential NSCLC risk loci. We also integrated sQTLs with NSCLC GWAS datasets (13,327 affected individuals and 13,328 control individuals) by using a splice-transcriptome-wide association study (spTWAS) and identified alternative splicing events in 19 genes that were significantly associated with NSCLC risk. Using functional annotation and experiments, we confirmed an sQTL variant, rs35861926, that reduced the risk of lung adenocarcinoma (rs35861926-T, OR = 0.88, 95% confidence interval [CI]: 0.82-0.93, p = 1.87 × 10⁻⁵) by promoting FARP1 exon 20 skipping, which downregulates the expression of the long transcript FARP1-011. Transcript FARP1-011 promoted the migration and proliferation of lung adenocarcinoma cells. Overall, our study provides an informative lung sQTL resource and insights into the molecular mechanisms linking sQTL variants to NSCLC risk.
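The abstract does not spell out how intron-level splicing was quantified or tested. A minimal sketch, assuming LeafCutter-style intron clusters (an assumption, not stated here): each intron's usage ratio is its split-read count divided by the total for its cluster in that sample, and an sQTL is a variant whose genotype dosage (0, 1 or 2 copies of the alternative allele) predicts that ratio in a linear regression. The function names and toy counts are hypothetical.

```python
from statistics import mean


def intron_usage_ratios(cluster_counts):
    """Convert split-read counts for the introns of one cluster into usage ratios.

    cluster_counts: dict mapping a (hypothetical) intron ID to a list of read
    counts, one per sample. Assumes every sample has at least one read in the
    cluster, so no cluster total is zero.
    """
    n_samples = len(next(iter(cluster_counts.values())))
    # Per-sample total of reads across all introns in the cluster.
    totals = [sum(counts[i] for counts in cluster_counts.values()) for i in range(n_samples)]
    return {
        intron: [c / t for c, t in zip(counts, totals)]
        for intron, counts in cluster_counts.items()
    }


def sqtl_slope(dosages, usage):
    """Ordinary least-squares slope of intron usage on genotype dosage (0/1/2)."""
    mx, my = mean(dosages), mean(usage)
    sxy = sum((x - mx) * (y - my) for x, y in zip(dosages, usage))
    sxx = sum((x - mx) ** 2 for x in dosages)
    return sxy / sxx
```

For example, a two-intron cluster with counts `{"intronA": [30, 20, 10], "intronB": [10, 20, 30]}` gives intronA usage ratios of 0.75, 0.5 and 0.25, and against dosages `[0, 1, 2]` a slope of -0.25. In a genome-wide scan this slope and its p value would be computed for every cis variant-intron pair and judged against a gene-level threshold such as the one described for Figure 1A; that machinery is omitted here.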

Keywords: FARP1 exon 20 skipping; non-small-cell lung cancer; risk loci; splice-transcriptome-wide association study; splicing quantitative trait locus.


Conflict of interest statement

Declaration of interests: The authors declare no competing interests.

Figures

Graphical abstract
Figure 1
Identification and characterization of sQTLs, comparison of sQTLs and eQTLs, and enrichment of NSCLC GWAS variants in lung sQTLs (A) Discoveries from splicing quantitative trait locus (sQTL) analysis of 116 human lung tissues from the Nanjing Lung Cancer Cohort (NJLCC) study. For each gene, variant-intron pairs with a p value below the gene-level nominal p value threshold were considered significant, and the corresponding introns were called sQTL-harboring introns (sIntrons). (B) Position of sQTL variants relative to the splice junction. (C) Percentage (%) of sSNPs (the most significant variant per sIntron) located inside or outside the corresponding gene. (D) Venn diagram showing the overlap of lung sGenes between the NJLCC and the Genotype-Tissue Expression (GTEx) project. (E) Left, p value distribution of NJLCC sSNPs in GTEx lung tissues. Right, the direction of effect is consistent for the majority (96.3%) of NJLCC sSNPs in GTEx lung tissues. (F) Enrichment of sSNPs in functional annotations. Bar height represents the fold change of the observed number of sSNPs overlapping a given annotation relative to the number expected from variants that are not sSNPs (see methods): 15 chromatin states (green, FDR < 0.05; gray, FDR ≥ 0.05); histone modifications (orange, FDR < 0.05); RNA-binding protein (RBP) eCLIP peaks (violet blue, FDR < 0.05). (G) Venn diagram showing the overlap of sQTL-harboring genes (sGenes) and expression quantitative trait locus (eQTL)-harboring genes (eGenes) in NJLCC lung tissues. (H and I) Distributions of the distance in base pairs (H) and linkage disequilibrium r² (I) between the lead eQTL (the most significant eQTL variant per eGene) and the lead sQTL (the most significant sQTL variant per gene) for genes harboring both. (J) Enrichment of non-small-cell lung cancer (NSCLC) GWAS SNPs (GWAS p value < 10−4) among sQTL variants. The GWASs of NSCLC and its histological subtypes were conducted in Chinese populations. Points indicate enrichment log-odds ratios, and bars represent 95% confidence intervals (95% CIs). ∗∗1 × 10−6 ≤ p value < 0.005, ∗∗∗p value < 1 × 10−6.
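Figure 1J reports enrichment as log-odds ratios with 95% CIs. The caption does not name the test, so the following is a sketch under an assumption: a 2 × 2 table crossing GWAS-SNP status (p < 10−4) with sQTL status, with a Wald (Woolf) confidence interval on the log odds ratio. All four counts must be nonzero, since no continuity correction is applied; the argument names are hypothetical.

```python
import math


def enrichment_log_or(a, b, c, d, z=1.96):
    """Log odds ratio and Wald (Woolf) confidence interval for a 2 x 2 table.

    a: GWAS SNPs that are sQTL variants
    b: GWAS SNPs that are not sQTL variants
    c: non-GWAS SNPs that are sQTL variants
    d: non-GWAS SNPs that are not sQTL variants
    z: normal quantile for the interval (1.96 gives roughly 95%)
    """
    log_or = math.log((a * d) / (b * c))
    # Standard error of the log odds ratio under the Woolf approximation.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return log_or, (log_or - z * se, log_or + z * se)
```

A log odds ratio whose interval lies entirely above zero would indicate that GWAS SNPs are over-represented among sQTL variants.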
Figure 2
Manhattan plots for the splice-transcriptome-wide association study of non-small-cell lung cancer (A–C) Manhattan plots show −log10(p value) for associations of intron usage ratios with the risk of (A) NSCLC, (B) lung adenocarcinoma, and (C) lung squamous cell carcinoma. The x axis represents chromosomal location, and the y axis represents −log10(p value). In (A) and (B), the red horizontal line denotes FDR < 0.05. For lung squamous cell carcinoma (C), no alternative splicing event reached FDR < 0.05, and the red horizontal line instead marks p value = 1 × 10−5. spTWAS associations in FARP1 and EIF3E (red) were highly likely to colocalize (posterior probability for hypothesis 4 [PP4] > 0.7). The GWASs consisted of 13,327 NSCLC-affected individuals (including 8,762 individuals with lung adenocarcinoma and 3,860 individuals with lung squamous cell carcinoma) and 13,328 control individuals.
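The spTWAS step links genetically predicted intron usage to disease risk using only GWAS summary statistics. The method is not described on this page; a widely used formulation, the FUSION-style weighted z-score assumed here, combines per-SNP sQTL weights w, GWAS z-scores z and an LD correlation matrix R as z_TWAS = wᵀz / √(wᵀRw). The weights would come from a model trained on the 116 lung samples; all inputs below are hypothetical.

```python
import math


def twas_z(weights, gwas_z, ld):
    """FUSION-style TWAS statistic for one intron-usage prediction model.

    weights: per-SNP sQTL weights predicting the intron usage ratio
    gwas_z:  GWAS z-scores for the same SNPs, aligned to the same effect allele
    ld:      SNP-by-SNP LD correlation matrix (list of lists) for those SNPs
    """
    numerator = sum(w * z for w, z in zip(weights, gwas_z))
    # Variance of the weighted sum under the null, accounting for LD: w' R w.
    variance = sum(
        weights[i] * ld[i][j] * weights[j]
        for i in range(len(weights))
        for j in range(len(weights))
    )
    return numerator / math.sqrt(variance)
```

With a single SNP the statistic reduces to that SNP's GWAS z-score; with several SNPs in LD, the denominator keeps correlated signals from being counted twice. The resulting per-intron p values would then be corrected across all tested events (FDR < 0.05 in panels A and B).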
Figure 3
spTWAS associations at EIF3E implicate a target gene independent of genetic effects on total expression (A–C) Manhattan plots of SNP-phenotype association before (gray) and after (blue) conditioning on the effect of cis-regulated intron splicing (EIF3E chr8: 109,245,901–109,247,227) or the top QTL: GWAS of lung adenocarcinoma (8,762 affected individuals and 13,328 control individuals) (A), sQTL (116 participants) (B), and eQTL (116 participants) (C). Two-sided p values were derived from the GWAS summary data (A) or calculated via linear regression (B and C). (D) A gene-level view of EIF3E highlighting (dashed lines) the intron cluster containing the lung adenocarcinoma-associated introns (EIF3E chr8: 109,245,901–109,247,227 and EIF3E chr8: 109,241,424–109,247,227) and the EIF3E transcripts in this region. Transcripts with median expression level > 0.1 transcripts per million (TPM) are shown. Protein-coding domain mappings are shown as rectangles. (E) Differential intron usage ratios of EIF3E chr8: 109,245,901–109,247,227 and EIF3E chr8: 109,241,424–109,247,227 stratified by rs677031 genotypes. (F and G) Boxplots of intron usage (EIF3E chr8: 109,245,901–109,247,227) (F) and overall expression of EIF3E (G), stratified by rs677031 genotypes. The thick line represents the median, the box represents the interquartile range (IQR), and the whiskers are the quartiles ± 1.5 × IQR. (H) Scatterplot of normalized intron usage (EIF3E chr8: 109,245,901–109,247,227) and expression of transcript EIF3E-011 in 116 participants. The correlation between them was evaluated with Spearman’s correlation test. (I) Boxplots of EIF3E-011 transcript expression, stratified by rs677031 genotypes. The thick line represents the median, the box represents the IQR, and the whiskers are the quartiles ± 1.5 × IQR.
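Panel H (and the matching panel in Figure 4) tests the association between normalized intron usage and transcript expression with Spearman's correlation. As a sketch of what that statistic is, not the authors' code, Spearman's rho is the Pearson correlation of the two variables' ranks, with tied values assigned their average rank. Only rho is computed here; a p value needs an additional significance test (`scipy.stats.spearmanr`, for example, returns both the correlation and a p value).

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values given the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of tied values starting at sorted position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation computed on the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)
```

Because it works on ranks, rho equals 1 for any strictly increasing relationship, linear or not, which makes it robust to the skewed distributions typical of transcript expression values.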
Figure 4
spTWAS association at FARP1 implicates a target gene independent of genetic effects on total expression (A–C) Manhattan plots of SNP-phenotype association before (gray) and after (blue) conditioning on the effect of cis-regulated intron splicing (FARP1 chr13: 99,090,112–99,091,058) or the top QTL: GWAS of lung adenocarcinoma (8,762 affected individuals and 13,328 control individuals) (A), sQTL (116 participants) (B), and eQTL (116 participants) (C). Two-sided p values were derived from the GWAS summary data (A) or calculated with linear regression (B and C). (D) A gene-level view of FARP1 highlighting (dashed lines) the intron cluster containing the lung adenocarcinoma-associated intron (FARP1 chr13: 99,090,112–99,091,058), as well as the sQTL variant and the FARP1 transcripts in this region. Transcripts with median expression level > 0.1 TPM are shown. Protein-coding domain mappings are shown as rectangles. (E) Differential intron usage ratio of FARP1 chr13: 99,090,112–99,091,058 stratified by rs35861926 genotypes. (F) Boxplots of intron usage (FARP1 chr13: 99,090,112–99,091,058), stratified by rs35861926 genotypes. The thick line represents the median, the box represents the IQR, and the whiskers are the quartiles ± 1.5 × IQR. (G) Scatterplot of normalized intron usage (FARP1 chr13: 99,090,112–99,091,058) and expression of transcript FARP1-011 in 116 participants. The correlation between them was evaluated with Spearman’s correlation test. (H) Boxplots of FARP1-011 transcript expression, stratified by rs35861926 genotypes. The thick line represents the median, the box represents the IQR, and the whiskers are the quartiles ± 1.5 × IQR. (I) Boxplots of overall expression of FARP1, stratified by rs35861926 genotypes. The thick line represents the median, the box represents the IQR, and the whiskers are the quartiles ± 1.5 × IQR.
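Panels B and C show sQTL and eQTL signals before and after conditioning on the top QTL, with p values from linear regression. One standard way to condition, assumed here, is to add the lead variant's genotype dosage as a covariate. By the Frisch-Waugh-Lovell theorem, the test variant's adjusted slope equals the slope between the residuals of the phenotype and of the test variant, each regressed on the lead variant. Any other covariates the study may have used are omitted, and the function names are hypothetical.

```python
def _residualize(values, covariate):
    """Residuals of values after OLS regression on covariate plus an intercept."""
    n = len(values)
    mc, mv = sum(covariate) / n, sum(values) / n
    sxx = sum((c - mc) ** 2 for c in covariate)
    slope = sum((c - mc) * (v - mv) for c, v in zip(covariate, values)) / sxx
    return [v - mv - slope * (c - mc) for c, v in zip(covariate, values)]


def marginal_beta(y, x):
    """Slope of phenotype y on genotype dosage x alone (with intercept)."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sum((a - mx) ** 2 for a in x)


def conditional_beta(y, x, top):
    """Slope of y on x after adjusting for the top QTL dosage (Frisch-Waugh-Lovell)."""
    ry = _residualize(y, top)
    rx = _residualize(x, top)
    return sum(a * b for a, b in zip(rx, ry)) / sum(a * a for a in rx)
```

If intron usage depends only on the lead variant, a nearby variant in partial LD with it shows a nonzero marginal slope but a conditional slope of about zero, which corresponds to a blue signal that collapses after conditioning in panels B and C.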
Figure 5
The T allele of rs35861926 promotes alternative splicing of FARP1 exon 20 in lung adenocarcinoma (A) FARP1 minigene constructs containing the genomic sequence from exon 19 through exon 21, carrying either the rs35861926 G or T allele, were subcloned into the pSPL3 vector. (B) Minigene assays in A549 and PC9 cells were conducted to confirm the effects of rs35861926 on the expression level of FARP1-011 (long transcript). Experiments were independently repeated three times.
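The caption does not say how FARP1-011 levels were quantified in the minigene assay. A common readout for pSPL3 minigenes, assumed here for illustration, is RT-PCR band intensity for the exon-included and exon-skipped products, summarized as percent spliced in (PSI) = inclusion / (inclusion + skipping). Given the reported effect, T-allele constructs would be expected to show lower exon 20 PSI than G-allele constructs. The band intensities passed in are hypothetical inputs, not data from the study.

```python
from statistics import mean, stdev


def percent_spliced_in(inclusion, skipping):
    """Fraction (0 to 1) of minigene transcripts that include the alternative exon."""
    return inclusion / (inclusion + skipping)


def summarize(replicates):
    """Mean and sample SD of PSI over replicate (inclusion, skipping) intensities."""
    psi = [percent_spliced_in(inc, skip) for inc, skip in replicates]
    return mean(psi), stdev(psi)
```

Summarizing each construct over its three independent replicates and comparing the G and T means mirrors the structure of panel B, whatever the exact quantification was.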
Figure 6
The long transcript of FARP1 promotes lung adenocarcinoma cell proliferation and migration (A) The effect of FARP1-011 (long transcript) and FARP1-001 (short transcript) overexpression on the viability of A549 and PC9 cells. Results are shown as mean ± standard deviation (SD) from six independent experiments. Statistical significance was determined by Student’s two-sided t test, ∗p < 0.05, ∗∗p < 0.01. (B–D) The effect of FARP1-011 and FARP1-001 transcript overexpression on the colony formation (B), proliferation (C), and migration (D) abilities of A549 and PC9 cells. Results are shown as mean ± SD from three independent experiments. Statistical significance was determined by Student’s two-sided t test, ∗p value < 0.05, ∗∗p value < 0.01.
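Figure 6 compares groups with Student's two-sided t test on six (A) or three (B–D) independent experiments. A minimal sketch of the pooled-variance statistic follows; the per-experiment measurements it would take as input are not given here, so any values used with it are hypothetical.

```python
import math
from statistics import mean, variance


def student_t(group_a, group_b):
    """Pooled-variance (Student's) t statistic and degrees of freedom for two groups."""
    na, nb = len(group_a), len(group_b)
    # Pooled estimate of the common variance, weighting each sample variance by its df.
    pooled = ((na - 1) * variance(group_a) + (nb - 1) * variance(group_b)) / (na + nb - 2)
    t = (mean(group_a) - mean(group_b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2
```

The two-sided p value is 2 × P(T > |t|) for a t distribution with the returned degrees of freedom, for example `2 * scipy.stats.t.sf(abs(t), df)`; `scipy.stats.ttest_ind(a, b)` with its default `equal_var=True` computes the same statistic directly. With only three replicates per group (df = 4), a fairly large t is needed to reach p < 0.05.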
