Nat Genet. 2024 Oct;56(10):2104-2111.
doi: 10.1038/s41588-024-01900-w. Epub 2024 Sep 16.

Systematic prioritization of functional variants and effector genes underlying colorectal cancer risk

Philip J Law et al. Nat Genet. 2024 Oct.

Abstract

Genome-wide association studies of colorectal cancer (CRC) have identified 170 autosomal risk loci. However, for most of these, the functional variants and their target genes are unknown. Here, we perform statistical fine-mapping incorporating tissue-specific epigenetic annotations and massively parallel reporter assays to systematically prioritize functional variants for each CRC risk locus. We identify plausible causal variants for the 170 risk loci, with a single variant for 40. We link these variants to 208 target genes by analyzing colon-specific quantitative trait loci and implementing the activity-by-contact model, which integrates epigenomic features and Micro-C data, to predict enhancer-gene connections. By deciphering CRC risk loci, we identify direct links between risk variants and target genes, providing further insight into the molecular basis of CRC susceptibility and highlighting potential pharmaceutical targets for prevention and treatment.
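
As a rough illustration of how an activity-by-contact score is formed, the sketch below follows the commonly published form of the model: each candidate enhancer's activity is multiplied by its contact frequency with the gene's promoter, and the product is divided by the sum of such products over all candidate elements near that gene. The function name, input dictionaries and example numbers are hypothetical; the abstract does not specify how activity and contact were quantified from the epigenomic and Micro-C data.

```python
from typing import Dict


def abc_scores(activity: Dict[str, float], contact: Dict[str, float]) -> Dict[str, float]:
    """Return an activity-by-contact score per enhancer for ONE target gene.

    activity: enhancer id -> activity estimate (assumed here to come from
              epigenomic signal such as chromatin accessibility).
    contact:  enhancer id -> contact frequency with the gene's promoter
              (assumed here to come from Micro-C).
    """
    # Activity x contact for every candidate element.
    products = {e: activity[e] * contact[e] for e in activity}
    total = sum(products.values())
    if total == 0:
        # No element has any activity or contact; all scores are zero.
        return {e: 0.0 for e in products}
    # Normalise so that the scores for a gene sum to 1.
    return {e: p / total for e, p in products.items()}


# Hypothetical inputs for three candidate enhancers of one gene.
example = abc_scores(
    activity={"e1": 2.0, "e2": 1.0, "e3": 1.0},
    contact={"e1": 0.5, "e2": 0.5, "e3": 1.0},
)
```

Because the scores are normalised per gene, a strongly active enhancer with weak promoter contact can score the same as a weaker enhancer in frequent contact, as `e1` and `e3` do in this example.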

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1. Overview of the study.
Using data from GWASs for CRC, we identified 170 regions of interest. Data from MPRAs, epigenetic marks (ChIP–seq), chromatin accessibility (ATAC-seq), gene expression (RNA-seq) and long-range chromatin interactions (Micro-C) were combined to derive an integrative score to prioritize the functional variants at each CRC risk locus. These variants were linked to target genes by analyzing colon-specific eQTLs and using SMR. In the GWAS plot, the coloured dots indicate the variants that are above the P value threshold. In the SMR plot, they represent the two different datasets (GWAS and eQTL). The coloured portions of DNA represent the genomic regions of interest that were studied.
Fig. 2. Distribution of annotation scores for each GWAS locus.
Scores were calculated as the sum of the annotations for each variant. Loci are labeled with the cytoband and the top GWAS SNPs in each region. The variants with scores in the top 20% were designated as Tier 1 variants, those with scores in the bottom 50% as Tier 3 and the remainder as Tier 2.
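
The scoring and tiering rule in this legend can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes that percentiles are taken over the ranked variants within a single locus, that ties are broken arbitrarily by sort order, and that each annotation contributes a numeric value. The function names and the example annotation keys are hypothetical.

```python
from typing import Dict


def annotation_score(annotations: Dict[str, float]) -> float:
    """Score a variant as the sum of its annotation values."""
    return sum(annotations.values())


def assign_tiers(scores: Dict[str, float]) -> Dict[str, int]:
    """Assign Tier 1 (top 20%), Tier 3 (bottom 50%) or Tier 2 (remainder).

    Assumption: the percentages refer to rank position among the
    variants passed in (for example, all variants at one locus).
    """
    ranked = sorted(scores, key=lambda v: scores[v], reverse=True)
    n = len(ranked)
    tiers: Dict[str, int] = {}
    for position, variant in enumerate(ranked, start=1):
        if 5 * position <= n:      # position / n <= 0.20
            tiers[variant] = 1
        elif 2 * position > n:     # position / n > 0.50
            tiers[variant] = 3
        else:
            tiers[variant] = 2
    return tiers


# Hypothetical locus with ten variants scored 10 down to 1.
locus_scores = {f"v{i}": float(11 - i) for i in range(1, 11)}
locus_tiers = assign_tiers(locus_scores)
```

With ten variants, this rule places the two highest-scoring variants in Tier 1, the next three in Tier 2 and the remaining five in Tier 3.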
Fig. 3. Plot of the annotation sources for each of the variants analyzed in each GWAS locus.
a, At the 8q24.21 locus, the GWAS identified rs6983267, rs7013278 and rs4733767, highlighted in red, as risk loci. rs6983267 and rs7013278 are within 1.5 kb of each other, but rs6983267 is better annotated, with strong hits for MPRAs, transcription factor binding, open chromatin (ATAC-seq) and Micro-C. rs4733767 is over 150 kb away from rs6983267 and rs7013278 and has separate annotations, so it is probably a true independent locus. b, At the 10p12.1 locus, rs1773860 was the lead GWAS variant at this locus, but rs1248418 (r² = 0.91, D′ = 0.98) was better annotated. This variant is located in open chromatin and is predicted to be in an enhancer region. In addition, this variant showed a long-range interaction with the TSS of BAMBI. c, Functional annotation of rs61776719 at the 1p34.3 locus identified rs67631072 (r² = 1.0, D′ = 1.0) as the top annotated variant, which shows enhancer activity in open chromatin regions and is predicted by the ABC model to affect gene expression. Detailed figures of the annotations of the regions are shown in Extended Data Figs. 2–4. In all figure panels, gray blocks correspond to an annotation, and black blocks correspond to a strong annotation. ATAC denotes the presence of an ATAC-seq peak, CTCF denotes the presence of a CTCF peak from the ChIP–seq analysis and Akita denotes evidence of disruption of 3D chromatin structure. TF denotes that a transcription factor was predicted to bind.
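
The r² and D′ values quoted in this legend are standard pairwise linkage disequilibrium measures between the lead GWAS variant and the better-annotated variant. The sketch below shows the textbook definitions computed from haplotype and allele frequencies; the function name and the input frequencies are hypothetical, and the legend does not state which reference panel or software produced the reported values.

```python
from typing import Tuple


def ld_stats(p_ab: float, p_a: float, p_b: float) -> Tuple[float, float]:
    """Return (r_squared, d_prime) for two biallelic variants.

    p_ab: frequency of the haplotype carrying allele A and allele B.
    p_a:  frequency of allele A at the first variant.
    p_b:  frequency of allele B at the second variant.
    """
    # Raw disequilibrium coefficient.
    d = p_ab - p_a * p_b
    # Largest |D| attainable given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r_squared = d * d / denom if denom > 0 else 0.0
    return r_squared, d_prime


# Hypothetical case: allele A is only ever seen together with allele B,
# but B is more common than A.
r2_example, dprime_example = ld_stats(p_ab=0.2, p_a=0.2, p_b=0.4)
```

In this hypothetical case D′ reaches 1 while r² stays well below 1, which is why the two statistics are usually reported together: D′ near 1 indicates little historical recombination, whereas r² near 1 indicates that the two variants are close to interchangeable as association signals.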
Fig. 4. Treemap of the candidate target genes, which are grouped by functional category.
Genes that were identified in the integrated analysis were classified according to their biological or cellular function. The size of the box is proportional to the number of genes in the category.
Extended Data Fig. 1. Detailed schematic of the analysis.
Using the loci identified by the CRC GWAS, we annotated the regions using multiple functional modalities, including massively parallel reporter assays (MPRA) to observe allelic effects on transcription, epigenetic marks (ChIP-seq), chromatin accessibility (ATAC-seq), gene expression (RNA-seq) and long-range chromatin interactions (Micro-C). ABC: Activity By Contact.
Extended Data Fig. 2. Detailed annotation for the variants at the 8q24 locus.
Detailed functional annotation for the variants at the 8q24 locus from the UCSC Genome Browser, showing the Micro-C, chromHMM, ATAC-seq and ChIP-seq data across the various cell lines. The putative variant, rs6983267, is highlighted in light blue (left). A secondary signal at rs4733767 is also shown (middle blue line).
Extended Data Fig. 3. Detailed annotation for the variants at the 10p12 locus.
Detailed annotation for the variants at the 10p12 locus from the UCSC Genome Browser. The putative variant, rs1248418, is highlighted in light blue.
Extended Data Fig. 4. Detailed annotation for the variants at the 1p34 locus.
Detailed annotation for the variants at the 1p34 locus from the UCSC Genome Browser. The putative variant, rs67631072, is highlighted in light blue.
