[Preprint]. 2023 Mar 30:2023.03.29.534582.
doi: 10.1101/2023.03.29.534582.

Colocalization of blood cell traits GWAS associations and variation in PU.1 genomic occupancy prioritizes causal noncoding regulatory variants


Raehoon Jeong et al. bioRxiv.

Abstract

Genome-wide association studies (GWAS) have uncovered numerous trait-associated loci across the human genome, most of which are located in noncoding regions, making interpretations difficult. Moreover, causal variants are hard to statistically fine-map at many loci because of widespread linkage disequilibrium. To address this challenge, we present a strategy utilizing transcription factor (TF) binding quantitative trait loci (bQTLs) for colocalization analysis to identify trait associations likely mediated by TF occupancy variation and to pinpoint likely causal variants using motif scores. We applied this approach to PU.1 bQTLs in lymphoblastoid cell lines and blood cell traits GWAS data. Colocalization analysis revealed 69 blood cell trait GWAS loci putatively driven by PU.1 occupancy variation. We nominate PU.1 motif-altering variants as the likely shared causal variants at 51 loci. Such integration of TF bQTL data with other GWAS data may reveal transcriptional regulatory mechanisms and causal noncoding variants underlying additional complex traits.

Conflict of interest statement

Ethics Declarations

The authors declare no competing interests.

Figures

Extended Data Fig. 1 | Properties of PU.1 binding sites and bQTLs.
(a) Position of PU.1 motifs at PU.1 binding sites. The bp distance is measured from the center of a 200 bp PU.1 ChIP-seq peak. (b) 12-mers with the highest (top 15) gkm-SVM weights aligned to PU.1 motif and PU.1:IRF composite motif. (c) Lack of enrichment in PU.1 bQTL lead variants tagging (LD r2>0.8) type 2 diabetes (T2D) and height GWAS associations. The histogram shows the number of variants tagging GWAS associations for each of 250 sets of null variants. The red lines indicate the number of PU.1 bQTL lead variants tagging GWAS associations.
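The comparison in panel (c), an observed count of tagging variants set against counts from sets of null variants, can be turned into an empirical p-value. The sketch below is only an illustration: the add-one correction, the function name, and the example counts are assumptions, not the authors' stated procedure or data.

```python
def empirical_enrichment_p(observed, null_counts):
    """One-sided empirical p-value for enrichment.

    observed: number of bQTL lead variants tagging GWAS associations.
    null_counts: the same count for each set of null variants.
    The add-one correction is an assumption of this sketch; it keeps the
    p-value from being exactly zero with a finite number of null sets.
    """
    if not null_counts:
        raise ValueError("at least one null set is required")
    # Count null sets that tag at least as many associations as observed.
    at_least_as_extreme = sum(1 for c in null_counts if c >= observed)
    return (at_least_as_extreme + 1) / (len(null_counts) + 1)


# Hypothetical counts, for illustration only (not values from the preprint).
null_counts = [3, 5, 4, 6, 2, 5, 4, 3, 5, 4]
p_value = empirical_enrichment_p(4, null_counts)
print(p_value)
```

With 250 null sets, as in the caption, the smallest p-value this estimator can return is 1/251, so "lack of enrichment" corresponds to an observed count that sits well inside the null histogram.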
Extended Data Fig. 2 | Examples of variants affecting PU.1 binding.
(a) Examples of PU.1 motif-altering variants. Categorization of the variants corresponds to Fig. 2b. EUR: European ancestry population in the 1000 Genomes Project. (b) Comparison of changes in motif score (Δ gkm-SVM) and estimated bQTL effect sizes of PU.1 motif-altering variants (SNPs and indels) at 49 colocalized loci. (c) An example of a copy number variation (esv3619112) affecting a PU.1 binding site.
Extended Data Fig. 3 | Colocalization of PU.1 bQTL and lymphocyte count association signals at ZNF608 locus.
(a) Merged association plot for PU.1 bQTL and lymphocyte count association signals. Points are colored by LD r2 with respect to rs12517864, which is labeled with a purple diamond. (b) Z scores of rs12517864 for lymphocyte count and PU.1 bQTL association.
Extended Data Fig. 4 | Effects of PU.1 motif-altering deletion rs5827412.
(a) GWAS effect size estimates for rs5827412 on 5 blood cell traits. The error bars indicate 95% confidence intervals. Abbreviations of blood cell traits are described in Supplementary Table 2. (b-c) Boxplots are formatted as in Fig. 4. (b) Regulatory QTL effects of rs5827412. (top) Genome tracks show PU.1 ChIP-seq, ATAC-seq, and H3K4me1 and H3K27ac ChIP-seq data from LCLs, respectively. (bottom) Four phenotype values, in reads per million for each genome track and reads per kilobase million for LRRC25 expression levels. Allele dosage corresponds to the deletion allele. (c) LRRC25 expression level across 13 blood cell types. Monocyte is colored red. Cell types are abbreviated as in Supplementary Fig. 1.
Extended Data Fig. 5 | Colocalization of PU.1 bQTL and multiple sclerosis association signals at ZC2HC1A locus.
(a,c) Points are colored by LD r2 in the 1000 Genomes Project European population, with respect to rs3808619, which is labeled with a purple diamond. (a) Merged association plot for PU.1 bQTL and lymphocyte count association signals. (b) Z scores of rs3808619 for PU.1 bQTL and 5 blood cell traits association. (c) Merged association plot for PU.1 bQTL and multiple sclerosis (MS) association signals. (d) Z scores of rs3808619 for MS and PU.1 bQTL association.
Fig. 1 | Relevance of PU.1 bQTLs in LCLs to blood cell trait associations.
(a) (Left) Blood cell trait-associated loci may have overlapping PU.1 bQTLs and, potentially, expression QTL (eQTL) associations. (Right) Significant colocalization suggests that the causal variants are shared. If there is a PU.1 motif-altering variant at a colocalized PU.1 bQTL, the variant is likely to be the shared causal variant. (b) Comparison of changes in motif score (Δ gkm-SVM) and estimated bQTL effect sizes at PU.1 motif-altering variants within 200 bp PU.1 ChIP-seq peaks. The color represents the −log10(p) of PU.1 bQTL association (linear regression). (c) Number of significant PU.1 bQTLs with PU.1 motif-altering variants at each region within the 200 bp PU.1 ChIP-seq peaks. ***: p<2.2×10^−6 (Fisher’s exact test).
Fig. 2 | Colocalization of blood cell traits GWAS and PU.1 bQTLs.
(a) Enrichment of PU.1 bQTLs for associations to specific blood cell traits. Traits with empirical adjusted p<0.05 (above the dashed line) are labeled. Abbreviations of blood cell traits are described in Supplementary Table 3. (b) Colocalization results from JLIM and Coloc. Each point is a PU.1 bQTL - Trait pair. The number shown in each quadrant is the number of points within the significance category. Dashed lines indicate the respective significance thresholds (JLIM: p<0.01172 (FDR 5%), Coloc: PP(colocalized) > 0.5). (c) The types of putative causal variants at colocalized PU.1 bQTLs that alter PU.1 motifs or the copy number of the PU.1 occupancy site. SNPs, indels, and multi-variants alter PU.1 motifs. CNV: copy number variation altering copy number of PU.1 binding sites; Multi: multiple variants in perfect LD (r2=1) within a PU.1 motif sequence; Unk (Unknown): No variant altering PU.1 motif sequence or its copy number. (d) Number of PU.1 motif-altering SNPs at each nucleotide position at colocalized PU.1 binding sites. Motif logos are from the HOMER database. (e) Blood cell trait GWAS credible set size at loci with colocalized PU.1 bQTLs and a PU.1 motif-altering variant. Only the 25 loci with fine-mapping results in Vuckovic et al. 2020 are represented.
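The quadrant counts in panel (b) follow from applying the two thresholds quoted in the caption to each PU.1 bQTL - trait pair. The sketch below is a minimal illustration under stated assumptions: the thresholds come from the caption, while the function names, category labels, and example pairs are hypothetical and are not output from JLIM or Coloc.

```python
from collections import Counter

# Thresholds quoted in the caption of Fig. 2b.
JLIM_P_THRESHOLD = 0.01172   # JLIM p-value cutoff (FDR 5%)
COLOC_PP_THRESHOLD = 0.5     # Coloc posterior probability of colocalization


def significance_category(jlim_p, coloc_pp):
    """Assign one bQTL-trait pair to a quadrant (labels are hypothetical)."""
    jlim_sig = jlim_p < JLIM_P_THRESHOLD
    coloc_sig = coloc_pp > COLOC_PP_THRESHOLD
    if jlim_sig and coloc_sig:
        return "both"
    if jlim_sig:
        return "jlim_only"
    if coloc_sig:
        return "coloc_only"
    return "neither"


def count_quadrants(pairs):
    """Count pairs per quadrant; pairs is an iterable of (jlim_p, coloc_pp)."""
    return Counter(significance_category(p, pp) for p, pp in pairs)


# Hypothetical (JLIM p, Coloc PP) values, for illustration only.
example_pairs = [(0.001, 0.9), (0.001, 0.2), (0.3, 0.8), (0.5, 0.1), (0.01, 0.6)]
quadrant_counts = count_quadrants(example_pairs)
print(quadrant_counts)
```

How the preprint combines the two methods into a final call of colocalization is not stated in this caption, so the sketch only reports the four counts.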
Fig. 3 | Distribution of colocalized loci across the genome.
(a) Proportion of tested loci with significant colocalization. The colors represent the trait groups. The blood cell traits highlighted in yellow correspond to white blood cell traits. Abbreviations of blood cell traits are described in Supplementary Table 3. (b) Fuji plot depicting the genomic distribution of blood cell trait-associated loci that show high-confidence colocalization with PU.1 bQTLs. The colors are as in panel a. (c) The stacked bar plot at the center shows the number of traits each PU.1 bQTL colocalizes with.
Fig. 4 | PU.1 motif alteration pinpoints a lymphocyte count-associated variant that is a secondary ZNF608 eQTL variant.
(a, c-e, g) PU.1 motif-altering variant rs12517864 is shown as a purple diamond, and the ZNF608 eQTL lead variant rs2028854 is shown as a yellow diamond. Vertical dashed lines mark the position of these two variants. Unless noted otherwise, points are colored by LD r2 with respect to rs12517864. (a) PU.1 bQTL and lymphocyte count association signals. (b) The effect of rs2028854 on the sequence with respect to the PU.1 binding motif. (c) (Top) Primary ZNF608 eQTL signals in LCLs. LD r2 is calculated with respect to rs2028854, the lead variant. (Bottom) ZNF608 eQTL signals in LCLs conditioned on the rs2028854 dosage. (d) Fine-mapping result of ZNF608 eQTL signals in LCLs, using SuSiE. Points are colored by the credible set they belong to. PIP: Posterior inclusion probability. (e) ZNF608 eQTL association signals in naive B cells (DICE). (f) Genome tracks of PU.1 ChIP-seq, ATAC-seq, H3K4me1 and H3K27ac ChIP-seq assayed in GM12878. (g) Gene track showing ZNF608 and the two variants. The weights of the red curves indicate the CHiCAGO scores calculated in Javierre et al. 2016. (h-i) All data points are overlaid on the box plots. (h) The effect of rs12517864 dosage on various molecular phenotypes shown in panel f. For PU.1 ChIP-seq data, no individuals were homozygous for the alternate allele (AA). (i) ZNF608 expression levels (count per million) through lymphocyte differentiation and across various lymphocyte types. HSC: hematopoietic stem cell, MPP: multipotent progenitor, LMPP: lymphoid-primed multipotent progenitor, CLP: common lymphoid progenitor, B: B cell, CD4T: CD4+ T cell, CD8T: CD8+ T cell, NK: natural killer cell.
Fig. 5 | PU.1 motif-altering deletion rs5827412 at LRRC25 locus associated with lower monocyte counts.
(a) PU.1 bQTL and monocyte percentage association signals colocalize. (b) The effect of rs5827412 on the PU.1 motif. (c) Reduced reporter activity by rs5827412 in log2 fold change. Error bars indicate 95% confidence intervals. *: adjusted p<0.05. (d-e) Boxplots are formatted as in Fig. 4. (d) A boxplot showing PU.1-dependent reduction in chromatin accessibility levels (count per million) at the regulatory element surrounding rs5827412 in control pro-B cell lines (SPI1+/+) and counterparts with SPI1 knocked out (SPI1−/−). The region highlighted in yellow marks the accessible region corresponding to the boxplot. n=3 for each condition. *: DESeq2 adjusted p<0.05. (e) A boxplot showing LRRC25 expression levels (count per million) through monocyte differentiation. HSC: hematopoietic stem cell, MPP: multipotent progenitor, CMP: common myeloid progenitor, GMP: granulocyte-macrophage progenitor, Mono: monocyte. (f-g) Purple triangle and diamond, as well as the dashed line, mark rs5827412. (f) Monocyte LRRC25 eQTL association. Downward and upward triangles indicate the direction of effect (down- and up-regulation, respectively) for variants with p<1×10^−3. (g) ATAC-seq tracks as fold enrichment over average (range 0–40) for various blood cell types through monocyte differentiation.
Fig. 6 | ZC2HC1A locus: PU.1 motif-alteration highlights a regulatory variant among those in high LD.
(a-d) PU.1 motif-altering variant rs3808619 is shown as a purple diamond. A vertical dashed line also marks the position of this variant. (a) The effect of rs3808619 on the PU.1 composite motif. (b) PU.1 bQTL and lymphocyte count association signal at the ZC2HC1A locus. (c) Posterior inclusion probability (PIP) of variants in the 95% credible set of lymphocyte count association at the ZC2HC1A locus. (d) Genome tracks of PU.1 ChIP-seq, ATAC-seq, H3K4me1, H3K4me3, H3K27ac ChIP-seq assayed in GM12878. The highlighted regions correspond to molecular phenotypes with QTL associations in e. (e) The effect of rs3808619 dosage on various molecular phenotypes shown in panel d. Box plots are formatted as in Fig. 4. (f) Regulatory effects of rs3808619 and 58 tagging variants in a reporter assay. MPRA allelic effect corresponds to log2 fold change of regulatory activity of the oligo sequence with the alternate allele over that with the reference allele. The inset shows the allelic skew estimates with 95% confidence intervals from Abell et al. and Tewhey et al. *: adjusted p<0.05. (g) PU.1-dependent reduction in chromatin accessibility levels (count per million) at the regulatory element surrounding rs3808619 in control pro-B cell lines (SPI1+/+) and counterparts with SPI1 knocked out (SPI1−/−). n=3 for each condition. *: DESeq2 adjusted p<0.05. The panel is formatted as in Fig. 5d.
