[Preprint]. 2024 Jun 17:2023.11.01.23297909.
doi: 10.1101/2023.11.01.23297909.

Fine-mapping causal tissues and genes at disease-associated loci

Benjamin J Strober et al. medRxiv.

Abstract

Heritable diseases often manifest in a highly tissue-specific manner, with different disease loci mediated by genes in distinct tissues or cell types. We propose Tissue-Gene Fine-Mapping (TGFM), a fine-mapping method that infers the posterior probability (PIP) for each gene-tissue pair to mediate a disease locus by analyzing GWAS summary statistics (and in-sample LD) and leveraging eQTL data from diverse tissues to build cis-predicted expression models; TGFM also assigns PIPs to causal variants that are not mediated by gene expression in assayed genes and tissues. TGFM accounts for both co-regulation across genes and tissues and LD between SNPs (generalizing existing fine-mapping methods), and incorporates genome-wide estimates of each tissue's contribution to disease as tissue-level priors. TGFM was well-calibrated and moderately well-powered in simulations; unlike previous methods, TGFM was able to attain correct calibration by modeling uncertainty in cis-predicted expression models. We applied TGFM to 45 UK Biobank diseases/traits (average N = 316K) using eQTL data from 38 GTEx tissues. TGFM identified an average of 147 PIP > 0.5 causal genetic elements per disease/trait, of which 11% were gene-tissue pairs. Implicated gene-tissue pairs were concentrated in known disease-critical tissues, and causal genes were strongly enriched in disease-relevant gene sets. Causal gene-tissue pairs identified by TGFM recapitulated known biology (e.g., TPO-thyroid for Hypothyroidism), but also included biologically plausible novel findings (e.g., SLC20A2-artery aorta for Diastolic blood pressure). Further application of TGFM to single-cell eQTL data from 9 cell types in peripheral blood mononuclear cells (PBMC), analyzed jointly with GTEx tissues, identified 30 additional causal gene-PBMC cell type pairs at PIP > 0.5, primarily for autoimmune disease and blood cell traits, including the biologically plausible example of CD52 in classical monocyte cells for Monocyte count. In conclusion, TGFM is a robust and powerful method for fine-mapping causal tissues and genes at disease-associated loci.

Figures

Figure 1: Calibration and power of tissue-gene fine-mapping methods in simulations.
(a,b) Average gene-tissue pair fine-mapping FDR across 100 simulations for various fine-mapping methods (see legend) across eQTL sample sizes (x-axis) at PIP=0.5 (a) and PIP=0.9 (b). Dashed horizontal line denotes 1 – PIP threshold (see main text). Numerical results are reported in Supplementary Table 1. (c,d) Average gene-tissue pair fine-mapping power across 100 simulations for various fine-mapping methods (see legend) across eQTL sample sizes (x-axis) at PIP=0.5 (c) and PIP=0.9 (d). Error bars denote 95% confidence intervals. Numerical results are reported in Supplementary Table 2.
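The comparison in panels (a,b) between empirical FDR and the 1 – PIP threshold line can be illustrated with a minimal sketch. It assumes a simulation in which the truly causal elements are known; the function name `empirical_fdr`, the toy PIPs, and the causal labels are hypothetical and are not taken from the paper or its software. One reading of the dashed line, assumed here, is that a calibrated method keeps empirical FDR at or below 1 – threshold.

```python
import numpy as np

def empirical_fdr(pips, is_causal, threshold):
    """Fraction of elements with PIP > threshold that are not truly causal.

    Hypothetical helper for illustration; returns NaN when nothing is selected.
    """
    pips = np.asarray(pips, dtype=float)
    is_causal = np.asarray(is_causal, dtype=bool)
    selected = pips > threshold
    n_selected = int(selected.sum())
    if n_selected == 0:
        return float("nan")
    n_false = int((selected & ~is_causal).sum())
    return n_false / n_selected

# Toy simulated data (invented values): PIPs and ground-truth causal status.
pips = [0.95, 0.92, 0.70, 0.60, 0.55, 0.30, 0.10]
is_causal = [True, True, True, False, True, False, False]

fdr_05 = empirical_fdr(pips, is_causal, 0.5)  # 1 false discovery among 5 selected
fdr_09 = empirical_fdr(pips, is_causal, 0.9)  # 0 false discoveries among 2 selected

# Assumed calibration check: empirical FDR should not exceed 1 - threshold.
calibrated_05 = fdr_05 <= 1 - 0.5
calibrated_09 = fdr_09 <= 1 - 0.9
```

In the figure, this quantity is averaged over 100 simulations per eQTL sample size; the sketch shows a single simulated replicate.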
Figure 2: Calibration and power of fine-mapping different classes of genetic elements with TGFM in simulations.
(a,b) Average fine-mapping FDR across 100 simulations using TGFM for different classes of genetic elements (see legend) across eQTL sample sizes (x-axis) at PIP=0.5 (a) and PIP=0.9 (b). Dashed horizontal line denotes 1 – PIP threshold (see main text). Numerical results are reported in Supplementary Table 3. (c,d) Average fine-mapping power across 100 simulations using TGFM for different classes of genetic elements (see legend) across eQTL sample sizes (x-axis) at PIP=0.5 (c) and PIP=0.9 (d). Error bars denote 95% confidence intervals. Numerical results are reported in Supplementary Table 4.
Figure 3: Summary results of fine-mapping genetic elements with TGFM for 16 independent UK Biobank diseases and traits.
We report the number of (a) Gene-tissue pairs, (b) Genes, and (c) (non-mediated) Variants fine-mapped using TGFM (y-axis; square root scale) across 16 independent UK Biobank traits (x-axis) at various PIP thresholds ranging from 0.2 to 1.0 (color-bars). Horizontal black lines denote the number of genetic elements fine-mapped at PIP=0.5. FEV1:FVC, ratio of forced expiratory volume in 1 second to forced vital capacity; Platelet volume, Mean platelet volume; Diastolic BP, Diastolic blood pressure; Reticulocyte count, High-light scatter reticulocyte count; Corp. hemoglobin, Mean corpuscular hemoglobin; FVC, Forced vital capacity. Results for all 45 UK Biobank diseases and traits are reported in Supplementary Figure 30, and numerical results are reported in Supplementary Table 8.
Figure 4: Properties of fine-mapped tissues and genes.
(a) Proportion of fine-mapped gene-tissue pairs in each tissue (x-axis) for 14 representative traits (y-axis). Proportions for each trait were calculated by counting the number of gene-tissue pairs with TGFM PIP > 0.5 in each tissue and normalizing the counts across tissues. Tissues are only displayed if their proportion is > 0.2 for at least one of the 14 representative traits. Asterisks denote statistical significance (FDR ≤ 0.05 via the TGFM tissue-specific prior) of each tissue-trait pair. Results for all remaining traits and tissues are reported in Supplementary Figure 31, and numerical results are reported in Supplementary Table 9. The 14 representative traits were selected by including 12 of the 16 independent traits (Figure 3) with many high PIP gene-tissue pairs and two additional, interesting traits (All autoimmune and Vitamin D level). (b) Proportion of stratified tissue-trait pairs reported as statistically significant in S-LDSC analyses using chromatin data (y-axis) as a function of S-LDSC significance thresholds (x-axis), across all 45 traits analyzed; tissue-trait pairs are stratified according to significance (FDR ≤ 0.05 or FDR > 0.05) via the TGFM tissue-specific prior. Results at alternative TGFM tissue-specific prior significance thresholds are reported in Supplementary Figure 33, and numerical results are reported in Supplementary Table 11. (c) Average PoPS score (y-axis) of genes stratified by TGFM (Gene) PIP (x-axis). Averages were computed across genes for the 16 independent traits listed in Figure 3, as both PoPS score and TGFM gene PIPs are trait-specific. Error bars denote 95% confidence intervals. Numerical results are reported in Supplementary Table 13. (d) Empirical FDR when distinguishing a silver-standard gene set of 69 known LDL cholesterol genes analyzed in Figure 4 of ref. from nearby genes (y-axis), for TGFM (Gene) PIP for LDL cholesterol greater than or equal to a range of PIP thresholds (x-axis). Light green shading denotes 95% confidence intervals. Numerical results are reported in Supplementary Table 15.
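The per-trait proportions in panel (a) are described as counts of gene-tissue pairs with PIP > 0.5 in each tissue, normalized across tissues. A minimal sketch of that calculation for a single trait follows; the function name `tissue_proportions`, the gene labels, and the PIP values are hypothetical placeholders rather than results from the paper.

```python
from collections import Counter

def tissue_proportions(pairs, pip_threshold=0.5):
    """Count gene-tissue pairs with PIP above the threshold per tissue,
    then normalize the counts so they sum to 1 across tissues."""
    counts = Counter(tissue for _gene, tissue, pip in pairs if pip > pip_threshold)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {tissue: n / total for tissue, n in counts.items()}

# Toy (gene, tissue, PIP) triplets for one trait; all values are invented.
pairs = [
    ("GENE_A", "thyroid", 0.9),
    ("GENE_B", "thyroid", 0.7),
    ("GENE_C", "liver", 0.6),
    ("GENE_D", "liver", 0.4),        # below threshold, not counted
    ("GENE_E", "whole_blood", 0.2),  # below threshold, not counted
]

props = tissue_proportions(pairs)

# The caption displays a tissue only if its proportion exceeds 0.2
# for at least one representative trait; here that is checked for one trait.
displayed = sorted(t for t, p in props.items() if p > 0.2)
```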
Figure 5: Robustness of TGFM results in analyses of alternative eQTL data sets.
(a) Results of tissue-ablation analysis of 115 gene-tissue pairs for 18 representative traits that were prioritized by TGFM (PIP > 0.5) in the primary analysis. We report the number of loci in the tissue-ablation analysis with no gene-tissue pair prioritized by TGFM (PIP > 0.5); a gene-tissue pair prioritized by TGFM (PIP > 0.5) corresponding to the same gene and the best proxy tissue (see Methods); a gene-tissue pair prioritized by TGFM (PIP > 0.5) corresponding to the same gene and a non-proxy tissue; or a gene-tissue pair prioritized by TGFM (PIP > 0.5) corresponding to a different gene. Results at alternative PIP thresholds are reported in Supplementary Figure 38, and numerical results are reported in Supplementary Table 16. (b) Results of replacing GTEx whole blood (N=320) with pseudobulk PBMC (N=113) for 62 gene-trait pairs for 18 representative traits that TGFM fine-mapped for GTEx whole blood (PIP > 0.5) in the primary analysis. The vertical red line denotes the average PIP in PBMC, and the histogram summarizes the average PIP of each GTEx tissue (excluding whole blood). Numerical results are reported in Supplementary Table 17. The 18 representative traits consist of the 16 independent traits (Figure 3) and two additional, interesting traits (All autoimmune and Vitamin D levels).
Figure 6: Examples of fine-mapped gene-tissue-disease triplets identified by TGFM.
We report 6 example loci for which TGFM fine-mapped a gene-tissue pair (PIP > 0.5). In each example we report the marginal GWAS and TWAS association −log10 p-values (y-axis) of non-mediated variants (blue circles) and gene-tissue pairs (red triangles). Marginal TWAS association −log10 p-values were calculated by taking the median −log10 TWAS p-value across the 100 sets of sampled cis-predicted expression models for each gene-tissue pair. The genomic position of each gene-tissue pair (x-axis) was based on the gene’s TSS. The color shading of each variant and gene-tissue pair was determined by its TGFM PIP. Any genetic element with TGFM PIP > 0.5 was made larger in size. Dashed horizontal blue and red lines represent GWAS significance (5 × 10⁻⁸) and TWAS significance (4.2 × 10⁻⁷) thresholds, respectively. Numerical results are reported in Supplementary Table 18.
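The plotted TWAS value for each gene-tissue pair is described as the median −log10 p-value across the sampled cis-predicted expression models. A minimal sketch of that summary, compared against the TWAS significance line quoted in the caption, is given below; the function name `median_neg_log10_p` and the three toy p-values (standing in for the 100 sampled models) are assumptions for illustration only.

```python
import numpy as np

def median_neg_log10_p(pvalues):
    """Median of -log10(p) across sampled expression models for one gene-tissue pair."""
    p = np.asarray(pvalues, dtype=float)
    return float(np.median(-np.log10(p)))

# Toy TWAS p-values, one per sampled model (the paper uses 100 samples; values invented).
sampled_pvalues = [1e-8, 1e-7, 1e-6]

plotted_value = median_neg_log10_p(sampled_pvalues)

# TWAS significance threshold from the Figure 6 caption, on the -log10 scale.
twas_line = -np.log10(4.2e-7)
exceeds_twas_line = plotted_value > twas_line
```

Because −log10 is monotone, with an odd number of samples the median of the −log10 p-values equals −log10 of the median p-value; with an even number of samples (such as 100) the two can differ slightly.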
Figure 7: Summary results of fine-mapping gene-PBMC cell type pairs with TGFM for 18 representative UK Biobank diseases and traits.
(a-b) Number of gene-PBMC cell type pairs fine-mapped using TGFM (y-axis; square root scale) across 18 representative UK Biobank traits (x-axis) at various PIP thresholds ranging from 0.2 to 1.0 (color-bar), distinguishing between (a) autoimmune diseases and blood cell traits and (b) non-blood-related traits. Horizontal black lines denote the number of gene-PBMC cell type pairs fine-mapped at PIP=0.5. The 18 representative traits (same as Figure 5) consist of the 16 independent traits (Figure 3) and two additional, interesting traits (All autoimmune and Vitamin D levels). Results for all 45 UK Biobank diseases and traits are reported in Supplementary Figure 42. (c-e) Number of gene-PBMC cell type pairs fine-mapped using TGFM (y-axis; square root scale) in each of the 9 PBMC cell types (x-axis) at various PIP thresholds ranging from 0.2 to 1.0 (color-bar) for (c) Monocyte count, (d) Lymphocyte count, and (e) All autoimmune disease. Horizontal black lines denote the number of gene-PBMC cell type pairs fine-mapped at PIP=0.5. Asterisks denote statistical significance (FDR ≤ 0.05 via the TGFM tissue-specific prior) of each PBMC cell type-trait pair. Results for all 45 UK Biobank diseases and traits are reported in Supplementary Figure 43. Numerical results are reported in Supplementary Table 20.
Figure 8: Examples of fine-mapped gene-PBMC cell type-disease triplets identified by TGFM.
We report 4 example loci for which TGFM fine-maps a gene-PBMC cell type pair (PIP > 0.5). In each example we report the marginal GWAS and TWAS association −log10 p-values (y-axis) of non-mediated variants (blue circles) and gene-tissue (or gene-PBMC cell type) pairs (red triangles). Marginal TWAS association −log10 p-values were calculated by taking the median −log10 TWAS p-value across the 100 sets of sampled cis-predicted expression models for each gene-tissue (or gene-PBMC cell type) pair. The genomic position of each gene-tissue (or gene-PBMC cell type) pair (x-axis) was based on the gene’s TSS. The color shading of each variant and gene-tissue (or gene-PBMC cell type) pair was determined by its TGFM PIP. Any genetic element with TGFM PIP > 0.5 was made larger in size. Dashed horizontal blue and red lines represent GWAS significance (5 × 10⁻⁸) and TWAS significance (4.1 × 10⁻⁷) thresholds, respectively. Numerical results are reported in Supplementary Table 22.

References

    1. Hekselman I. & Yeger-Lotem E. Mechanisms of tissue and cell-type specificity in heritable traits and diseases. Nat. Rev. Genet. 21, 137–150 (2020).
    2. Trynka G. et al. Chromatin marks identify critical cell types for fine mapping complex trait variants. Nat. Genet. 45, 124–130 (2013).
    3. Finucane H. K. et al. Partitioning heritability by functional annotation using genome-wide association summary statistics. Nat. Genet. 47, 1228–1235 (2015).
    4. Kundaje A. et al. Integrative analysis of 111 reference human epigenomes. Nature 518, 317–330 (2015).
    5. Ongen H. et al. Estimating the causal tissues for complex traits and diseases. Nat. Genet. 49, 1676–1683 (2017).