[Preprint]. 2025 Jul 3:2024.05.17.24307556.
doi: 10.1101/2024.05.17.24307556.

Identifying independent causal cell types for human diseases and risk variants

Artem Kim et al. medRxiv.

Abstract

The SNP-heritability of human diseases is extremely enriched in candidate regulatory elements (cREs) from disease-relevant cell types. Critical next steps are to understand whether these enrichments are driven by multiple causal cell types and whether individual variants impact disease risk via a single cell type or multiple cell types. Here, we propose CT-FM and CT-FM-SNP, 2 methods that account for cREs shared across cell types to identify independent sets of causal cell types for a trait and for its candidate causal variants, respectively. We applied CT-FM to 63 GWAS summary statistics (average N = 417K) using 924 cRE annotations, primarily from ENCODE4. CT-FM inferred 79 sets of causal cell types, with corresponding SNP-annotations explaining 39.0 ± 1.8% of trait SNP-heritability. It identified 14 traits with independent causal cell types, uncovering previously unexplored cellular mechanisms in height, schizophrenia and autoimmune diseases. We applied CT-FM-SNP to 39 UK Biobank traits and predicted high-confidence causal cell types for 3,091 candidate causal non-coding SNP-trait pairs. Our results suggest that most SNPs affect a phenotype via a single set of cell types, whereas pleiotropic SNPs might target different cell types depending on the phenotype context. Altogether, CT-FM and CT-FM-SNP shed light on how genetic variants act collectively and individually at the cellular level to affect disease risk.

Conflict of interest statement

S.G. reports consulting fees from Eleven Therapeutics unrelated to the present work. The other authors declare no competing interests.

Figures

Figure 1. Overview of the CT-FM framework.
CT-FM and CT-FM-SNP are 2 methods that identify sets of independent causal cell types of a trait and of its candidate causal SNPs, respectively. They take as inputs a set of cell-type-specific (CTS) candidate regulatory element (cRE) SNP-annotations, GWAS summary statistics with a matching LD reference panel, and (for CT-FM-SNP only) a list of the trait’s candidate causal SNPs. First, CT-FM estimates the significance of the marginal effect on SNP-heritability of each CTS SNP-annotation by applying stratified LD score regression (S-LDSC). Then, CT-FM infers the causal cell types and outputs posterior inclusion probabilities (PIPs) and independent causal sets (ICSs) by leveraging S-LDSC Z-scores and an adjusted matrix of cRE linkage disequilibrium (LD) scores of CTS SNP-annotations. In our toy example, CT-FM reduces the number of SNP-annotations significantly associated with the trait to 2 ICSs (triangle and diamond), each corresponding to cell types of a distinct biological group (A and C). CT-FM-SNP follows the same workflow as CT-FM, except that it restricts the inference procedure to CTS SNP-annotations that overlap the candidate SNP. In our toy example, CT-FM-SNP infers the causal cell types of 3 GWAS candidate SNPs and assigns the first candidate SNP to cell types from the biological group A, the second candidate SNP to a cell type of the biological group C, and the third candidate SNP to cell types of the biological groups A and C. The dashed horizontal line represents the S-LDSC significance threshold.
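To make the relationship between per-annotation posterior probabilities and a causal set more concrete, here is a minimal sketch of one common fine-mapping convention: rank candidates by posterior probability and keep adding them until their cumulative probability reaches a target coverage. This is an illustration only and not necessarily how CT-FM constructs its ICSs. The function name `credible_set`, the 0.95 coverage level, the annotation names, and the probabilities are all hypothetical, and the sketch assumes a single causal signal whose probabilities sum to about 1.

```python
def credible_set(posteriors, coverage=0.95):
    """Return the smallest top-ranked set whose cumulative probability
    reaches `coverage`, together with that cumulative probability.

    Assumption: `posteriors` maps annotation name -> posterior probability
    for ONE causal signal (values sum to roughly 1).
    """
    ranked = sorted(posteriors.items(), key=lambda kv: kv[1], reverse=True)
    chosen, cumulative = [], 0.0
    for name, prob in ranked:
        chosen.append(name)
        cumulative += prob
        if cumulative >= coverage:
            break
    return chosen, cumulative


# Hypothetical probabilities, for illustration only.
posteriors = {"annot_1": 0.70, "annot_2": 0.27, "annot_3": 0.02, "annot_4": 0.01}
members, cumulative_pip = credible_set(posteriors)
print(members, round(cumulative_pip, 2))
```

In this toy input, no single annotation reaches the coverage level alone, so the set contains the two top-ranked annotations, which typically happens when annotations share many cREs and are hard to tell apart.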
Figure 2. Simulations to assess precision and sensitivity of methods inferring causal cell types.
We report the precision, sensitivity and F1 score in simulations with different numbers of causal SNP-annotations. Error bars represent 95% confidence intervals. Numerical results are reported in Supplementary Table 3.
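As a reminder of how these three metrics relate, the sketch below computes them for a single simulation replicate by comparing the set of SNP-annotations a method reports with the set that was simulated as causal. The function name and annotation labels are hypothetical, and the paper's exact evaluation protocol may differ (for example, in how members of a causal set are counted).

```python
def precision_sensitivity_f1(inferred, causal):
    """Compare inferred annotations with the truly causal ones."""
    inferred, causal = set(inferred), set(causal)
    true_pos = len(inferred & causal)
    # Precision: fraction of reported annotations that are truly causal.
    precision = true_pos / len(inferred) if inferred else 0.0
    # Sensitivity (recall): fraction of causal annotations that were reported.
    sensitivity = true_pos / len(causal) if causal else 0.0
    # F1: harmonic mean of precision and sensitivity.
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return precision, sensitivity, f1


# Hypothetical annotation labels, for illustration only.
precision, sensitivity, f1 = precision_sensitivity_f1(
    inferred={"annot_A", "annot_B", "annot_C"},
    causal={"annot_A", "annot_C", "annot_D", "annot_E"},
)
print(precision, sensitivity, f1)
```

Because F1 is a harmonic mean, a method cannot score well by reporting very few annotations (high precision, low sensitivity) or by reporting nearly all of them (the reverse).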
Figure 3. Simulations to assess type I error, precision and sensitivity of methods inferring causal cell types of candidate SNPs.
We report the type I error (a,c), the precision, sensitivity and F1 score (b,d) in simulations in which we considered only osteoblasts (a,b) and osteoblasts and fibroblasts (c,d) as causal cell types. We report results for candidate SNPs overlapping the osteoblast SNP-annotation in (d); results for candidate SNPs overlapping the fibroblast SNP-annotation and overlapping both osteoblast and fibroblast SNP-annotations are reported in Supplementary Fig. 11. Error bars represent 95% confidence intervals. Numerical results are reported in Supplementary Table 4.
Figure 4. Benchmarking CT-FM and CT-FM-SNP on 5 blood cell traits.
(a) We report CT-FM candidate causal cell types. Dot sizes are proportional to CT-FM PIP. Numerical results are reported in Supplementary Table 5. (b) We report cell types identified by CT-FM as significantly associated with each trait. Dot sizes are proportional to S-LDSC −log10 FDR P values, and dot colors represent biological groups (red for blood/immune and green for digestive). Only CTS SNP-annotations with S-LDSC FDR P value < 0.05 are represented. Numerical results are reported in Supplementary Table 6. (c) We report the proportion of candidate causal SNPs that were linked to at least one causal cell type by CT-FM-SNP. Results for all candidate variants are reported in Supplementary Table 8. (d) We report the proportion of high-confidence {non-coding SNP, cell type, trait} triplets inferred by CT-FM-SNP for which the cell type is consistent with CT-FM results. We highlight triplets for which the causal CTS SNP-annotation was also found in CT-FM ICSs (green), triplets for which the causal CTS SNP-annotation was not found in CT-FM ICSs but corresponds to the same cell type (blue), and triplets for which the causal CTS SNP-annotation was not found in CT-FM ICSs (grey). Numerical results are reported in Supplementary Table 9. (e) We report the fraction of SNPs with a lymphocyte single-cell cis-eQTL for lymphocyte count candidate SNPs assigned to lymphocyte cell type by CT-FM-SNP (red), candidate SNPs not assigned to lymphocyte cell type (dark blue), and a set of background SNPs (light blue) (* P < 0.05; ** P < 0.01; *** P < 0.001, one-tailed). (f) Similar to (e) with monocyte single-cell cis-eQTLs and monocyte count candidate SNPs. Numerical results are reported in Supplementary Table 11. RBC: red blood cell; CMPs: common myeloid progenitor cells; HMPs: hematopoietic multipotent progenitors; NK: natural killer.
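The FDR threshold used in panel (b) can be illustrated with a short sketch. The caption does not state which FDR procedure was used; the code below assumes the Benjamini-Hochberg step-up adjustment, which is a common default. The function name and the P values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value to the smallest so that adjusted
    # values never increase as the raw P value decreases.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical P values for four SNP-annotations.
p_values = [0.001, 0.04, 0.03, 0.20]
fdr = benjamini_hochberg(p_values)
significant = [i for i, q in enumerate(fdr) if q < 0.05]
print(fdr, significant)
```

With these inputs only the first annotation passes the 0.05 threshold: the raw P values 0.03 and 0.04 are below 0.05, but their adjusted values are not.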
Figure 5. Application of CT-FM to 63 GWAS summary statistics.
(a) We report the number of ICSs per trait, high-confidence causal cell types (PIP > 0.5) per trait, and CTS SNP-annotations per ICS. Causal cell types with PIP < 0.5 but cPIP > 0.5 were not reported. Numerical results are reported in Supplementary Table 14. (b) CT-FM and S-LDSC results for schizophrenia. For each CTS SNP-annotation, we report S-LDSC −log10 FDR P values on the y axis, and CT-FM ICSs by different shapes (square for fetal excitatory neurons, triangle for adult glutamatergic neurons, and asterisk for immune cell types; SNP-annotations not in an ICS are represented with open circles). The dashed horizontal line represents the S-LDSC FDR significance threshold. S-LDSC results for the 924 CTS SNP-annotations are reported in Supplementary Table 17. (c) We report notable candidate causal cell types inferred by CT-FM for different complex traits. Dot sizes are proportional to CT-FM PIP. CT-FM results for the 63 GWASs are reported in Supplementary Tables 14–15. (d) We report the proportion of h2 explained by each CT-FM ICS. Dot sizes are proportional to h2 enrichment and dot colors represent biological groups (same as c). Numerical results are reported in Supplementary Table 18. (e) We report the proportion of h2-cREs explained by CT-FM ICSs for each trait. Numerical results are reported in Supplementary Table 19. Proportions of h2 and h2-cREs > 1 were rounded in (d,e) for visualization purposes; we note that values > 1 are outside the biologically plausible 0–1 range, but allowing point estimates outside this range is necessary to ensure unbiasedness. ADHD: attention-deficit/hyperactivity disorder; BMI: body mass index.
Figure 6. Application of CT-FM-SNP to candidate SNPs of 39 UK Biobank traits.
(a) We report the proportion of candidate causal SNPs that were linked to at least one causal cell type by CT-FM-SNP for 12 representative traits. CT-FM-SNP results for all candidate variants of the 39 traits are reported in Supplementary Table 28. (b) We report the proportion of SNPs with CT-FM-SNP high-confidence causal cell type in different cell types for 9 representative traits. Cell types identified within CT-FM ICSs are represented by green dots and cell types not identified by CT-FM by grey dots. (c) We report the enrichment of biologically relevant processes (x axis) for genes linked to SNPs assigned to osteoblasts, epithelial cells and CD4+ T cells by CT-FM-SNP. For each process and cell type, we report the FDR-corrected enrichment P value (y axis). Only biologically relevant gene ontology processes with FDR P < 0.01 are shown. Full gene ontology enrichment results are available in Supplementary Table 32. (d) We report CT-FM-SNP results for 107 pleiotropic SNPs identified across 18 genetically uncorrelated UK Biobank traits. The proportion of pleiotropic SNPs assigned to different candidate cell types is represented with a red bar, and the proportion of SNPs sharing at least 1 candidate causal cell type across traits is represented with a blue bar. CT-FM-SNP results for the 78 pleiotropic SNPs are reported in Supplementary Table 36. (e,f) We report 2 examples of pleiotropic SNPs assigned to the same cell type (e), and 3 examples of pleiotropic SNPs assigned to different cell types (f). MCH: mean corpuscular hemoglobin; BMI: body mass index; BP: blood pressure; VSMCs: vascular smooth muscle cells; BMSC: bone marrow stromal cells; HLSRC: high light scatter reticulocyte count; HbA1c: hemoglobin A1c; MPV: mean platelet volume; MP: mononuclear phagocytes.

References

    1. Claussnitzer M. et al. A brief history of human disease genetics. Nature 577, 179–189 (2020).
    2. Umans B. D., Battle A. & Gilad Y. Where Are the Disease-Associated eQTLs? Trends Genet. 37, 109–124 (2021).
    3. Hekselman I. & Yeger-Lotem E. Mechanisms of tissue and cell-type specificity in heritable traits and diseases. Nat. Rev. Genet. 21, 137–150 (2020).
    4. Finucane H. K. et al. Partitioning heritability by functional annotation using genome-wide association summary statistics. Nat. Genet. 47, 1228–1235 (2015).
    5. Jagadeesh K. A. et al. Identifying disease-critical cell types and cellular processes by integrating single-cell RNA-sequencing and human genetics. Nat. Genet. 54, 1479–1492 (2022).
