Front Immunol. 2024 Dec 5:15:1454263.
doi: 10.3389/fimmu.2024.1454263. eCollection 2024.

Evaluating methods for integrating single-cell data and genetics to understand inflammatory disease complexity

Hope A Townsend et al. Front Immunol.

Abstract

Background: Understanding the genetic underpinnings of immune-mediated inflammatory diseases is crucial for improving treatments. Single-cell RNA sequencing (scRNA-seq) identifies cell states expanded in disease, but often overlooks genetic causality because genotyping is costly and genotyped cohorts are small. Conversely, large genome-wide association studies (GWAS) are widely accessible.

Methods: We present a robust three-step benchmarking analysis of integrating GWAS and scRNA-seq to identify genetically relevant cell states and genes in inflammatory diseases. First, we applied three recent algorithms, based on pathways (scGWAS), single-cell disease scores (scDRS), or both (scPagwas), and compared their results according to accuracy/sensitivity and interpretability. While previous studies focused on coarse cell types, we used disease-specific, fine-grained single-cell atlases (183,742 and 228,211 cells) and GWAS data (sample sizes of 97,173 and 45,975) for rheumatoid arthritis (RA) and ulcerative colitis (UC). Second, given the lack of scRNA-seq for many diseases with GWAS, we further tested the tools' resolution limits by differentiating between similar diseases with only one fine-grained scRNA-seq atlas. Lastly, we provide a novel evaluation of noncoding SNP incorporation methods by testing which method enabled the highest sensitivity/accuracy of known cell-state calls.
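To make the idea of a single-cell disease score concrete, the following is a minimal sketch of scoring each cell by the weighted expression of GWAS-implicated genes and standardizing against random control gene sets. It is not the scDRS implementation: the function name `disease_scores`, its parameters, and the toy data are assumptions for illustration, and control genes are drawn uniformly at random rather than matched to the disease genes in any way.

```python
import numpy as np

def disease_scores(expr, gene_weights, n_ctrl=200, seed=0):
    """Per-cell disease Z-scores (illustrative sketch, not the scDRS code).

    expr         : cells x genes array of normalized expression
    gene_weights : dict {gene column index: GWAS-derived weight}
    n_ctrl       : number of random control gene sets (hypothetical default)
    """
    rng = np.random.default_rng(seed)
    idx = np.array(list(gene_weights.keys()))
    w = np.array([gene_weights[i] for i in idx], dtype=float)
    w = w / w.sum()

    # Raw score: weighted mean expression of the disease genes in each cell.
    raw = expr[:, idx] @ w

    # Control scores: same weights applied to randomly drawn gene sets.
    n_cells, n_genes = expr.shape
    ctrl = np.empty((n_ctrl, n_cells))
    for k in range(n_ctrl):
        ctrl_idx = rng.choice(n_genes, size=idx.size, replace=False)
        ctrl[k] = expr[:, ctrl_idx] @ w

    # Standardize each cell's raw score against its own control distribution.
    return (raw - ctrl.mean(axis=0)) / (ctrl.std(axis=0) + 1e-12)

# Toy data: 100 cells x 50 genes; the first 20 cells over-express genes 0-4.
toy_rng = np.random.default_rng(1)
toy_expr = toy_rng.poisson(1.0, size=(100, 50)).astype(float)
toy_expr[:20, :5] += 5.0
toy_z = disease_scores(toy_expr, {i: 1.0 for i in range(5)})
print(toy_z[:20].mean(), toy_z[20:].mean())
```

Because the score is computed per cell, it can be summarized at any grouping level afterwards (cell type, fine-grained cell state), which is what makes this family of methods suitable for fine-grained atlases.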

Results: We first found that the single-cell based tools scDRS and scPagwas called more supported cell states than scGWAS, including cell states that scGWAS overlooked. While scGWAS and scPagwas were advantageous for gene exploration, scDRS effectively accounted for batch effects and captured disease-relevant cellular heterogeneity without single-cell genotyping. For noncoding SNP integration, we found a key trade-off between statistical power and confidence with positional (e.g., MAGMA) and non-positional approaches (e.g., chromatin-interaction, eQTL). Even when directly incorporating noncoding SNPs through 5' scRNA-seq measures of regulatory elements, non-disease-specific atlases gave misleading results because they do not contain disease-tissue-specific transcriptomic patterns. Despite this criticality of tissue-specific scRNA-seq, we showed that scDRS enabled deconvolution of two similar diseases with a single fine-grained scRNA-seq atlas and separate GWAS. Indeed, we identified supported and novel genetic-phenotype linkages separating RA and ankylosing spondylitis, and UC and Crohn's disease. Overall, while noting evolving single-cell technologies, our study provides key findings for integrating expanding fine-grained scRNA-seq, GWAS, and noncoding SNP resources to unravel the complexities of inflammatory diseases.

Keywords: GWAS; SNP-gene linking; autoimmune diseases; benchmarking; omics; scRNA-seq.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

Figure 1
Overview of study design. (A) We first benchmarked the three most recent tools built to identify cell states and genes associated with disease according to both genetics (GWAS) and transcriptomics (scRNA-seq). (B) We next assessed whether a single scRNA-seq atlas could be used with summary statistics from two diseases to reveal well-separated, disease-associated cell states for each disease. (C) Finally, we assessed the robustness and accuracy of the results of these tools when using different SNP-gene linking methods. Figure made in BioRender.
Figure 2
Comparison of cell-state-specific significance results for RA and UC. For each cell type and cell state, the single-cell level scDRS Z-scores and scPagwas TRS Z-scores are displayed in boxplots colored according to the group scDRS Z-score or group scPagwas bootstrap Z-score. Non-significant cell states in scDRS or scPagwas are shown unbolded with grey outliers, while significant cell states are bolded. scGWAS-called gene modules and their disease scores are also plotted with colors following the scDRS group Z-score gradient for easier comparison. Cell states considered significant by all three tools are bolded. “General literature support” means the general cell type has been shown to associate with the disease while “specific” denotes evidence in the literature linking the specific cell state. Left: RA (rheumatoid arthritis). Right: UC (ulcerative colitis).
Figure 3
Gene comparisons show low correlation across tool-based genes and single-cell disease scores. (A) UpSet plots of the top 1000 ranked genes for scDRS (highest correlation to scDRS disease scores), scPagwas (highest correlation to genetically associated pathway activity scores, gPAS) and MAGMA, as well as the significant scGWAS genes. RA = rheumatoid arthritis, UC = ulcerative colitis. (B) Scatter plots of the correlations of all studied genes with scDRS disease scores and scPagwas gPAS with (top) scGWAS genes, (middle) MAGMA genes, or (bottom) ribosomal genes highlighted. Genes reaching the top 1000 ranked genes for scPagwas and scDRS are colored in light and dark turquoise, respectively. (C) scGWAS results when using a pathway file based on Pathway Commons v12 or v14. Results are highlighted according to the number of significant gene modules called per RA cell state and the maximum disease Z-score across the modules for each cell state. Only cell states with a significant gene module from using either pathway file are shown. Cell states that lack a significant gene module under one of the two pathway files are bolded.
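The set comparison behind an UpSet plot such as panel (A) can be sketched as follows: take the top-k genes per tool, then count the genes that fall in exactly each combination of tools. The helper names `top_k` and `exclusive_intersections`, the gene names, and the scores are hypothetical and chosen only for illustration; they are not taken from the study.

```python
from itertools import combinations

def top_k(scores, k):
    """Return the set of the k highest-scoring gene names."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:k])

def exclusive_intersections(sets):
    """Count genes found in exactly each combination of named sets."""
    names = list(sets)
    counts = {}
    for r in range(1, len(names) + 1):
        for combo in combinations(names, r):
            inside = set.intersection(*(sets[n] for n in combo))
            others = [sets[n] for n in names if n not in combo]
            outside = set().union(*others)
            counts[combo] = len(inside - outside)
    return counts

# Toy per-tool gene rankings (made-up gene names and scores).
toy_scores = {
    "scDRS": {"GENE_A": 0.9, "GENE_B": 0.8, "GENE_C": 0.1},
    "scPagwas": {"GENE_A": 0.7, "GENE_C": 0.6, "GENE_B": 0.2},
}
toy_sets = {tool: top_k(s, 2) for tool, s in toy_scores.items()}
print(exclusive_intersections(toy_sets))
```

Counting exclusive rather than cumulative intersections means every gene is assigned to one bar only, which is the convention UpSet plots typically use.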
Figure 4
Comparison of similar diseases with scDRS. Summary statistics unique to each disease were used on the same scRNA-seq data for each pair (14, 23). scDRS defines significant clusters (annotated according to the original papers) with a group disease Z-score as shown in the gradient legend. Cell clusters with literature support for either disease are labeled in purple/orange for RA/UC and green/blue for AS/CD, respectively. General literature support means that a cell type with multiple cell states is supported by the literature, while specific means a specific single cell state was supported. (A) Rheumatoid arthritis (RA) vs ankylosing spondylitis (AS). (B) Ulcerative colitis (UC) vs Crohn’s disease (CD).
Figure 5
scDRS results for RA clusters whose significance changes depending on the MAGMA window used to generate the GWAS inputs (0-0kb, 5-5kb, 10-10kb, 50-35kb, 100-100kb). scDRS defines significant clusters with a group disease Z-score as shown in the gradient legend (significant scores marked with a square). Cell states with significant heterogeneity scores are marked by an X. General literature support means that a cell type with multiple cell states is supported by the literature, while specific means a specific single cell state was supported. Cell states whose significance calls change across MAGMA windows in the scDRS disease score only, the heterogeneity score only, or both are marked in bold and with grey or turquoise squares.

References

    1. Morgan C, Lunt M, Brightwell H, Bradburn P, Fallow W, Lay M, et al. Contribution of patient related differences to multidrug resistance in rheumatoid arthritis. Ann Rheumatic Dis. (2003) 62:15–9. doi: 10.1136/ard.62.1.15
    2. Method of the year 2019: single-cell multimodal omics. Nat Methods. (2020) 17:1. doi: 10.1038/s41592-019-0703-5
    3. Schaid DJ, Chen W, Larson NB. From genome-wide associations to candidate causal variants by statistical fine-mapping. Nat Rev Genet. (2018) 19:491–504. doi: 10.1038/s41576-018-0016-z
    4. Zhang MJ, Hou K, Dey KK, Sakaue S, Jagadeesh KA, Weinand K, et al. Polygenic enrichment distinguishes disease associations of individual cells in single-cell RNA-seq data. Nat Genet. (2022) 54:1572–80. doi: 10.1038/s41588-022-01167-z
    5. Jia P, Hu R, Yan F, Dai Y, Zhao Z. scGWAS: landscape of trait-cell type associations by integrating single-cell transcriptomics-wide and genome-wide association studies. Genome Biol. (2022) 23:220. doi: 10.1186/s13059-022-02785-w
