[Preprint]. 2023 Nov 15:2023.11.13.566919.
doi: 10.1101/2023.11.13.566919.

Identifying genetic variants that influence the abundance of cell states in single-cell data

Laurie Rumker et al. bioRxiv.

No abstract available

Figures

Figure 1: Method schematic.
(A) Consider a variant associated with disease risk. Intermediate traits that may also associate with the variant and even mediate the genetic risk include well-studied molecular traits (e.g., transcript or protein abundance) as well as changes in the abundance of cells with varying character and function, illustrated here by variation in shape and color. Previous genetic studies of cell state abundance traits have quantified target cell states (e.g., triangle cell type) using flow cytometry. High-dimensional profiling may reveal genetically associated variation in the abundance of cell states researchers may not anticipate or cannot flow-sort (e.g., green character within the circle cell type). Detection of such associations requires granular information about cell variation and a flexible method for detecting variant-associated cell states. (B) For a given single-cell dataset, we use the landscape of cells observed for each sample to compute a granular distribution of fractional cellular abundance across the total cell state space, conceptually illustrated here in two dimensions. We illustrate a single axis of compositional variation across the four samples shown. (C) The Neighborhood Abundance Matrix stores the fractional abundance of cells from each sample in each neighborhood. Sample H is highlighted as an example. Principal components analysis of this object yields sample and neighborhood loading information on NAM-PCs. We illustrate with NAM-PC1 how these loadings would reflect the compositional axis illustrated in (B). We show another possible component (NAM-PCk) to illustrate that NAM-PCs can capture co-variation across transcriptionally distant regions. (D) GeNA uses a test statistic Y, which follows a chi-squared distribution with k degrees of freedom, to detect an association between allele dose for a given SNP and any systematic change in tissue cellular composition captured by NAM-PC1-k (Methods). 
(E) We illustrate how the cell state abundance shift shown in (A) might be revealed using GeNA.
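The test described in panels (C) and (D) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not GeNA's implementation: the function name `nam_pc_association`, its inputs, and the standardization choices are hypothetical, and covariate or batch correction is omitted entirely. It assumes a samples-by-neighborhoods abundance matrix, takes the top k sample loadings from a PCA computed by SVD, and sums the squared projections of the standardized allele dose onto those loadings. Under the null, that sum is approximately chi-squared with k degrees of freedom.

```python
import numpy as np
from scipy.stats import chi2


def nam_pc_association(nam, genotype, k):
    """Approximate chi-squared test of allele dose against the top-k NAM-PCs.

    nam      : (n_samples, n_neighborhoods) fractional abundances (assumed input)
    genotype : (n_samples,) allele dose per sample, e.g. 0, 1, or 2
    k        : number of NAM-PCs to test jointly
    """
    nam = np.asarray(nam, dtype=float)
    g = np.asarray(genotype, dtype=float)

    # Center each neighborhood so the PCs describe variation across samples.
    centered = nam - nam.mean(axis=0)

    # Left singular vectors give orthonormal sample loadings on the NAM-PCs.
    u, _, _ = np.linalg.svd(centered, full_matrices=False)
    u_k = u[:, :k]

    # Standardize allele dose to mean 0 and variance 1.
    g_std = (g - g.mean()) / g.std()

    # Each projection is roughly N(0, 1) under the null, so the sum of
    # squares over k PCs is roughly chi-squared with k degrees of freedom.
    z = u_k.T @ g_std
    y = float(z @ z)
    return y, float(chi2.sf(y, df=k))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dose = rng.binomial(2, 0.3, size=200)
    abundances = rng.random((200, 50))
    abundances[:, :5] += 0.3 * dose[:, None]  # simulated genotype effect
    abundances /= abundances.sum(axis=1, keepdims=True)
    print(nam_pc_association(abundances, dose, k=5))
```

Testing all k components jointly is what lets a single statistic pick up any systematic compositional shift without naming the affected cell state in advance.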
Figure 2: csaQTLs detected in the OneK1K dataset.
(A) Superimposed Manhattan plots for csaQTL GWASs among NK cell states and among myeloid cell states. A genome-wide significance threshold of p<5×10⁻⁸ is indicated by a dashed line. SNPs with p-values less than this threshold are colored according to their source GWAS: NK cells (green) or myeloid (orange). Another dashed line indicates a p<1×10⁻⁶ threshold for suggestive associations. (B-F) Cell abundance correlation per neighborhood to dose of alternative allele is shown in UMAPs for each of the five genome-wide significant loci: (B) NK csaQTL 2q13, (C) NK csaQTL 11q24.3, (D) NK csaQTL 12p13.2, (E) myeloid csaQTL 15q25.1, and (F) NK csaQTL 19p13.11. (G-H) Cell type cluster labels from Yazar*, Alquicira-Hernandez*, Wing*, et al. for (G) NK and (H) myeloid cells are shown for reference.
Figure 3: Characterization of the csaQTL at 12p13.2.
(A-B) GeNA output: (A) Boxplot of sample-level phenotype values for each individual organized by genotype at the lead SNP. We also show the GeNA p-value. (B) UMAP of NK cells colored by neighborhood-level phenotype value (i.e., correlation between cell abundance and dose of alternative allele per neighborhood). (C) Heatmap of expression across neighborhoods for genes with strong correlations in expression to csaQTL neighborhood-level phenotype. Neighborhoods are arrayed along the x-axis by phenotype value. The phenotype-correlated genes include general markers of NK activation (CD69, NFKBIA) as well as TNF-α (DUSP2, ZFP36, JUNB, IER2) and IFN-γ (CD74, XCL1) response genes. (D-E) Gene set enrichment analysis identified significant activation of TNF-α and IFN-γ response pathways in association with the csaQTL phenotype. We show UMAPs of NK cells colored by summed expression of (D) TNF-α response genes and (E) IFN-γ response genes. We report the Pearson’s r across neighborhoods between phenotype values and summed expression within the gene set. We also show the FDR-adjusted gene set enrichment p-value. (F) Locus zoom plot with one marker per tested SNP, genomic position along the x-axis, and GeNA p-value on the y-axis. Each SNP marker is colored by linkage disequilibrium (LD) value relative to the lead SNP. The csaQTL lead SNP is labeled with a green diamond. The psoriasis risk and KLRC1 eQTL lead SNPs are labeled with purple triangles. (G) Diagram of genotypes for the csaQTL lead SNP and colocalizing associations to molecular, tissue and organism-level traits at this locus.
Figure 4: Polygenic risk scores aggregate the effects of individual loci to highlight disease-relevant cell states, a valuable point of comparison to single-cell case-control analyses.
(A) Histogram of SNPs in the SLE PRS. For each SNP, we plot the Pearson’s r correlation across OneK1K myeloid neighborhoods between the SLE risk-associated phenotype and an IFN-α response gene signature. The marker for each risk allele is colored according to its effect weight in the PRS. Six SNPs plotted in (B) are highlighted in orange. (B) We show six SNPs in the SLE PRS for which the myeloid cell state abundance correlations to the SLE risk allele correspond closely to an IFN-α response signature. For each selected SNP, we plot a UMAP of OneK1K myeloid cells colored by the abundance correlation per neighborhood to dose of the risk allele. We also report the gene(s) to which the SNP has been mapped, the Pearson’s r correlation between the neighborhood-level phenotype and IFN-α response signature, and the FDR-adjusted CNA global p-value. (C) Myeloid cell state abundance shift associated with increasing SLE PRS value in the OneK1K cohort. CNA global p-values are shown with (pIFN) and without (p) controlling for mean IFN-α response gene expression per individual. (D) IFN-α response gene expression per neighborhood among myeloid cells in the OneK1K cohort. Pearson’s r between IFN-α response per neighborhood and the PRS phenotype from (C) is shown, with associated bootstrapped p-value for r>0. (E) Myeloid cell state abundance shift associated with SLE disease status in the Perez et al. European cohort. CNA global p-values are shown with (pIFN) and without (p) controlling for mean IFN response gene expression per individual. (F) IFN response gene expression per neighborhood among myeloid cells in the Perez et al. European cohort. Pearson’s r between IFN response per neighborhood and SLE phenotype from (E) is shown, with associated bootstrapped p-value for r>0.
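Panels (D) and (F) report a Pearson's r across neighborhoods together with a bootstrapped p-value for r>0. The caption does not describe the resampling scheme, so the sketch below is one plausible reading and should not be taken as the authors' procedure: it assumes neighborhoods are resampled with replacement and reports the add-one-smoothed fraction of bootstrap correlations at or below zero. The function name `bootstrap_r_positive` and its parameters are hypothetical. Neighborhoods overlap and are not independent, so a naive resample like this one may be anti-conservative.

```python
import numpy as np


def bootstrap_r_positive(x, y, n_boot=2000, seed=0):
    """Pearson's r between x and y, with a one-sided bootstrap p-value for r > 0.

    x, y : per-neighborhood values, e.g. phenotype and summed gene-set expression
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    n = x.size

    r_obs = float(np.corrcoef(x, y)[0, 1])

    # Resample neighborhood indices with replacement: one row per bootstrap draw.
    idx = rng.integers(0, n, size=(n_boot, n))
    xb = x[idx] - x[idx].mean(axis=1, keepdims=True)
    yb = y[idx] - y[idx].mean(axis=1, keepdims=True)

    # Row-wise Pearson correlation for every bootstrap draw.
    denom = np.sqrt((xb ** 2).sum(axis=1) * (yb ** 2).sum(axis=1))
    r_boot = (xb * yb).sum(axis=1) / denom

    # Fraction of draws with r <= 0, with add-one smoothing so p is never 0.
    p = (1 + int(np.sum(r_boot <= 0))) / (n_boot + 1)
    return r_obs, p
```

With add-one smoothing, the smallest reportable p-value is 1/(n_boot + 1), so the number of resamples sets the resolution of the test.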

References

    1. Yazar S. et al. Single-cell eQTL mapping identifies cell type–specific genetic control of autoimmune disease. Science 376, eabf3041 (2022).
    2. Visscher P. M. et al. 10 Years of GWAS Discovery: Biology, Function, and Translation. Am. J. Hum. Genet. 101, 5–22 (2017).
    3. Welter D. et al. The NHGRI GWAS Catalog, a curated resource of SNP-trait associations. Nucleic Acids Res. 42, D1001–D1006 (2014).
    4. Shendure J., Findlay G. M. & Snyder M. W. Genomic Medicine–Progress, Pitfalls, and Promise. Cell 177, 45–57 (2019).
    5. Wang Q. S. et al. Leveraging supervised learning for functionally informed fine-mapping of cis-eQTLs identifies an additional 20,913 putative causal eQTLs. Nat. Commun. 12, 3394 (2021).