Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
[Preprint]. 2023 Mar 20:2023.03.14.23287257.
doi: 10.1101/2023.03.14.23287257.

Mapping the dynamic genetic regulatory architecture of HLA genes at single-cell resolution

Affiliations

Mapping the dynamic genetic regulatory architecture of HLA genes at single-cell resolution

Joyce B Kang et al. medRxiv. .

Update in

  • Mapping the dynamic genetic regulatory architecture of HLA genes at single-cell resolution.
    Kang JB, Shen AZ, Gurajala S, Nathan A, Rumker L, Aguiar VRC, Valencia C, Lagattuta KA, Zhang F, Jonsson AH, Yazar S, Alquicira-Hernandez J, Khalili H, Ananthakrishnan AN, Jagadeesh K, Dey K; Accelerating Medicines Partnership Program: Rheumatoid Arthritis and Systemic Lupus Erythematosus (AMP RA/SLE) Network; Daly MJ, Xavier RJ, Donlin LT, Anolik JH, Powell JE, Rao DA, Brenner MB, Gutierrez-Arcelus M, Luo Y, Sakaue S, Raychaudhuri S. Kang JB, et al. Nat Genet. 2023 Dec;55(12):2255-2268. doi: 10.1038/s41588-023-01586-6. Epub 2023 Nov 30. Nat Genet. 2023. PMID: 38036787 Free PMC article.

Abstract

The human leukocyte antigen (HLA) locus plays a critical role in complex traits spanning autoimmune and infectious diseases, transplantation, and cancer. While coding variation in HLA genes has been extensively documented, regulatory genetic variation modulating HLA expression levels has not been comprehensively investigated. Here, we mapped expression quantitative trait loci (eQTLs) for classical HLA genes across 1,073 individuals and 1,131,414 single cells from three tissues, using personalized reference genomes to mitigate technical confounding. We identified cell-type-specific cis-eQTLs for every classical HLA gene. Modeling eQTLs at single-cell resolution revealed that many eQTL effects are dynamic across cell states even within a cell type. HLA-DQ genes exhibit particularly cell-state-dependent effects within myeloid, B, and T cells. Dynamic HLA regulation may underlie important interindividual variability in immune responses.

PubMed Disclaimer

Figures

Fig. 1.
Fig. 1.. Overview of study and scHLApers pipeline.
(A) We used four datasets with genotype and scRNA-seq data: Synovium (n=69 individuals, 275,323 cells), Intestine (n=22, 137,321), PBMC-cultured (n=73, 188,507), and PBMC-blood (n=909, 765,079). (B) Using the genotype data, we imputed SNPs within the MHC and one- and two-field classical HLA alleles. (C) Schematic of scHLApers pipeline, where scRNA-seq reads are aligned to a personalized reference for each individual based on classical HLA alleles. In the example, an individual is heterozygous for all eight HLA genes, so 16 additional contigs are added to the reference. Original reference gene sequences are masked with Ns. scHLApers outputs a whole-transcriptome counts matrix with improved HLA gene estimates. Both alleles contribute to count estimation for each gene. (D) We generated an atlas of HLA expression across all cell types (top) and mapped eQTLs for HLA genes in myeloid, B, and T cells (bottom). Schematic of example dynamic eQTL (bottom), where eQTL strength (slope, βtotal) changes across T cell states. Abbreviations: PBMCs, peripheral blood mononuclear cells; MHC, major histocompatibility complex; WGS, whole-genome sequencing. Icon of scRNA-seq from BioRender.
Fig. 2.
Fig. 2.. Quantifying single-cell HLA expression using a personalized pipeline.
(A) Frequency of each imputed two-field HLA allele across all cohorts. Most common alleles (up to ten) are labeled for each gene, with other alleles grouped into “other.” Alleles with frequency of less than five are not labeled. (B) Boxplot (each observation is one sample) showing percentage change (y-axis) in the estimated UMIs for each HLA gene (x-axis) summed across all cells after quantification with scHLApers (compared to a pipeline using the standard reference genome) in the Synovium dataset (n=69 individuals). The center line of the boxplot represents the median, the lower and upper box limits represent the 25% and 75% quantiles, respectively, the whiskers extend to the box limit ±1.5 × IQR, and outlying points are plotted individually. Plot for all cohorts is in Fig. S4C. Dotted red line denotes no change. (C) Percentage change in estimated expression (y-axis) for three example genes (HLA-DQA1, HLA-DQB1, and HLA-DRB1) per sample as a function of the mean (between the individual’s two alleles) Levenshtein distance relative to the reference allele at the 3’ end (x-axis). Dotted red line denotes no change. Plot for all genes is in Fig. S4D. (D) Comparing the estimated HLA expression as measured using shorter reads (84 bp, y-axis) versus longer reads (289 bp, x-axis) in the standard pipeline (top row) compared to scHLApers (bottom row). Each dot shows the mean log(CP10k+1)-normalized expression across cells for one sample in the PBMC-cultured dataset (n=146 samples from 73 individuals). r is calculated as Pearson correlation; dashed gray line is the identity line. (E) HLA expression in different cell types across cohorts: myeloid (n=145,090 cells), B (n=180,935), T (n=805,389), NK (n=125,865), fibroblasts (n=82,651) and endothelial (n=26,300). Dot size indicates the proportion of cells with nonzero expression; color indicates log(CP10k+1)-normalized expression (mean across cells).
Fig. 3.
Fig. 3.. eQTLs for classical HLA genes from pseudobulk analysis.
(A) Manhattan plot showing the significance (y-axis) of association between tested MHC variants (x-axis) and expression of each HLA gene (color) in myeloid cells from the multi-cohort model. Most significant (lead) eQTLs are labeled. Diamond points indicate TSS of each gene. (B) Boxplot showing an example lead eQTL (rs3104413). Increased dosage of the G allele (x-axis) associates with higher HLA-DQA1 expression in myeloid cells (y-axis: units are the residual of inverse normal transformed mean log(CP10k+1)-normalized expression across cells after regressing out covariates), plotted by dataset (color). All lead eQTLs shown in Figs. S8-10. (C) Locus zoom plot for the primary (rs3104413) and secondary (rs9272294) eQTLs for HLA-DQA1 in myeloid cells. Significance of association (y-axis) is shown for nearby variants on chromosome 6 (x-axis); color denotes LD (r2 with lead eQTL in multi-ancestry HLA reference). Triangles point upwards for a positive (downwards for negative) effect on expression. Gene bodies and direction of transcription (arrows) for HLA-DRB1, HLA-DQA1, and HLA-DQB1 are underneath. (D) Grid showing lead eQTLs for each HLA gene (columns) in each cell type (rows: myeloid, B, and T). Each element of the grid includes a forest plot with the estimated lead effect size (x-axis) and 95% confidence interval (±1.96 s.e.) of the estimate from the multi-cohort analysis (represented by a diamond) and the same variant-gene pair tested for an association within each cohort separately (represented by dots above). The size of the dots/diamond indicates cohort size; color indicates sign of the ALT allele’s effect on expression (blue indicates positive, red indicates negative). The eQTLs boxed in blue and magenta are highlighted in (B, C) and (E), respectively. (E) Example of a cell-type-dependent eQTL (rs9271117) that was the lead eQTL for HLA-DRB1 and strongest in B cells. Boxplots are formatted analogously to panel (B) and show the eQTL variant’s effect for all three cell types separately.
Fig. 4.
Fig. 4.. Identifying dynamic eQTLs by modeling single-cells in shared cell state embedding.
(A-C) UMAP of cells generated using tissue-defined embedding (top 10 hPCs from Synovium and Intestine), with PBMC datasets projected into the same space. The plot is divided into three sections: (A) myeloid cells, (B) B, and (C) T cells. Left: class I and II HLA expression across cells across datasets. Cells are binned into hexagons to avoid overplotting (50 bins per horizontal and vertical UMAP directions) and colored by mean log(CP10k+1)-normalized expression of class I/II genes per bin (e.g., for class I, mean of HLA-A, -B, and -C). Right: cell state annotations (color) for a representative PBMC (PBMC-blood) and solid tissue (Synovium) dataset from merging annotations from each dataset to a shared set of labels. (D) Estimated proportion of variance in UMI counts explained by cell state hPCs (color) across HLA genes and cell types. (E) NBME model of single cells used to identify cell-state-dependent regulatory effects. Pink box highlights terms for cell state (10 hPCs per cell type) and their interaction with genotype. (F) Testing lead eQTLs identified in multi-cohort pseudobulk analysis for cell-state dependence using the NBME model in each dataset (color) in myeloid (“M”), B, and T cells. Magnitude of genotype main effect (x-axis) versus the significance of cell-state interaction (y-axis), measured using Chi-Square (χ2) statistic from LRT comparing full model (E) to null model without GxhPC interaction terms. Abbreviations: UMAP, uniform manifold approximation and projection; NBME, negative binomial mixed effects; LRT, likelihood ratio test; hPC, Harmonized PC.
Fig. 5.
Fig. 5.. Dynamic HLA regulatory effects.
(A) Dynamic HLA-A eQTL (rs7747253) in T cells (n=82,423 cells in Synovium, n=538,579 in PBMC-blood). UMAP shows PBMC-blood T cells colored by estimated eQTL strength (βtotal). Boxplots for the eQTL effect are shown for two annotated cell states (CD8+ cytotoxic and proliferating, outlined in red circles), showing mean log2(UMI+1) of HLA-A across all cells in the cell state per individual by genotype. βNBME and P-value values are derived from fitting the NBME model without cell state interaction terms on the discrete cell populations and comparing to a null model without genotype using an LRT (n=908 individuals, m=96,516 cells for CD8+ cytotoxic; n=409, m=739 for proliferating). (B-D) Dynamic HLA-DQA1 eQTL (rs3104371) in T cells. (B) UMAP colored by eQTL strength (βtotal), from blue (weakest) to orange (strongest). (C) Boxplots showing the eQTL effects in cells from the top and bottom quintile of βtotal, showing mean log2(UMI+1) per individual (y-axis) by genotype. Labeled βNBME and P-value are derived from fitting the NBME model without cell state interaction terms on the cells from the discrete quintile and comparing to a null model without genotype using an LRT. (D) Scatter plot showing the mean βtotal (y-axis) compared to the mean log(CP10k+1)-normalized expression of HLA-DQA1 (x-axis) across annotated cell states (color). (E-G) is analogous to (B-D), showing dynamic HLA-DQA1 eQTL (rs3104413) in myeloid cells (n=66,789 cells in Synovium, 40,568 in PBMC-blood). For all boxplots, center line represents median; lower and upper box limits represent the 25% and 75% quantiles, respectively; whiskers extend to box limit ±1.5 × IQR; points are plotted individually.

References

    1. Lenz T. L., Spirin V., Jordan D. M., Sunyaev S. R., Excess of Deleterious Mutations around HLA Genes Reveals Evolutionary Cost of Balancing Selection. Mol. Biol. Evol. 33, 2555–2564 (2016). - PMC - PubMed
    1. Dendrou C. A., Petersen J., Rossjohn J., Fugger L., HLA variation and disease. Nat. Rev. Immunol. 18, 325–339 (2018). - PubMed
    1. Matzaraki V., Kumar V., Wijmenga C., Zhernakova A., The MHC locus and genetic susceptibility to autoimmune and infectious diseases. Genome Biol. 18, 76 (2017). - PMC - PubMed
    1. Trowsdale J., Knight J. C., Major histocompatibility complex genomics and human disease. Annu. Rev. Genomics Hum. Genet. 14, 301–323 (2013). - PMC - PubMed
    1. Okada Y. et al., Fine mapping major histocompatibility complex associations in psoriasis and its clinical subtypes. Am. J. Hum. Genet. 95, 162–172 (2014). - PMC - PubMed

Publication types