Commun Biol. 2026 Jan 13;9(1):50.
doi: 10.1038/s42003-025-09140-2.

T cell receptor clonotypes predict human leukocyte antigen allele carriage and antigen exposure history

Hesham ElAbd et al. Commun Biol. 2026.

Abstract

Conventional T cells recognize peptides presented by human leukocyte antigen (HLA) proteins through their T cell receptors (TCRs). Because thousands of HLA proteins have been discovered, each presenting thousands of different peptides, experimentally decoding the cognate HLA protein of a TCR is challenging. To address this problem, we combined statistical learning methods with a unique dataset of paired T cell repertoires and HLA allotypes from 6,794 individuals. This enabled us to discover 34,206 T cell receptor alpha (TRA) and 891,564 T cell receptor beta (TRB) clonotypes associated with 175 unique HLA alleles. The identified clonotypes target prevalent infections, e.g., influenza, cytomegalovirus and Epstein-Barr virus. Using these clonotypes, we developed statistical models that impute the carriership of common HLA alleles from the TRA or the TRB repertoire. In conclusion, the identified allele-associated clonotypes encode the HLA fingerprints and the antigenic exposure history of individuals and populations.
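The Fig. 1 caption states that clonotype-allele associations were discovered by comparing the presence of each clonotype in allele carriers and non-carriers with Fisher's exact test. A minimal, standard-library sketch of a one-sided (enrichment) Fisher's exact test for a single clonotype is shown below. The one-sided alternative, the 2×2 table layout and the name `fisher_enrichment_p` are assumptions for illustration, not the authors' code, and the subsequent L1LR step used to resolve linkage disequilibrium is not shown.

```python
from math import comb


def fisher_enrichment_p(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact test for enrichment of a clonotype in allele carriers.

    Assumed 2x2 layout (counts of individuals):
                       clonotype present   clonotype absent
        carriers              a                   b
        non-carriers          c                   d

    Returns P(X >= a) under the hypergeometric null, where X is the number of
    clonotype-positive individuals that fall among the carriers.
    """
    n = a + b + c + d
    carriers = a + b       # row total
    positives = a + c      # column total: individuals carrying the clonotype
    total = comb(n, positives)
    upper = min(carriers, positives)
    # Sum hypergeometric probabilities of tables at least as extreme as observed.
    tail = sum(
        comb(carriers, k) * comb(n - carriers, positives - k)
        for k in range(a, upper + 1)
    )
    return tail / total


# Hypothetical example: clonotype seen in 3 of 4 carriers and 1 of 4 non-carriers.
print(round(fisher_enrichment_p(3, 1, 1, 3), 4))  # 0.2429
```

For comparison, `scipy.stats.fisher_exact(table, alternative="greater")` computes the same one-sided p-value for a 2×2 table.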

Conflict of interest statement

Competing interests: The authors declare no competing interests.

Figures

Fig. 1. Overview of the approach used for discovering clonotypes restricted to different HLA proteins and for developing models to impute the carriership of these HLA alleles based on the TRA or the TRB repertoire.
a The cohorts used in the current study to discover TRA and TRB clonotypes associated with different HLA alleles. b Discovery of the clonotypes associated with each allele by comparing their presence in carriers and non-carriers using Fisher's exact test, followed by resolving linkage disequilibrium (LD) using L1-regularised linear regression (L1LR) models. c The machine-learning classifiers developed to predict the carriership of a given HLA allele from the cumulative weighted expansion of the clonotypes associated with that allele. d The pipeline for imputing HLA alleles from a given TRA or TRB repertoire, in which the carriership probability is calculated for each supported allele model. The final HLA typing of a sample comprises the alleles with a carriership probability of 0.5 or more. Created in BioRender. ElAbd, H. (2025) https://BioRender.com/2gfo7ks.
Fig. 2. Effect of HLA allele frequency on TRB-associated clonotypes and imputation model performance.
a–f The relationship between HLA allele carriership frequency and the number of associated TRB clonotypes for the six classical HLA loci: HLA-A (a), HLA-B (b), HLA-C (c), HLA-DR (d), HLA-DQ (e) and HLA-DP (f). “P.corr” denotes the Pearson correlation coefficient. In panel (d), HLA-DR alleles are named after their corresponding HLA-DRB1 alleles because the alpha chain is invariant; hence, DR-07:01 represents the HLA-DR molecules whose beta chain is encoded by the HLA-DRB1*07:01 allele. In panels (e, f), HLA allele names are written as the alpha-chain allele + the beta-chain allele; for example, DQ-01:01 + 05:01 represents the HLA molecules encoded by the HLA-DQA1*01:01 and HLA-DQB1*05:01 alleles. g–l The relationship between HLA allele carriership frequency and the performance of its TRB-based imputation model on a test dataset of 1,111 TRB repertoires with linked HLA allotypes. Three metrics were used to evaluate model performance: balanced accuracy, recall and precision. g–i The performance of the models for the three HLA-I loci, HLA-A, HLA-B and HLA-C, respectively. j–l The corresponding performance for the HLA-II loci: HLA-DR (j), HLA-DQ (k) and HLA-DP (l). The data supporting panels (g–l) are provided in Supplementary data 1.
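The caption reports three per-allele performance metrics and a Pearson coefficient between allele frequency and clonotype count. A minimal sketch of the standard definitions follows, treating each allele model as a binary classifier (carrier vs non-carrier). Balanced accuracy is taken as the mean of sensitivity and specificity, which is the usual definition but is assumed here rather than stated in the caption; the toy labels are invented and do not come from the study.

```python
import math
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[bool], y_pred: Sequence[bool]) -> Tuple[int, int, int, int]:
    # Returns (TP, FN, TN, FP) for one allele, where True means "carrier".
    tp = sum(t and p for t, p in zip(y_true, y_pred))
    fn = sum(t and not p for t, p in zip(y_true, y_pred))
    tn = sum(not t and not p for t, p in zip(y_true, y_pred))
    fp = sum(not t and p for t, p in zip(y_true, y_pred))
    return tp, fn, tn, fp


def recall(tp: int, fn: int) -> float:
    # Fraction of true carriers that the model calls as carriers.
    return tp / (tp + fn)


def precision(tp: int, fp: int) -> float:
    # Fraction of carrier calls that are correct.
    return tp / (tp + fp)


def balanced_accuracy(tp: int, fn: int, tn: int, fp: int) -> float:
    # Mean of sensitivity (recall on carriers) and specificity (recall on non-carriers).
    return 0.5 * (tp / (tp + fn) + tn / (tn + fp))


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    # Pearson correlation coefficient, e.g. allele frequency vs. clonotype count.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Toy labels for one allele across eight samples (invented data).
y_true = [True, True, True, False, False, False, False, False]
y_pred = [True, True, False, True, False, False, False, False]
tp, fn, tn, fp = confusion_counts(y_true, y_pred)
print(round(balanced_accuracy(tp, fn, tn, fp), 4))  # 0.7333
print(pearson([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0
```

Balanced accuracy is useful here because rare alleles give highly imbalanced carrier/non-carrier classes, where plain accuracy would be dominated by correctly rejected non-carriers.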
Fig. 3. The performance of the TRB-based HLA imputation models on an independent test dataset obtained from Rosati et al.
a shows the balanced accuracy, b the recall and c the precision across HLA alleles belonging to different HLA loci. Across all panels, alleles with a carriership frequency <5% (n < 12 samples) were excluded from the analysis. The data supporting panels (a–c) are provided in Supplementary data 4.
Fig. 4. The performance of the TRB-based HLA imputation models on an independent test dataset obtained from the immuneCODE dataset.
a shows the balanced accuracy, b the recall and c the precision across HLA alleles belonging to different HLA loci. Across all panels, alleles with a carriership frequency <5% (n < 3 samples) were excluded from the analysis. The data supporting panels (a–c) are provided in Supplementary data 5.
Fig. 5. The performance of the developed TRA-based imputation models on a test dataset of paired TRA repertoires and HLA allotypes generated by Rosati et al.
a shows the balanced accuracy, b the recall and c the precision across HLA alleles belonging to different HLA loci. Across all panels, alleles with a carriership frequency <5% (n < 12 samples) were excluded from the analysis. The data supporting panels (a–c) are provided in Supplementary data 6.
Fig. 6. Benchmarking the predictive performance of TCR2HLA against HLAGuessr using the immuneCODE dataset.
a–c The performance of the HLA-A models across three metrics, namely balanced accuracy, precision and recall, respectively. d–f The performance of the HLA-B models across the same three metrics. g–i and j–l The benchmarking results for the HLA-C and HLA-DRB1 models, respectively. The supporting data are available in Supplementary data 7.
Fig. 7. The antigenic specificity of HLA-associated TRA- and TRB-clonotypes.
(a) depicts the overlap between HLA-associated TRA clonotypes and the public databases VDJdb and McPAS, while (b) illustrates the overlap between these databases and HLA-associated TRB clonotypes. Network visualization was performed using Cytoscape.
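Fig. 7 summarises which HLA-associated clonotypes also occur in VDJdb and McPAS. The sketch below assumes matching by exact clonotype string and represents each database as a hypothetical clonotype-to-antigen dictionary; real VDJdb and McPAS exports have their own column layouts, which are not modelled here. It also emits edges in Cytoscape's tab-delimited Simple Interaction Format (SIF), one `source interaction target` line per edge; the node names and the interaction labels `associated_with` and `recognizes` are illustrative only.

```python
from collections import Counter
from typing import Dict, List, Set


def annotate_overlap(hla_clonotypes: Set[str], database: Dict[str, str]) -> Dict[str, str]:
    # Keep only HLA-associated clonotypes that also appear in the database,
    # mapped to the antigen source the database annotates them with.
    return {c: database[c] for c in sorted(hla_clonotypes) if c in database}


def sif_lines(allele: str, hits: Dict[str, str]) -> List[str]:
    # Two edges per overlapping clonotype: allele -> clonotype -> antigen source.
    lines = []
    for clonotype, antigen in hits.items():
        lines.append(f"{allele}\tassociated_with\t{clonotype}")
        lines.append(f"{clonotype}\trecognizes\t{antigen}")
    return lines


# Placeholder identifiers and annotations, not real database entries.
database = {"cdr3_1": "Influenza", "cdr3_2": "CMV", "cdr3_3": "EBV"}
hla_trb = {"cdr3_1", "cdr3_2", "cdr3_9"}

hits = annotate_overlap(hla_trb, database)
print(Counter(hits.values()))  # Counter({'Influenza': 1, 'CMV': 1})
for line in sif_lines("allele_X", hits):
    print(line)
```

Writing the printed lines to a `.sif` file is one way to load such a network into Cytoscape; the published figure may have been built from a different node and edge schema.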
