Naive and memory T cells TCR-HLA-binding prediction

Neta Glazer¹, Ofek Akerman¹, Yoram Louzoun¹

PMID: 36846560
PMCID: PMC9914496
DOI: 10.1093/oxfimm/iqac001

Neta Glazer et al. Oxf Open Immunol. 2022 May 26;3(1):iqac001. doi: 10.1093/oxfimm/iqac001. eCollection 2022.

Affiliation

¹ Department of Mathematics, Bar-Ilan University, Ramat Gan, Israel.

Abstract

T cells recognize antigens through the interaction of their T cell receptor (TCR) with a peptide-major histocompatibility complex (pMHC) molecule. Following thymic positive selection, TCRs in peripheral naive T cells are expected to bind the MHC alleles of the host. Peripheral clonal selection is expected to further increase the frequency of antigen-specific TCRs that bind the host MHC alleles. To test for a systematic preference for MHC-binding T cells in TCR repertoires, we developed Natural Language Processing-based methods to predict TCR-MHC binding for Class I MHC alleles, independently of the presented peptide. We trained a classifier on published TCR-pMHC binding pairs and obtained a high area under the curve (AUC) of over 0.90 on the test set. However, the accuracy of the classifier dropped when it was applied to TCR repertoires. We therefore developed a two-stage prediction model based on large-scale naive and memory TCR repertoires, denoted TCR HLA-binding predictor (CLAIRE). Since each host carries multiple human leukocyte antigen (HLA) alleles, we first computed whether a TCR on a CD8 T cell binds an MHC molecule from any of the host's Class I HLA alleles. We then performed a second iteration, in which we predicted binding to the most probable allele from the first round. We show that this classifier is more precise for memory than for naive cells. Moreover, it can be transferred between datasets. Finally, we developed a CD4-CD8 T cell classifier to apply CLAIRE to unsorted bulk sequencing datasets and obtained high AUCs of 0.96 and 0.90 on large datasets. CLAIRE is available on GitHub at https://github.com/louzounlab/CLAIRE and as a server at https://claire.math.biu.ac.il/Home.

Keywords: HLA; MHC; TCR; binding prediction; machine learning; memory T-cell; receptor; two stages.
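
The two-round procedure described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the CLAIRE implementation: `score` is a hypothetical stand-in for any trained TCR–HLA model that returns a binding probability, round 1 pairs every TCR with every Class I allele of its host, and round 2 keeps, for each TCR, only the host allele with the highest round-1 score. Summarizing "binds any host allele" by the maximum score over the host's alleles is also an assumption.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical scoring interface: P(bind | TCR, HLA allele) from any trained model.
Score = Callable[[str, str], float]


def round_one_pairs(host_tcrs: Dict[str, Sequence[str]],
                    host_hlas: Dict[str, Sequence[str]]) -> List[Tuple[str, str]]:
    """Round 1: pair every TCR of a host with every Class I allele of that host."""
    return [(tcr, hla)
            for host, tcrs in host_tcrs.items()
            for tcr in tcrs
            for hla in host_hlas[host]]


def binds_any(tcr: str, alleles: Sequence[str], score: Score) -> float:
    """Assumed summary of 'binds any host allele': the best score over the host's alleles."""
    return max(score(tcr, hla) for hla in alleles)


def round_two_pairs(host_tcrs: Dict[str, Sequence[str]],
                    host_hlas: Dict[str, Sequence[str]],
                    score: Score) -> List[Tuple[str, str]]:
    """Round 2: keep, for each TCR, only the host allele with the highest round-1 score."""
    pairs = []
    for host, tcrs in host_tcrs.items():
        alleles = host_hlas[host]
        for tcr in tcrs:
            best = max(alleles, key=lambda hla: score(tcr, hla))
            pairs.append((tcr, best))
    return pairs


if __name__ == "__main__":
    # Toy data: one donor, two TCRs, two alleles, and made-up round-1 scores.
    host_tcrs = {"donor1": ["CASSLGQAYEQYF", "CASSPDRGNTEAFF"]}
    host_hlas = {"donor1": ["A*02:01", "B*07:02"]}
    toy_scores = {
        ("CASSLGQAYEQYF", "A*02:01"): 0.9, ("CASSLGQAYEQYF", "B*07:02"): 0.2,
        ("CASSPDRGNTEAFF", "A*02:01"): 0.3, ("CASSPDRGNTEAFF", "B*07:02"): 0.7,
    }

    def toy_score(tcr: str, hla: str) -> float:
        return toy_scores[(tcr, hla)]

    print(len(round_one_pairs(host_tcrs, host_hlas)))  # 4 candidate pairs
    print(round_two_pairs(host_tcrs, host_hlas, toy_score))
```

In this sketch, round 2 turns a host-level label ("this TCR came from a host carrying these alleles") into a single TCR–allele pair, which is one plausible reading of "predict the binding with the most probable allele from the first round".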

Figures

**Figure 1:**
Models for predicting binding between the different entities in the TCR–MHC–peptide complex. MHC–peptide: the model by [15], PSSMHCpan [16], NetMHC-4.0 [17], ANTHEM [18], MHCflurry [19], NetMHCpan-4.1 and NetMHCIIpan-4.0 [20]. TCR–peptide: GLIPH [4], the model by [27], ERGO [28], NetTCR [29, 30], DeepTCR [31], PUBLIC [32], the models presented by [33] and TCRex [34]. TCR–MHC: CLAIRE (this paper). Dash et al. [27] and DeWitt et al. [51] reported a correlation between TCR clusters and HLA association.

**Figure 2:**
Heatmap of the correlation between $V_\beta$ genes $v_j$ and HLA alleles $h_i$. For each pair $(v_j, h_i)$, the value shown in the heatmap is $\log(P(h_i \mid v_j) + \epsilon)$, where $\epsilon = 10^{-5}$. The V genes and HLA alleles are clustered using single-linkage hierarchical clustering.
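
A minimal sketch of how the heatmap values could be computed. The caption does not say how $P(h_i \mid v_j)$ is estimated, so this assumes a hypothetical input of one `(V gene, donor alleles)` record per TCR and estimates $P(h_i \mid v_j)$ as the fraction of TCRs using $v_j$ that come from donors carrying $h_i$; the natural logarithm is also an assumption. Because each donor carries several alleles, these conditionals need not sum to 1 over $h_i$.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Iterable, Set, Tuple

EPSILON = 1e-5  # value of epsilon given in the Figure 2 caption


def log_conditional_table(records: Iterable[Tuple[str, Set[str]]],
                          eps: float = EPSILON) -> Dict[str, Dict[str, float]]:
    """Return log(P(h | v) + eps) for every V gene v and HLA allele h seen in `records`.

    Each record is (V beta gene of one TCR, set of HLA alleles of that TCR's donor).
    """
    v_counts: Counter = Counter()
    joint: Dict[str, Counter] = defaultdict(Counter)
    alleles: Set[str] = set()
    for v_gene, donor_alleles in records:
        v_counts[v_gene] += 1
        alleles.update(donor_alleles)
        for hla in donor_alleles:
            joint[v_gene][hla] += 1
    # Missing (v, h) combinations count as 0, so they map to log(eps).
    return {
        v_gene: {hla: math.log(joint[v_gene][hla] / n + eps) for hla in sorted(alleles)}
        for v_gene, n in v_counts.items()
    }


if __name__ == "__main__":
    records = [
        ("TRBV5-1", {"A*02:01", "B*07:02"}),
        ("TRBV5-1", {"A*01:01"}),
        ("TRBV20-1", {"A*02:01"}),
    ]
    table = log_conditional_table(records)
    print(table["TRBV20-1"]["B*07:02"])  # log(1e-5): allele never co-occurs with TRBV20-1
```

For the clustering step, `scipy.cluster.hierarchy.linkage` with `method="single"` is one standard single-linkage implementation that could be applied to the rows and columns of this table.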

**Figure 3:**
Illustration of CLAIRE’s architecture. We developed a binary model trained to output 1 if the TCR and HLA bind and 0 otherwise. The model takes a TCR and an HLA allele as inputs. For the TCR, we used the CDR3β sequence and the $V_\beta$ and $J_\beta$ genes; when α-chain information was available, we used it too. The CDR3 amino acid sequences were encoded with a TCR autoencoder (see ‘Methods’ section). We treated the $V_\beta$ and $J_\beta$ genes as categorical features (see ‘Methods’ section for the TCR representation). The HLA allele was also a categorical feature. The TCR and HLA features were embedded as real-valued vectors $E_t$ and $E_h$, respectively. All of the T cell’s encoded features were concatenated with the HLA-encoded vector into a single vector $E_{th}$, which was the input of a multilayer perceptron (MLP). The output of the MLP is a real value between 0 and 1, trained to be high for binding TCR–HLA pairs and low otherwise.
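
The data flow in the caption (TCR features embedded into $E_t$, HLA embedded into $E_h$, both concatenated into $E_{th}$, then an MLP with an output between 0 and 1) can be sketched in plain Python. This is an untrained forward pass only, under assumptions: the flattened one-hot CDR3 encoding is a stand-in for the TCR autoencoder, the α chain is omitted, and the embedding size, hidden width, ReLU activation, sigmoid output and the class name `TcrHlaSketch` are hypothetical choices, not CLAIRE's published configuration.

```python
import math
import random
from typing import List, Sequence

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
MAX_CDR3_LEN = 20  # hypothetical padding length


def encode_cdr3(cdr3: str) -> List[float]:
    """Stand-in for the TCR autoencoder: flattened, zero-padded one-hot encoding."""
    vec = [0.0] * (len(AMINO_ACIDS) * MAX_CDR3_LEN)
    for pos, aa in enumerate(cdr3[:MAX_CDR3_LEN]):
        vec[pos * len(AMINO_ACIDS) + AMINO_ACIDS.index(aa)] = 1.0
    return vec


def random_matrix(rows: int, cols: int, rng: random.Random) -> List[List[float]]:
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def dense(x: List[float], weights: List[List[float]], bias: List[float]) -> List[float]:
    """Fully connected layer: weights @ x + bias."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


class TcrHlaSketch:
    """Untrained TCR-HLA binary classifier: embeddings -> concatenation -> MLP -> sigmoid."""

    def __init__(self, v_genes: Sequence[str], j_genes: Sequence[str],
                 hlas: Sequence[str], emb_dim: int = 8, hidden: int = 16, seed: int = 0):
        rng = random.Random(seed)
        # Each categorical feature (V beta, J beta, HLA) gets its own embedding table,
        # randomly initialized here; training would learn these vectors.
        self.v_emb = {v: random_matrix(1, emb_dim, rng)[0] for v in v_genes}
        self.j_emb = {j: random_matrix(1, emb_dim, rng)[0] for j in j_genes}
        self.h_emb = {h: random_matrix(1, emb_dim, rng)[0] for h in hlas}
        in_dim = len(AMINO_ACIDS) * MAX_CDR3_LEN + 3 * emb_dim
        self.w1, self.b1 = random_matrix(hidden, in_dim, rng), [0.0] * hidden
        self.w2, self.b2 = random_matrix(1, hidden, rng), [0.0]

    def predict(self, cdr3: str, v_gene: str, j_gene: str, hla: str) -> float:
        e_t = encode_cdr3(cdr3) + self.v_emb[v_gene] + self.j_emb[j_gene]  # list concat: E_t
        e_h = self.h_emb[hla]                                              # E_h
        e_th = e_t + e_h                                                   # E_th = [E_t ; E_h]
        hidden = [max(0.0, z) for z in dense(e_th, self.w1, self.b1)]      # ReLU hidden layer
        logit = dense(hidden, self.w2, self.b2)[0]
        return 1.0 / (1.0 + math.exp(-logit))                             # score in (0, 1)


if __name__ == "__main__":
    model = TcrHlaSketch(["TRBV5-1"], ["TRBJ2-7"], ["A*02:01"])
    print(model.predict("CASSLGQAYEQYF", "TRBV5-1", "TRBJ2-7", "A*02:01"))
```

Because the weights are random, the printed value is meaningless until the model is trained on binding (label 1) and non-binding (label 0) TCR–HLA pairs, as the caption describes.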

**Figure 4:**
Performance of the TCR–HLA-binding model on datasets that contain TCR–HLA pairs $(t_k, h_j)$. The upper plots show the ROC curves for predictions on the McPAS and VDJdb datasets; accuracy is much higher on McPAS. The bar plot shows the results of the model trained on the McPAS dataset and tested on each HLA allele in the same dataset. This model was trained on the MHC Class I and II alleles present in McPAS; this is the only analysis in which we used TCRs that bind MHC Class II. There is a clear split between HLA alleles with very high AUC values and many others with low AUC values.
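
The per-HLA bar plot reports an AUC computed separately for each allele. A minimal sketch of that evaluation, assuming hypothetical `(hla, label, score)` rows of test-set predictions; the AUC uses the rank (Mann–Whitney) formulation, with ties counted as one half. Alleles lacking either positives or negatives are skipped because their AUC is undefined.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def auc(pos_scores: List[float], neg_scores: List[float]) -> float:
    """Probability that a random positive outscores a random negative (ties count 1/2)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def auc_per_hla(rows: Iterable[Tuple[str, int, float]]) -> Dict[str, float]:
    """rows: (HLA allele, 1 if binding pair else 0, predicted score)."""
    groups: Dict[str, Tuple[List[float], List[float]]] = defaultdict(lambda: ([], []))
    for hla, label, score in rows:
        positives, negatives = groups[hla]
        (positives if label else negatives).append(score)
    return {hla: auc(pos, neg) for hla, (pos, neg) in groups.items() if pos and neg}


if __name__ == "__main__":
    rows = [("A*02:01", 1, 0.9), ("A*02:01", 0, 0.1), ("B*07:02", 1, 0.4), ("B*07:02", 0, 0.6)]
    print(auc_per_hla(rows))
```

The pairwise loop is quadratic and meant only to make the definition explicit; for large test sets, `sklearn.metrics.roc_auc_score` applied per allele computes the same quantity efficiently.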

**Figure 5:**
The bar plot in the upper right shows the results of CLAIRE (both GA and AUC) on different datasets. The upper left plot shows the ROC curve of the model trained on the McPAS dataset and tested on the original McPAS, not on the simulated McPAS described in the ‘Limitations of PTH models in BSE Data’ section. The lower bar plot compares the GA of a random scenario with the GA obtained by CLAIRE. The dashed lines are the average random GA over 10 realizations per sample size, with sample sizes given as a fraction of the total sample used for the analysis. Each random realization gives the GA expected when the class of each sample is scrambled. The shaded regions around the dashed lines show the standard error. The solid lines are the actual GA obtained on the corresponding dataset.
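
The random baseline in the lower plot (class labels scrambled, 10 realizations per sample size, mean with standard error) can be sketched generically. GA is not defined in this section, so the sketch takes the metric as a parameter and uses plain accuracy as a hypothetical stand-in; the subsampling scheme and the names `accuracy` and `scrambled_baseline` are also assumptions.

```python
import math
import random
import statistics
from typing import Callable, List, Sequence, Tuple

Metric = Callable[[Sequence[int], Sequence[int]], float]


def accuracy(labels: Sequence[int], preds: Sequence[int]) -> float:
    """Hypothetical stand-in for GA: fraction of samples whose prediction matches the label."""
    return sum(y == y_hat for y, y_hat in zip(labels, preds)) / len(labels)


def scrambled_baseline(labels: Sequence[int], preds: Sequence[int], fraction: float,
                       metric: Metric = accuracy, n_realizations: int = 10,
                       seed: int = 0) -> Tuple[float, float]:
    """Mean and standard error of `metric` over realizations with scrambled classes.

    `n_realizations` must be at least 2 for the standard error to be defined.
    """
    rng = random.Random(seed)
    size = max(1, round(fraction * len(labels)))
    scores: List[float] = []
    for _ in range(n_realizations):
        idx = rng.sample(range(len(labels)), size)  # subsample of the requested size
        sub_labels = [labels[i] for i in idx]
        sub_preds = [preds[i] for i in idx]
        rng.shuffle(sub_labels)                     # scramble the class of each sample
        scores.append(metric(sub_labels, sub_preds))
    mean = statistics.fmean(scores)
    se = statistics.stdev(scores) / math.sqrt(len(scores))
    return mean, se


if __name__ == "__main__":
    labels = [0, 1] * 50
    preds = list(labels)
    for fraction in (0.25, 0.5, 1.0):
        print(fraction, scrambled_baseline(labels, preds, fraction))
```

Plotting the returned mean as a dashed line with a band of one standard error, next to the metric computed on the unscrambled labels, reproduces the layout the caption describes.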

**Figure 6:**
Comparison of CLAIRE and a random model on TCRs tightly associated with the presence of a specific HLA allele. The rightmost bar shows the recall of a random model, and the other bars show the recall for each HLA allele. For each HLA allele, we took all TCRs associated with it and counted how many of them received their highest predicted binding probability for that allele. Asterisks denote the significance level of a chi-square test between the results of the random model and CLAIRE: \*\*\*P < 0.001, \*\*P < 0.01, \*P < 0.05. The upper plot is for the Emerson data and the lower plot for the van Heijst data.
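
The per-allele recall and its comparison with a random model can be sketched as follows. Assumptions: each TCR comes with a hypothetical dictionary of predicted binding probabilities over its host's alleles, the random model's recall is supplied as a parameter (`random_recall`, for example 1/k if a random model picks one of k host alleles uniformly), and the chi-square test is a one-degree-of-freedom goodness-of-fit test of observed hits and misses against that random expectation. The exact test used for the figure may differ.

```python
import math
from typing import Dict, List, Tuple


def recall_for_hla(target: str, probs_per_tcr: List[Dict[str, float]]) -> Tuple[int, int]:
    """Count TCRs associated with `target` whose highest predicted probability is for `target`."""
    hits = sum(1 for probs in probs_per_tcr if max(probs, key=probs.get) == target)
    return hits, len(probs_per_tcr)


def chi_square_vs_random(hits: int, n: int, random_recall: float) -> Tuple[float, float]:
    """Goodness-of-fit of (hits, misses) against a random model with the given recall (1 df)."""
    observed = [hits, n - hits]
    expected = [n * random_recall, n * (1.0 - random_recall)]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # chi-square survival function for 1 df
    return chi2, p_value


def significance_stars(p: float) -> str:
    """Map a P value to the asterisk convention used in the caption."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return "n.s."


if __name__ == "__main__":
    probs = [{"A*02:01": 0.7, "B*07:02": 0.3}] * 40 + [{"A*02:01": 0.2, "B*07:02": 0.8}] * 60
    hits, n = recall_for_hla("A*02:01", probs)
    chi2, p = chi_square_vs_random(hits, n, random_recall=0.25)
    print(hits / n, chi2, significance_stars(p))
```

The closed form `erfc(sqrt(x / 2))` is the exact tail probability of a chi-square variable with one degree of freedom, which avoids a dependency on a statistics library for this single case.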

**Figure 7:**
The bar plot shows the AUC of the CD4–CD8 model. The x-axis shows the dataset the model was trained on, each color represents the dataset it was tested on, and the y-axis is the AUC. The plot shows the same results as Table 5. The bottom left plot is the ROC curve of the CD4–CD8 model trained on the McPAS dataset. The bottom right plot shows the performance of CLAIRE on the Emerson dataset using only the T cells that the CD4–CD8 model rates most likely to be CD8 T cells (top 0.01%).
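
Restricting the analysis to the cells most likely to be CD8 T cells amounts to ranking TCRs by the CD4–CD8 model's CD8 score and keeping the top fraction (0.01%, i.e. 1e-4). A minimal sketch, assuming a hypothetical mapping from TCR identifier to CD8 score; rounding the count up and always keeping at least one TCR are illustrative choices.

```python
import heapq
import math
from typing import Dict, List


def top_cd8_fraction(cd8_scores: Dict[str, float], fraction: float = 1e-4) -> List[str]:
    """Return the TCR ids with the highest CD8 scores, keeping `fraction` of all TCRs (at least one)."""
    k = max(1, math.ceil(fraction * len(cd8_scores)))
    # nlargest returns the k keys with the largest scores, in descending score order.
    return heapq.nlargest(k, cd8_scores, key=cd8_scores.get)


if __name__ == "__main__":
    scores = {f"tcr{i}": i / 100 for i in range(100)}
    print(top_cd8_fraction(scores))        # only one TCR survives at 0.01%
    print(top_cd8_fraction(scores, 0.05))  # five TCRs at 5%
```

`heapq.nlargest` avoids sorting the whole repertoire, which matters when bulk datasets contain millions of TCRs and only a tiny fraction is kept.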

**Figure A1:**
Results of CLAIRE. The ROC curve of a perfect model under the conditions of CLAIRE, as explained in the ‘Limitations of PTH models in BSE Data’ section, is shown next to the ROC curves of CLAIRE trained on different datasets. The first dataset, McPAS, contains binding pairs of memory T cells and HLA genes. The second, the Miron dataset, contains memory T cells from 11 donors, 9 of whom were HLA-typed. The last, the van Heijst dataset, contains TCRs from 27 HLA-typed patients.

Cited by

- Ruiz Ortega M, Pogorelyy MV, Minervina AA, Thomas PG, Mora T, Walczak AM. Learning predictive signatures of HLA type from T-cell repertoires. PLoS Comput Biol. 2025 Jan 6;21(1):e1012724. doi: 10.1371/journal.pcbi.1012724. PMID: 39761303.
- Levi R, Levi L, Louzoun Y. Bw4 ligand and direct T-cell receptor binding induced selection on HLA A and B alleles. Front Immunol. 2023 Nov 21;14:1236080. doi: 10.3389/fimmu.2023.1236080. PMID: 38077375.
- Akerman O, Isakov H, Levi R, Psevkin V, Louzoun Y. Counting is almost all you need. Front Immunol. 2023 Jan 20;13:1031011. doi: 10.3389/fimmu.2022.1031011. PMID: 36741395.
- Liu S, Bradley P, Sun W. Neural network models for sequence-based TCR and HLA association prediction. PLoS Comput Biol. 2023 Nov 20;19(11):e1011664. doi: 10.1371/journal.pcbi.1011664. PMID: 37983288.

References

1. Davis MM, Bjorkman PJ. T-cell antigen receptor genes and T-cell recognition. Nature. 1988;334:395–402.
2. Krogsgaard M, Davis MM. How T cells ‘see’ antigen. Nat Immunol. 2005;6:239–45.
3. Rowen L, Koop BF, Hood L. The complete 685-kilobase DNA sequence of the human β T cell receptor locus. Science. 1996;272:1755–62.
4. Glanville J, Huang H, Nau A, et al. Identifying specificity groups in the T cell receptor repertoire. Nature. 2017;547:94–8.
5. Choo SY. The HLA system: genetics, immunology, clinical testing, and clinical implications. Yonsei Med J. 2007;48:11–23.
