Oxf Open Immunol. 2022 May 26;3(1):iqac001.
doi: 10.1093/oxfimm/iqac001. eCollection 2022.

Naive and memory T cells TCR-HLA-binding prediction

Neta Glazer et al. Oxf Open Immunol.

Abstract

T cells recognize antigens through the interaction of their T cell receptor (TCR) with a peptide-major histocompatibility complex (pMHC) molecule. Following thymic positive selection, TCRs in peripheral naive T cells are expected to bind MHC alleles of the host. Peripheral clonal selection is expected to further increase the frequency of antigen-specific TCRs that bind to the host MHC alleles. To check for a systematic preference for MHC-binding T cells in TCR repertoires, we developed Natural Language Processing-based methods to predict TCR-MHC binding for Class I MHC alleles, independently of the peptide presented. We trained a classifier on published TCR-pMHC binding pairs and obtained an area under the curve (AUC) above 0.90 on the test set. However, when applied to TCR repertoires, the accuracy of the classifier dropped. We thus developed a two-stage prediction model, based on large-scale naive and memory TCR repertoires, denoted TCR HLA-binding predictor (CLAIRE). Since each host carries multiple human leukocyte antigen (HLA) alleles, we first computed whether a TCR on a CD8 T cell binds an MHC from any of the host Class I HLA alleles. We then performed an iteration in which we predict the binding with the most probable allele from the first round. We show that this classifier is more precise for memory than for naive cells. Moreover, it can be transferred between datasets. Finally, we developed a CD4-CD8 T cell classifier to apply CLAIRE to unsorted bulk sequencing datasets and showed AUCs of 0.96 and 0.90 on large datasets. CLAIRE is available on GitHub at https://github.com/louzounlab/CLAIRE, and as a server at https://claire.math.biu.ac.il/Home.

Keywords: HLA; MHC; TCR; binding prediction; machine learning; memory T-cell; receptor; two stages.

Figures

Figure 1:
Models for the prediction of binding between different entities in the TCR–MHC–peptide complex. MHC–peptide: the model by [15], PSSMHCpan [16], NetMHC-4.0 [17], Anthem [18], MHCflurry [19], NetMHCpan-4.1 and NetMHCIIpan-4.0 [20]. TCR–peptide: GLIPH [4], the model by [27], ERGO [28], the NetTCR models [29, 30], DeepTCR [31], PUBLIC [32], the models presented by [33] and TCRex [34]. TCR–MHC: CLAIRE (this paper). Dash et al. [27] and DeWitt et al. [51] presented a correlation between TCR clusters and HLA association.
Figure 2:
Heatmap of the correlation between Vβ genes v_j and HLA alleles h_i. For each pair (v_j, h_i), the value presented in the heatmap is log(P(h_i|v_j) + ϵ), where ϵ = 10⁻⁵. The V genes and HLA alleles are clustered using single-linkage hierarchical clustering.
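The heatmap values can be illustrated with a minimal sketch. It assumes that P(h_i|v_j) is estimated by simple co-occurrence counting over (V gene, HLA) pairs, that the logarithm is natural, and that ϵ is a small constant (1e-5); the function name `log_conditional` and the toy pairs are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter

def log_conditional(pairs, eps=1e-5):
    """Return {(v_gene, hla): log(P(hla | v_gene) + eps)} from observed pairs.

    P(hla | v_gene) is estimated by counting (an assumption of this sketch).
    """
    pair_counts = Counter(pairs)
    v_counts = Counter(v for v, _ in pairs)
    hlas = sorted({h for _, h in pairs})
    table = {}
    for v in sorted(v_counts):
        for h in hlas:
            p = pair_counts[(v, h)] / v_counts[v]
            # eps keeps the logarithm finite when a pair was never observed
            table[(v, h)] = math.log(p + eps)
    return table

# Toy (V gene, HLA allele) observations, for illustration only
pairs = [
    ("TRBV9", "A*02:01"),
    ("TRBV9", "A*02:01"),
    ("TRBV9", "B*07:02"),
    ("TRBV19", "B*07:02"),
]
table = log_conditional(pairs)
```

Pairs that never co-occur get the floor value log(ϵ), which is why a small ϵ is needed before taking the logarithm.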
Figure 3:
Illustration of CLAIRE’s architecture. We developed a binary model trained to output 1 if the TCR and HLA bind and 0 otherwise. The model had TCR and HLA allele inputs. For the TCR, we used the CDR3 β chain sequence and the Vβ and Jβ genes. When α chain information was available, we used it too. The CDR3 amino acid chains were encoded with a TCR autoencoder (see ‘Methods’ section). We considered the Vβ and Jβ genes as categorical features (see ‘Methods’ section for the TCR representation). The HLA was also a categorical feature. The TCR and HLA features were embedded as real-valued vectors Et and Eh, respectively. All the T cell’s encoded features were concatenated with the HLA-encoded vector into one vector Eth, which was the input to an MLP. The output of the MLP is a real value between 0 and 1, trained to be high for binding TCR–HLA pairs and low otherwise.
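The data flow in this caption can be sketched as a forward pass in plain Python. This is only an illustration under stated assumptions, not CLAIRE's implementation: the vocabularies, vector sizes, single hidden layer, ReLU activation, and random untrained weights are all hypothetical, and the CDR3 code is a random stand-in for the autoencoder output.

```python
import math
import random

random.seed(0)

def rand_vec(n):
    return [random.gauss(0.0, 0.1) for _ in range(n)]

def rand_mat(rows, cols):
    return [rand_vec(cols) for _ in range(rows)]

def linear(weights, bias, x):
    # One dense layer: weights @ x + bias
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]

def relu(x):
    return [max(0.0, xi) for xi in x]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical vocabularies and dimensions (not from the paper)
V_GENES = ["TRBV9", "TRBV19"]
J_GENES = ["TRBJ2-1", "TRBJ2-7"]
HLAS = ["A*02:01", "B*07:02"]
CDR3_DIM, GENE_DIM, HLA_DIM, HIDDEN = 8, 4, 4, 16

# Embedding tables for the categorical features
v_emb = {g: rand_vec(GENE_DIM) for g in V_GENES}
j_emb = {g: rand_vec(GENE_DIM) for g in J_GENES}
hla_emb = {h: rand_vec(HLA_DIM) for h in HLAS}

IN_DIM = CDR3_DIM + 2 * GENE_DIM + HLA_DIM
W1, b1 = rand_mat(HIDDEN, IN_DIM), [0.0] * HIDDEN
W2, b2 = rand_mat(1, HIDDEN), [0.0]

def predict(cdr3_code, v_gene, j_gene, hla):
    e_t = cdr3_code + v_emb[v_gene] + j_emb[j_gene]  # TCR vector Et
    e_th = e_t + hla_emb[hla]                        # concatenated vector Eth
    hidden = relu(linear(W1, b1, e_th))
    return sigmoid(linear(W2, b2, hidden)[0])        # binding score in (0, 1)

cdr3_code = rand_vec(CDR3_DIM)  # stand-in for the autoencoder's CDR3 encoding
score = predict(cdr3_code, "TRBV9", "TRBJ2-1", "A*02:01")
```

With untrained weights the score carries no meaning; the sketch only shows how the encoded TCR features and the HLA embedding are joined before the MLP and squashed to a value between 0 and 1.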
Figure 4:
Performance of the TCR–HLA-binding model on datasets that contain pairs of TCR and HLA (tk, hj). The upper plots are the ROC curves for the prediction based on the McPAS and VDJdb datasets. Accuracy is much higher on the McPAS dataset. The bar plot contains the results of the model trained on the McPAS dataset and tested on each of the HLAs in the same dataset. This model was trained on the MHC Class I and Class II alleles present in the McPAS dataset. This is the only analysis where we used samples of TCRs that bind to MHC Class II. A clear difference can be observed between HLAs with very high AUC values and many others with low AUC values.
Figure 5:
The bar plot in the upper right shows the results of CLAIRE (both GA and AUC) on different datasets. The upper left plot shows the ROC curve of the model trained on the McPAS dataset and tested on the original McPAS rather than on the simulated McPAS, as explained in the ‘Limitations of PTH models in BSE Data’ section. The lower bar plot compares the GA scores for the random scenario with the GA obtained with CLAIRE. The dashed lines are the average GA results over 10 realizations per sample size. The sample sizes are represented as a fraction of the total sample used for the analysis. The random realizations are the GA expected when the class of each sample is scrambled. The filled regions around the dashed lines are the standard error. The solid lines are the real GA obtained for the appropriate dataset.
Figure 6:
Comparison of CLAIRE results and a random model on TCRs tightly associated with the presence of a specific HLA. The right bar represents the recall of a random model, and the other bars represent the recall for each HLA. For each HLA, we took all the TCRs that are associated with that HLA and calculated how many of them received their highest binding probability for that HLA. The asterisks represent the significance level of a Chi-square test between the results of a random model and CLAIRE. ***P<0.001, **P<0.01, *P<0.05. The upper plot is for the Emerson data. The lower plot is for the van Heijst data.
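The per-HLA recall described in this caption can be sketched as follows. The function name `top_hla_recall`, the data layout, and the toy scores are hypothetical; the sketch assumes each TCR has a predicted binding probability for every candidate allele and counts a hit when the top-scoring allele matches the associated one.

```python
def top_hla_recall(scores, associated_hla):
    """Per-HLA recall of the top-scoring allele.

    scores: {tcr: {hla: predicted binding probability}}
    associated_hla: {tcr: hla the TCR is associated with}
    """
    hits, totals = {}, {}
    for tcr, hla in associated_hla.items():
        # Allele with the highest predicted binding probability for this TCR
        predicted = max(scores[tcr], key=scores[tcr].get)
        totals[hla] = totals.get(hla, 0) + 1
        hits[hla] = hits.get(hla, 0) + (predicted == hla)
    return {hla: hits[hla] / totals[hla] for hla in totals}

# Toy predictions, for illustration only
scores = {
    "t1": {"A*02:01": 0.9, "B*07:02": 0.2},
    "t2": {"A*02:01": 0.4, "B*07:02": 0.6},
    "t3": {"A*02:01": 0.3, "B*07:02": 0.8},
}
associated = {"t1": "A*02:01", "t2": "A*02:01", "t3": "B*07:02"}
recall = top_hla_recall(scores, associated)
```

In this toy input, one of the two TCRs associated with A*02:01 is ranked correctly and the single B*07:02 TCR is ranked correctly.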
Figure 7:
The bar plot represents the AUC score of the CD4–CD8 model. The x-axis shows the datasets the model was trained on, and each color represents a dataset the model was tested on. The y-axis is the AUC score. This plot presents the same results as Table 5. The bottom left plot is the ROC curve of the CD4–CD8 model when trained on the McPAS dataset. The bottom right plot is the performance of CLAIRE on the Emerson dataset using only the T cells that are most likely to be CD8 T cells (top 0.01%) according to the CD4–CD8 model.
Figure A1:
Results of CLAIRE. We show the ROC curve of a perfect model under the conditions of CLAIRE, as explained in the ‘Limitations of PTH models in BSE Data’ section. Next to the perfect ROC curve are the ROC curves of CLAIRE when trained on different datasets. The first dataset, McPAS, contains binding pairs of a memory T cell and an HLA gene. The second, the Miron dataset, contains memory T cells from 11 donors, 9 of them with HLA typing. The last, the van Heijst dataset, contains TCRs from 27 patients with HLA typing.

References

    1. Davis MM, Bjorkman PJ. T-cell antigen receptor genes and T-cell recognition. Nature. 1988;334:395–402.
    2. Krogsgaard M, Davis MM. How T cells ‘see’ antigen. Nat Immunol. 2005;6:239–45.
    3. Rowen L, Koop BF, Hood L. The complete 685-kilobase DNA sequence of the human β T cell receptor locus. Science. 1996;272:1755–62.
    4. Glanville J, Huang H, Nau A, et al. Identifying specificity groups in the T cell receptor repertoire. Nature. 2017;547:94–8.
    5. Choo SY. The HLA system: genetics, immunology, clinical testing, and clinical implications. Yonsei Med J. 2007;48:11–23.
