Front Immunol. 2024 Aug 16:15:1426173.
doi: 10.3389/fimmu.2024.1426173. eCollection 2024.

TCR-H: explainable machine learning prediction of T-cell receptor epitope binding on unseen datasets

Rajitha Rajeshwar T et al. Front Immunol. 2024.

Abstract

Artificial-intelligence and machine-learning (AI/ML) approaches to predicting T-cell receptor (TCR)-epitope specificity achieve high performance metrics on test datasets that include sequences also present in the training set, but they fail to generalize to test sets consisting of epitopes and TCRs that are absent from the training set, i.e., 'unseen' during training of the ML model. We present TCR-H, a supervised support vector machine (SVM) classification model that uses physicochemical features and is trained on the largest dataset available to date, using only experimentally validated non-binders as negative datapoints. TCR-H exhibits an area under the receiver operating characteristic curve (AUC of ROC) of 0.87 for epitope 'hard splitting' (i.e., on test sets with all epitopes unseen during ML training), 0.92 for TCR hard splitting, and 0.89 for 'strict splitting', in which neither the epitopes nor the TCRs in the test set are seen in the training data. Furthermore, we employ the SHAP (Shapley additive explanations) eXplainable AI (XAI) method for post hoc interrogation of the models trained with different hard splits, shedding light on the key physicochemical features driving model predictions. TCR-H thus represents a significant step towards general applicability and explainability of epitope:TCR specificity prediction.
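The hard and strict splits described in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: the record layout (`cdr3b`, `epitope`, `label`), the function names, the 20% test fraction, and the toy identifiers are all assumptions made for illustration. The strict split is obtained here by holding out epitopes and then discarding training pairs whose TCR also occurs in the test set, which is one possible way to satisfy the definition.

```python
import random

def hard_split(pairs, key, test_fraction=0.2, seed=0):
    """Split so that no value of `key` occurs in both train and test."""
    groups = sorted({p[key] for p in pairs})
    random.Random(seed).shuffle(groups)
    n_test = max(1, round(len(groups) * test_fraction))
    held_out = set(groups[:n_test])
    train = [p for p in pairs if p[key] not in held_out]
    test = [p for p in pairs if p[key] in held_out]
    return train, test

def strict_split(pairs, test_fraction=0.2, seed=0):
    """Hold out epitopes, then drop training pairs whose TCR is in the test set."""
    train, test = hard_split(pairs, "epitope", test_fraction, seed)
    test_tcrs = {p["cdr3b"] for p in test}
    train = [p for p in train if p["cdr3b"] not in test_tcrs]
    return train, test

# Toy placeholder records (hypothetical identifiers, arbitrary labels).
# Each of 20 TCRs is paired with two of 5 epitopes.
pairs = []
for i in range(20):
    for shift in (0, 1):
        pairs.append({
            "cdr3b": f"TCR_{i}",
            "epitope": f"EPI_{(i + shift) % 5}",
            "label": (i + shift) % 2,
        })

train, test = strict_split(pairs)
print(len(train), len(test))
```

Calling `hard_split(pairs, "cdr3b")` gives the TCR hard split in the same way. Note that the strict split discards some training pairs, so the training set is smaller than under either hard split.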

Keywords: T-cell receptor; adaptive immunity T-cell receptor; antigen; epitope; explainable machine learning; machine learning; physicochemical features; physicochemical model.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

Figure 1
Schematic workflow of ML modeling. For each pair of TCR CDR3β and epitope sequences in the training data, labelled as binding or non-binding, physicochemical features are calculated and provided as input features for the different ML models tested. The trained models predict whether or not given CDR3β and epitope sequences of the test data bind. Models are evaluated based on a variety of performance metrics and are interpreted using SHAP analysis.
Figure 2
Performance metrics of the ML models on the epitope hard split test set: Random Forest (RF), Gradient Boosting trees (GBT), eXtreme Gradient Boosting (XGB), Support Vector Machines (SVM) and SVM with uncorrelated features (TCR-HE).
Figure 3
Performance metrics on the independent test sets for the TCR-HE (epitope hard split), TCR-Hβ (TCR hard split), TCR-HβE (strict split) and TCR-RS (random split).
Figure 4
AUC of ROC, Precision and Recall of TCR-HE compared with those of previously reported epitope hard split models (21, 23, 25, 36).
Figure 5
Representative SHAP summary plots for the epitope split and the TCR split, showing the top 50 features contributing to the model predictions. The length of the horizontal bar for each feature represents the magnitude of its SHAP value, so a longer bar indicates a greater impact on the model's prediction. The color indicates the direction of the impact: red bars denote higher predicted probabilities for the positive class, whereas blue bars represent the negative class.
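The AUC of ROC, precision, and recall named in the figure captions can be computed from true labels and model scores as in the following minimal sketch. The function names, the 0.5 decision threshold, and the toy labels and scores are assumptions for illustration and are not taken from the paper. The AUC uses the pairwise (Mann-Whitney) formulation: the fraction of positive/negative pairs in which the positive example receives the higher score, with ties counted as one half.

```python
def roc_auc(labels, scores):
    """AUC of ROC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    # Quadratic in the number of examples; fine for a sketch, slow for large sets.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def precision_recall(labels, scores, threshold=0.5):
    """Precision and recall after thresholding the scores (threshold is assumed)."""
    preds = [int(s >= threshold) for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Toy values, made up for illustration only.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]

print(roc_auc(labels, scores))           # 8 of 9 pairs ranked correctly
print(precision_recall(labels, scores))  # 2 true positives, 1 false positive, 1 false negative
```

Because the AUC depends only on the ranking of scores, it is unaffected by the choice of threshold, whereas precision and recall change with it.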

References

    1. Dash P, Fiore-Gartland AJ, Hertz T, Wang GC, Sharma S, Souquette A, et al. Quantifiable predictive features define epitope-specific T cell receptor repertoires. Nature. (2017) 547:89–93. doi: 10.1038/nature22383
    2. Bradley P, Thomas PG. Using T cell receptor repertoires to understand the principles of adaptive immune recognition. Annu Rev Immunol. (2019) 37:547–70. doi: 10.1146/annurev-immunol-042718-041757
    3. Rudolph MG, Stanfield RL, Wilson IA. How TCRs bind MHCs, peptides, and coreceptors. Annu Rev Immunol. (2006) 24:419–66. doi: 10.1146/annurev.immunol.23.021704.115658
    4. Tippalagama R, Chihab LY, Kearns K, Lewis S, Panda S, Willemsen L, et al. Antigen-specificity measurements are the key to understanding T cell responses. Front Immunol. (2023) 14:1127470. doi: 10.3389/fimmu.2023.1127470
    5. Bradley P. Structure-based prediction of T cell receptor: peptide-MHC interactions. Elife. (2023) 12:e82813. doi: 10.7554/eLife.82813
