TCR-H: explainable machine learning prediction of T-cell receptor epitope binding on unseen datasets

Rajitha Rajeshwar T¹,²,³, Omar N A Demerdash¹,³, Jeremy C Smith¹,²,³

Affiliations

¹ UT/ORNL Center for Molecular Biophysics, Oak Ridge National Laboratory, Oak Ridge, TN, United States.
² Department of Biochemistry and Cellular and Molecular Biology, University of Tennessee, Knoxville, TN, United States.
³ Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, United States.

PMID: 39221256
PMCID: PMC11361934
DOI: 10.3389/fimmu.2024.1426173

Rajitha Rajeshwar T et al. Front Immunol. 2024 Aug 16;15:1426173. doi: 10.3389/fimmu.2024.1426173. eCollection 2024.

Abstract

Artificial-intelligence and machine-learning (AI/ML) approaches to predicting T-cell receptor (TCR)-epitope specificity achieve high performance metrics on test datasets that include sequences also present in the training set, but fail to generalize to test sets consisting of epitopes and TCRs that are absent from the training set, i.e., 'unseen' during training of the ML model. We present TCR-H, a supervised support vector machine (SVM) classification model that uses physicochemical features and is trained on the largest dataset available to date, using only experimentally validated non-binders as negative datapoints. TCR-H exhibits an area under the receiver operating characteristic curve (AUC of ROC) of 0.87 for epitope 'hard splitting' (i.e., on test sets in which all epitopes are unseen during ML training), 0.92 for TCR hard splitting, and 0.89 for 'strict splitting', in which neither the epitopes nor the TCRs in the test set are seen in the training data. Furthermore, we employ the SHAP (Shapley additive explanations) eXplainable AI (XAI) method for post hoc interrogation of the models trained with different hard splits, shedding light on the key physicochemical features driving model predictions. TCR-H thus represents a significant step towards general applicability and explainability of epitope:TCR specificity prediction.

Keywords: T-cell receptor; adaptive immunity; antigen; epitope; explainable machine learning; machine learning; physicochemical features; physicochemical model.
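The three splitting schemes named in the abstract can be illustrated with a minimal sketch. The code below is not the authors' pipeline: the function names `hard_split` and `strict_split`, the 20% test fraction, and the toy sequences and labels are all assumptions made for illustration. It holds out whole epitopes (or whole TCRs) so that no held-out sequence appears in training, and for the strict split it keeps only test pairs in which both sequences are held out.

```python
import random
from typing import List, Tuple

# (CDR3-beta sequence, epitope sequence, binding label); all values are toy data.
Pair = Tuple[str, str, int]


def hard_split(pairs: List[Pair], key_index: int,
               test_fraction: float = 0.2, seed: int = 0):
    """Hold out whole groups so that test groups are unseen in training.

    key_index=0 groups by TCR (TCR hard split);
    key_index=1 groups by epitope (epitope hard split).
    """
    groups = sorted({p[key_index] for p in pairs})
    random.Random(seed).shuffle(groups)
    n_test = max(1, round(len(groups) * test_fraction))
    held_out = set(groups[:n_test])
    train = [p for p in pairs if p[key_index] not in held_out]
    test = [p for p in pairs if p[key_index] in held_out]
    return train, test


def strict_split(pairs: List[Pair], test_fraction: float = 0.2, seed: int = 0):
    """Hold out both TCRs and epitopes.

    Pairs that mix a held-out sequence with a training sequence are discarded,
    so neither side of any test pair occurs in the training data.
    """
    rng = random.Random(seed)
    tcrs = sorted({p[0] for p in pairs})
    epitopes = sorted({p[1] for p in pairs})
    rng.shuffle(tcrs)
    rng.shuffle(epitopes)
    test_tcrs = set(tcrs[:max(1, round(len(tcrs) * test_fraction))])
    test_epitopes = set(epitopes[:max(1, round(len(epitopes) * test_fraction))])
    train = [p for p in pairs
             if p[0] not in test_tcrs and p[1] not in test_epitopes]
    test = [p for p in pairs if p[0] in test_tcrs and p[1] in test_epitopes]
    return train, test


# Made-up toy strings and labels, not real binding data.
TCRS = ["CASSLGF", "CASSQDY", "CASRTGE", "CSARDEQ", "CASSPNT"]
EPITOPES = ["AAAGIL", "KLVNEF", "YTRDPW", "MSQHCV", "GLFTEY"]
PAIRS: List[Pair] = [(t, e, (i + j) % 2)
                     for i, t in enumerate(TCRS)
                     for j, e in enumerate(EPITOPES)]

epi_train, epi_test = hard_split(PAIRS, key_index=1)
strict_train, strict_test = strict_split(PAIRS)
```

A random split, by contrast, would shuffle the pairs themselves, which lets the same epitope or TCR appear on both sides of the split.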

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

**Figure 1**
Schematic workflow of ML modeling. For each pair of TCR CDR3β and epitope sequences in the training data, labelled as binding or non-binding, physicochemical features are calculated and provided as input features for the different ML models tested. The trained models predict whether or not given CDR3β and epitope sequences of the test data bind. Models are evaluated based on a variety of performance metrics and are interpreted using SHAP analysis.
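The feature-calculation step of this workflow might look like the sketch below. This record does not list the descriptors TCR-H actually uses, so the four chosen here (length, mean Kyte-Doolittle hydropathy, a crude net charge, aromatic fraction) and the function names `sequence_features` and `pair_features` are assumptions for illustration only.

```python
from typing import List

# Kyte-Doolittle hydropathy scale for the 20 standard amino acids.
KD_HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
POSITIVE = set("KR")   # simplification: histidine is treated as neutral
NEGATIVE = set("DE")
AROMATIC = set("FWY")


def sequence_features(seq: str) -> List[float]:
    """Return [length, mean hydropathy, net charge, aromatic fraction]."""
    if not seq:
        raise ValueError("empty sequence")
    n = len(seq)
    mean_hydropathy = sum(KD_HYDROPATHY[aa] for aa in seq) / n
    net_charge = (sum(aa in POSITIVE for aa in seq)
                  - sum(aa in NEGATIVE for aa in seq))
    aromatic_fraction = sum(aa in AROMATIC for aa in seq) / n
    return [float(n), mean_hydropathy, float(net_charge), aromatic_fraction]


def pair_features(cdr3b: str, epitope: str) -> List[float]:
    """Concatenate CDR3-beta and epitope descriptors into one input vector."""
    return sequence_features(cdr3b) + sequence_features(epitope)


# Illustrative input strings; no binding label is implied.
example_vector = pair_features("CASSIRSSYEQYF", "GILGFVFTL")
```

Each labelled pair then becomes one fixed-length numeric vector, which is the form of input a classifier such as an SVM expects.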

**Figure 2**
Performance metrics for epitope hard split test set of ML models. Random Forest (RF), Gradient Boosting trees (GBT), eXtreme Gradient Boosting (XGB), Support Vector Machines (SVM) and SVM with uncorrelated features (TCR-HE).

**Figure 3**
Performance metrics on the independent test sets for the TCR-HE (epitope hard split), TCR-Hβ (TCR hard split), TCR-HβE (strict split) and TCR-RS (random split).

**Figure 4**
AUC of ROC, Precision and Recall of TCR-HE compared with that of previously reported epitope hard split models (21, 23, 25, 36).
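For reference, the three metrics compared in Figure 4 can be computed as in the sketch below. It uses the rank (Mann-Whitney) formulation of AUC, i.e. the probability that a randomly chosen binder is scored above a randomly chosen non-binder, with ties counting one half. The labels, scores, and the 0.5 decision threshold are toy values assumed for illustration, not results from the paper.

```python
from typing import List, Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC of ROC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


def precision_recall(labels: Sequence[int],
                     predictions: Sequence[int]) -> Tuple[float, float]:
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy labels and classifier scores (assumed values).
toy_labels: List[int] = [1, 1, 0, 0]
toy_scores: List[float] = [0.9, 0.4, 0.6, 0.1]
toy_predictions = [1 if s >= 0.5 else 0 for s in toy_scores]

auc = roc_auc(toy_labels, toy_scores)                           # 0.75
precision, recall = precision_recall(toy_labels, toy_predictions)  # (0.5, 0.5)
```

AUC is threshold-free, whereas precision and recall depend on the chosen decision threshold, which is one reason the three are reported together.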

**Figure 5**
Representative summary plots of SHAP for Epitope split and TCR split showing top 50 features contributing to the model predictions. The length of the horizontal bar corresponding to each significant feature represents the magnitude of its SHAP value, while the color indicates the direction of its impact. Red bars denote higher predicted probabilities for the positive class, whereas blue bars represent the negative class. Longer bars against a feature indicate a greater impact on the model’s prediction.
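The quantity plotted in such summary plots is a Shapley value per feature. The sketch below computes exact Shapley values for a tiny hypothetical model, replacing absent features with a single baseline vector. The SHAP library estimates these values for real models using background data and approximations, so the function `shapley_values`, the linear toy model, and the zero baseline here are illustrative assumptions, not the method used in the paper.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence


def shapley_values(model: Callable[[List[float]], float],
                   x: Sequence[float],
                   baseline: Sequence[float]) -> List[float]:
    """Exact Shapley attribution of model(x) - model(baseline) to each feature.

    Cost grows as 2**n model calls per feature, so this is only for small n.
    """
    n = len(x)

    def value(present: frozenset) -> float:
        # Features outside the coalition are set to their baseline value.
        z = [x[i] if i in present else baseline[i] for i in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for k in range(n):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_model(z: List[float]) -> float:
    """Hypothetical linear score over three features."""
    return 2.0 * z[0] - 3.0 * z[1] + 0.5 * z[2] + 1.0


phi = shapley_values(toy_model, x=[1.0, 2.0, 4.0], baseline=[0.0, 0.0, 0.0])
```

For a linear model each attribution reduces to the weight times the feature's deviation from the baseline, and the attributions sum to the difference between the prediction and the baseline prediction. The sign of each value corresponds to the direction of impact that the red and blue bars encode.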

References

1. Dash P, Fiore-Gartland AJ, Hertz T, Wang GC, Sharma S, Souquette A, et al. Quantifiable predictive features define epitope-specific T cell receptor repertoires. Nature. (2017) 547:89–93. doi: 10.1038/nature22383
2. Bradley P, Thomas PG. Using T cell receptor repertoires to understand the principles of adaptive immune recognition. Annu Rev Immunol. (2019) 37:547–70. doi: 10.1146/annurev-immunol-042718-041757
3. Rudolph MG, Stanfield RL, Wilson IA. How TCRs bind MHCs, peptides, and coreceptors. Annu Rev Immunol. (2006) 24:419–66. doi: 10.1146/annurev.immunol.23.021704.115658
4. Tippalagama R, Chihab LY, Kearns K, Lewis S, Panda S, Willemsen L, et al. Antigen-specificity measurements are the key to understanding T cell responses. Front Immunol. (2023) 14:1127470. doi: 10.3389/fimmu.2023.1127470
5. Bradley P. Structure-based prediction of T cell receptor: peptide-MHC interactions. Elife. (2023) 12:e82813. doi: 10.7554/eLife.82813
