BMC Bioinformatics. 2023 Jul 28;24(1):301.

doi: 10.1186/s12859-023-05421-x.

StackTTCA: a stacking ensemble learning-based framework for accurate and high-throughput identification of tumor T cell antigens

Phasit Charoenkwan¹, Nalini Schaduangrat², Watshara Shoombuatong³

Affiliations

¹ Modern Management and Information Technology, College of Arts, Media and Technology, Chiang Mai University, Chiang Mai, 50200, Thailand.
² Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand.
³ Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand. watshara.sho@mahidol.ac.th.

PMID: 37507654
PMCID: PMC10386778
DOI: 10.1186/s12859-023-05421-x

Phasit Charoenkwan et al. BMC Bioinformatics. 2023.

Abstract

Background: The identification of tumor T cell antigens (TTCAs) is crucial for providing insights into their functional mechanisms and for exploiting their potential in anticancer vaccine development, an area in which TTCAs are highly promising. However, experimental technologies for discovering and characterizing new TTCAs are expensive and time-consuming. Although many machine learning (ML)-based models have been proposed for identifying new TTCAs, there is still a need for a robust model that can achieve higher accuracy and precision.

Results: In this study, we propose a new stacking ensemble learning-based framework, termed StackTTCA, for accurate and large-scale identification of TTCAs. First, we constructed 156 different baseline models by combining 12 different feature encoding schemes with 13 popular ML algorithms. Second, these baseline models were trained and employed to create a new probabilistic feature vector. Finally, the optimal probabilistic feature vector was determined based on a feature selection strategy and then used to construct our stacked model. Comparative benchmarking experiments indicated that StackTTCA clearly outperformed several ML classifiers and the existing methods on the independent test, with an accuracy of 0.932 and a Matthews correlation coefficient (MCC) of 0.866.
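The steps described in the Results paragraph (baseline models, a probabilistic feature vector built from their outputs, feature selection, and a stacked meta-classifier) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not StackTTCA's implementation: the two toy encodings (`aac`, `hydro`), the two learners (`LogReg`, `NearestCentroid`), the accuracy-based selection rule in `select_features`, the 5-fold out-of-fold scheme, and the logistic-regression meta-classifier are all hypothetical stand-ins for the 12 encodings, 13 algorithms, and selection strategy the abstract refers to.

```python
# Hypothetical stacking sketch: toy encodings, learners and selection rule,
# not StackTTCA's actual components.
import math
from itertools import product

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"
HYDROPHOBIC = set("AILMFVW")  # illustrative residue group (assumption)

def aac(seq):
    """Amino acid composition: fraction of each of the 20 standard residues."""
    return [seq.count(a) / len(seq) for a in ALPHABET]

def hydro(seq):
    """Toy physicochemical encoding: fraction of hydrophobic residues."""
    return [sum(c in HYDROPHOBIC for c in seq) / len(seq)]

def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to keep math.exp in range
    return 1.0 / (1.0 + math.exp(-z))

class LogReg:
    """Logistic regression fitted by full-batch gradient descent."""
    def __init__(self, lr=0.5, epochs=300):
        self.lr, self.epochs = lr, epochs

    def fit(self, X, y):
        n, d = len(X), len(X[0])
        self.w, self.b = [0.0] * d, 0.0
        for _ in range(self.epochs):
            gw, gb = [0.0] * d, 0.0
            for x, t in zip(X, y):
                err = self.predict_proba(x) - t  # d(log-loss)/d(logit)
                gw = [g + err * xi for g, xi in zip(gw, x)]
                gb += err
            self.w = [w - self.lr * g / n for w, g in zip(self.w, gw)]
            self.b -= self.lr * gb / n
        return self

    def predict_proba(self, x):
        return sigmoid(sum(w * xi for w, xi in zip(self.w, x)) + self.b)

class NearestCentroid:
    """Maps distances to the two class centroids onto a score in [0, 1]."""
    def fit(self, X, y):
        def centroid(label):
            rows = [x for x, t in zip(X, y) if t == label]
            return [sum(col) / len(rows) for col in zip(*rows)]
        self.pos, self.neg = centroid(1), centroid(0)
        return self

    def predict_proba(self, x):
        dp, dn = math.dist(x, self.pos), math.dist(x, self.neg)
        return dn / (dp + dn) if dp + dn > 0 else 0.5

# One baseline model per (encoding, algorithm) pair: 2 x 2 = 4 here,
# versus 12 x 13 = 156 in StackTTCA.
ENCODERS = [aac, hydro]
ALGORITHMS = [LogReg, NearestCentroid]
PAIRS = list(product(ENCODERS, ALGORITHMS))

def probabilistic_features(seqs, y, k=5):
    """Out-of-fold P(TTCA) from every baseline model (one column per pair)."""
    n = len(seqs)
    feats = [[0.0] * len(PAIRS) for _ in range(n)]
    for j, (enc, algo) in enumerate(PAIRS):
        X = [enc(s) for s in seqs]
        for fold in range(k):
            tr = [i for i in range(n) if i % k != fold]
            model = algo().fit([X[i] for i in tr], [y[i] for i in tr])
            for i in range(fold, n, k):  # held-out samples of this fold
                feats[i][j] = model.predict_proba(X[i])
    return feats

def select_features(feats, y, m):
    """Hypothetical rule: keep the m columns with the best accuracy at a 0.5 cut-off."""
    def acc(j):
        return sum((row[j] >= 0.5) == t for row, t in zip(feats, y)) / len(y)
    return sorted(sorted(range(len(PAIRS)), key=acc, reverse=True)[:m])

def fit_stack(seqs, y, m=2, k=5):
    oof = probabilistic_features(seqs, y, k)
    selected = select_features(oof, y, m)
    meta = LogReg().fit([[row[j] for j in selected] for row in oof], y)
    # Refit every baseline model on all training data for prediction time.
    base = [algo().fit([enc(s) for s in seqs], y) for enc, algo in PAIRS]
    return base, selected, meta

def predict_stack(model, seq):
    """Probability that seq is a TTCA according to the stacked model."""
    base, selected, meta = model
    probs = [b.predict_proba(enc(seq)) for b, (enc, _) in zip(base, PAIRS)]
    return meta.predict_proba([probs[j] for j in selected])
```

The meta-classifier is trained on out-of-fold probabilities so that it never sees a probability produced by a baseline model fitted on the same sample; this is a common way to avoid leakage in stacking and is assumed here rather than taken from the paper.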

Conclusions: In summary, the proposed stacking ensemble learning-based framework, StackTTCA, could help to precisely and rapidly identify true TTCAs for follow-up experimental verification. In addition, we developed an online web server ( http://2pmlab.camt.cmu.ac.th/StackTTCA ) to maximize user convenience for high-throughput screening of novel TTCAs.

Keywords: Bioinformatics; Feature selection; Machine learning; Stacking strategy; T-cell antigen.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

**Fig. 1**
The overall workflow of our proposed approach StackTTCA, which includes five major steps: (i) datasets collection, (ii) baseline model construction, (iii) meta-classifier development, (iv) performance evaluation, and (v) web server deployment

**Fig. 2**
MCC values of 156 baseline models in terms of tenfold cross-validation (A) and independent (B) tests

**Fig. 3**
Confusion matrices of StackTTCA and top five ML classifiers in terms of the independent test dataset. ADA-CTD (A), RF-CTD (B), ET-CTD (C), LGBM-CTD (D), XGB-CTD (E), StackTTCA (F)

**Fig. 4**
t-distributed stochastic neighbor embedding (t-SNE) distribution of positive and negative samples on the training dataset, where TTCAs and non-TTCAs are represented with red and blue dots, respectively. ADA-CTD (A), RF-CTD (B), ET-CTD (C), LGBM-CTD (D), XGB-CTD (E), StackTTCA (F)

**Fig. 5**
Heat-map of the prediction performance of StackTTCA and the state-of-the-art methods in terms of the independent test dataset

**Fig. 6**
Feature importance from StackTTCA, where positive and negative SHAP values indicate a higher probability that the prediction output is TTCA and non-TTCA, respectively
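Figs. 2 and 3 report MCC values and confusion matrices, and the abstract summarizes the independent test with accuracy and MCC. A minimal sketch of the two standard formulas, computed from confusion-matrix counts (TP, TN, FP, FN), follows. The function names and the example counts are hypothetical and not taken from the paper; returning 0.0 when the MCC denominator vanishes is a common convention assumed here.

```python
import math

def accuracy(tp, tn, fp, fn):
    """Fraction of correctly classified samples."""
    return (tp + tn) / (tp + tn + fp + fn)

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; 0.0 if any row/column sum is zero (assumed convention)."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Illustrative counts only, not results from the paper.
tp, tn, fp, fn = 45, 45, 5, 5
print(accuracy(tp, tn, fp, fn), mcc(tp, tn, fp, fn))  # prints 0.9 0.8
```

MCC ranges from -1 to 1 and uses all four cells of the confusion matrix, which is why it is often reported alongside accuracy for binary classifiers such as TTCA predictors.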

Grants and funding

N42A660380/National Research Council of Thailand and Mahidol University
