PLoS Comput Biol. 2018 Nov 8;14(11):e1006457.

doi: 10.1371/journal.pcbi.1006457. eCollection 2018 Nov.

Systematically benchmarking peptide-MHC binding predictors: From synthetic to naturally processed epitopes

Weilong Zhao¹, Xinwei Sher¹

PMID: 30408041
PMCID: PMC6224037
DOI: 10.1371/journal.pcbi.1006457

Weilong Zhao et al. PLoS Comput Biol. 2018.

Affiliation

¹ Global Research IT, Merck & Co., Inc., Boston, MA, United States of America.

Abstract

A number of machine learning-based predictors have been developed for identifying immunogenic T-cell epitopes based on major histocompatibility complex (MHC) class I and II binding affinities. Rationally selecting the most appropriate tool has been complicated by the evolving training data and machine learning methods. Despite recent advances in generating high-quality, MHC-eluted, naturally processed ligandome data, the reliability of new predictors on these epitopes has yet to be evaluated. This study reports the latest benchmarking of an extensive set of MHC-binding predictors using newly available, untested data on both synthetic and naturally processed epitopes. The blind test set includes 32 human leukocyte antigen (HLA) class I and 24 HLA class II alleles. Artificial neural network (ANN)-based approaches demonstrated better performance than regression-based machine learning and structural modeling. Among the 18 predictors benchmarked, ANN-based mhcflurry and nn_align perform the best for MHC class I 9-mer and class II 15-mer predictions, respectively, on binding/non-binding classification (area under the curve, AUC = 0.911). NetMHCpan4 also demonstrated comparable predictive power. Our customization of mhcflurry to a pan-HLA predictor has achieved accuracy similar to that of NetMHCpan. The overall accuracy of these methods is comparable between 9-mer and 10-mer testing data. However, the top methods deliver low correlations between predicted and experimental affinities for strong MHC binders. When used on naturally processed MHC ligands, tools that have been trained on elution data (NetMHCpan4 and MixMHCpred) show better accuracy than pure binding-affinity predictors. False prediction rates vary considerably among HLA types and datasets. Finally, the structure-based predictor Rosetta FlexPepDock performs less well than the machine learning approaches. By benchmarking MHC-binding and MHC-elution predictors with a comprehensive set of metrics, we present an unbiased view for establishing best practice in T-cell epitope prediction, facilitating future development of methods in immunogenomics.

Conflict of interest statement

All authors are employed by Merck & Co., Inc.

Figures

**Fig 1. Binary classification (binder vs. non-binder) performance.**
(a) AUC of MHC-I binding epitope prediction tools. (b) ROC curves. IC50 = 500 nM was used as the cutoff for classifying experimentally measured epitopes. AUC is shown as a box plot whose upper and lower boundaries cover the 95% confidence level. (c) ROC curves enlarged for TPR between 0.7 and 1.0.
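As a rough illustration of the binary evaluation described in this caption, the following Python sketch labels peptides as binders at the 500 nM IC50 cutoff and computes AUC as the probability that a randomly chosen binder receives a higher score than a randomly chosen non-binder. The function names, the `<=` boundary convention, and the toy values are assumptions made for illustration; they are not the authors' code or data.

```python
def label_binders(measured_ic50_nm, cutoff_nm=500.0):
    """Return 1 for binders (IC50 at or below the cutoff), else 0.

    The inclusive boundary is an assumption; the caption only states the cutoff.
    """
    return [1 if value <= cutoff_nm else 0 for value in measured_ic50_nm]


def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count 0.5).

    Higher scores are taken to mean stronger predicted binding.
    """
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one binder and one non-binder")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy data (hypothetical): measured and predicted IC50 values in nM.
measured = [20.0, 300.0, 800.0, 5000.0, 45.0]
predicted = [35.0, 900.0, 600.0, 12000.0, 60.0]

labels = label_binders(measured)
# Lower predicted IC50 means stronger binding, so negate to get a score.
scores = [-value for value in predicted]
auc = roc_auc(labels, scores)
```

The pairwise form is quadratic in the number of peptides, which is fine for a sketch; for large test sets a rank-based computation gives the same value more cheaply.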

**Fig 2. Evaluation of mhcflurry_pan predictor.**
Comparison of predictive power, indicated by (a) AUC, (b) specificity of binders, and (c) specificity of strong binders, between 9mer-based mhcflurry, 9mer-based NetMHCpan4, and the 43mer-based pan-predictor with the testing HLA included (mhcflurry_pan) or left out (mhcflurry_pan_LOO). Each point represents one HLA type.

**Fig 3. Multiclass classification performance of MHC class I binding epitope prediction tools.**
(a) VUS; (b) SPE; (c) SRCC; (d) R-squared of linear regression. IC50 thresholds of 50 nM and 500 nM were used to classify experimental measurements as strong binder, weak binder, or non-binder. The box plots of VUS and SRCC show values covering the 95% confidence level. Note that IC50 is not calculated in MixMHCpred.
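The three-class labeling and the rank correlation named in this caption can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the inclusive class boundaries, the function names, and the toy values are hypothetical, and SRCC is computed here as the Pearson correlation of average ranks, which is the standard definition rather than anything confirmed as the authors' implementation.

```python
import math


def affinity_class(ic50_nm, strong_nm=50.0, weak_nm=500.0):
    """Map a measured IC50 (nM) to one of three classes (boundaries assumed inclusive)."""
    if ic50_nm <= strong_nm:
        return "strong"
    if ic50_nm <= weak_nm:
        return "weak"
    return "non-binder"


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


# Toy data (hypothetical): measured and predicted IC50 values in nM.
measured = [10.0, 200.0, 1000.0, 40.0, 6000.0]
predicted = [15.0, 150.0, 4000.0, 90.0, 2500.0]

classes = [affinity_class(v) for v in measured]
srcc = spearman(measured, predicted)
```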

**Fig 4. Comparison of prediction accuracy between 9-mer and 10-mer testing data.**
(a) ROC curves of 10-mer predictions with the AUC value shown after each method. (b) Boxplots of AUC and SRCC calculated for 9-mer and 10-mer predictions, with each point representing an HLA class I allele. Significance levels were obtained by Wilcoxon test (*: p < 0.05; ns: p > 0.05).

**Fig 5. Binary classification performance of MHC-II binding epitope prediction tools.**
(a) AUC. (b) ROC curves. IC50 = 1000 nM was used as the cutoff for classifying experimentally measured epitopes. AUC is shown as a box plot whose upper and lower boundaries cover the 95% confidence level. (c) ROC curves enlarged for TPR between 0.7 and 1.0.

**Fig 6. Multiclass classification performance of MHC-II binding epitope prediction tools.**
(a) VUS; (b) SPE; (c) SRCC; (d) R-squared of linear regression. The box plots of VUS and SRCC show values covering the 95% confidence level.

**Fig 7. Reliability of predicting absolute affinities of strong binding MHC Class I and II epitopes.**
(a) NetMHC4, (b) mhcflurry-class I, (c) nn_align, and (d) mhcflurry-class II. Measurement and prediction values were represented as *1-log10(IC50)/log10(50000 nM)* and were light-colored based on 2-D data density. Grey dotted lines mark the 50 nM threshold (y = 0.638) and grey dashed lines mark the 500 nM threshold (y = 0.426). Red lines show the linear regression of the data. *FNr(50 nM)* indicates the false negative rate of classifying strong binders.
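The affinity transformation quoted in this caption is easy to check numerically. The Python sketch below applies *1-log10(IC50)/log10(50000 nM)* and reproduces the two threshold positions given in the caption. The function names are hypothetical, and the false negative rate shown is one plausible reading of *FNr(50 nM)* (the fraction of measured strong binders predicted above 50 nM), stated here as an assumption.

```python
import math


def transform_ic50(ic50_nm, max_ic50_nm=50000.0):
    """Rescale IC50 (nM) so that 1 nM maps to 1.0 and 50000 nM maps to 0.0."""
    return 1.0 - math.log10(ic50_nm) / math.log10(max_ic50_nm)


def strong_binder_fnr(measured_ic50_nm, predicted_ic50_nm, threshold_nm=50.0):
    """Fraction of measured strong binders that are predicted weaker than the threshold.

    Assumed definition: FN / (FN + TP) among peptides with measured IC50 <= threshold.
    """
    pairs = [(m, p) for m, p in zip(measured_ic50_nm, predicted_ic50_nm)
             if m <= threshold_nm]
    if not pairs:
        raise ValueError("no measured strong binders")
    missed = sum(1 for _, p in pairs if p > threshold_nm)
    return missed / len(pairs)


y_50 = round(transform_ic50(50.0), 3)    # position of the 50 nM line
y_500 = round(transform_ic50(500.0), 3)  # position of the 500 nM line

# Toy data (hypothetical): four measured strong binders, two missed by the predictor.
fnr = strong_binder_fnr([10.0, 30.0, 45.0, 20.0], [25.0, 80.0, 40.0, 300.0])
```

With this scaling, stronger binders sit higher on both axes, so the 50 nM line lies above the 500 nM line.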

**Fig 8. Assessing the reliability of binding prediction methods for the identification of naturally processed MHC-epitopes.**
(a) Box plots showing the quartile distribution of binding affinity rankings as predicted by NetMHC4 and NetMHCpan4 for MHC-eluted and non-eluted peptides. The grey dashed line indicates the top 2% predicted binder ranking. (b) FDr and FNr values, calculated with a top 2% percentile rank cutoff on three MS-derived datasets, visualized as a heatmap. Similar plots of predicted scores are shown in S4 Fig.
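To make the rank-cutoff evaluation in this caption concrete, the Python sketch below calls a peptide a predicted binder when its percentile rank is within the top 2% and then computes the false discovery rate and false negative rate against elution status. The definitions FDr = FP / (FP + TP) and FNr = FN / (FN + TP) are the conventional ones and are assumed here; the function name and the toy values are hypothetical.

```python
def fdr_fnr_at_rank_cutoff(percentile_ranks, eluted, cutoff=2.0):
    """Return (FDr, FNr) when peptides with rank <= cutoff are called binders.

    percentile_ranks: predicted percentile rank per peptide (lower is stronger).
    eluted: True if the peptide was observed as an MHC-eluted ligand.
    """
    tp = fp = fn = 0
    for rank, is_eluted in zip(percentile_ranks, eluted):
        predicted_binder = rank <= cutoff
        if predicted_binder and is_eluted:
            tp += 1
        elif predicted_binder and not is_eluted:
            fp += 1
        elif not predicted_binder and is_eluted:
            fn += 1
    if tp + fp == 0 or tp + fn == 0:
        raise ValueError("rates undefined: no predicted binders or no eluted peptides")
    return fp / (fp + tp), fn / (fn + tp)


# Toy data (hypothetical): percentile ranks and elution status for six peptides.
ranks = [0.1, 0.5, 3.0, 1.5, 10.0, 0.8]
eluted = [True, True, True, False, False, True]

fdr, fnr = fdr_fnr_at_rank_cutoff(ranks, eluted)
```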

**Fig 9. Prediction of MHC class I epitopes by FlexPepDock.**
(a) ROC curves and AUC values (shown after the allele legend) generated from the reweighted binding energy scores reported by FlexPepDock. Peptides were labeled as the positive or negative class using the IC50 = 500 nM cutoff. True positives and false positives were then calculated by correlating these labels with the FlexPepDock reweighted score. (b) and (c) Lowest-energy conformations of the 9-mers FLGGTPVCL and FLSHDFTLV bound to the HLA-A0201 protein. MHC proteins are shown as orange ribbon and white surface; peptide backbones are shown in cyan; MHC-binding residues are shown in silver; potential T cell receptor contacting residues are shown in pink.

Cited by

Ranking-Based Convolutional Neural Network Models for Peptide-MHC Class I Binding Prediction.
Chen Z, Min MR, Ning X. Front Mol Biosci. 2021 May 17;8:634836. doi: 10.3389/fmolb.2021.634836. eCollection 2021. PMID: 34079815 Free PMC article.
Immunogenic SARS-CoV-2 Epitopes: In Silico Study Towards Better Understanding of COVID-19 Disease-Paving the Way for Vaccine Development.
Ranga V, Niemelä E, Tamirat MZ, Eriksson JE, Airenne TT, Johnson MS. Vaccines (Basel). 2020 Jul 23;8(3):408. doi: 10.3390/vaccines8030408. PMID: 32717854 Free PMC article.
Establishment of a novel tumor neoantigen prediction tool for personalized vaccine design.
Xin K, Wei X, Shao J, Chen F, Liu Q, Liu B. Hum Vaccin Immunother. 2024 Dec 31;20(1):2300881. doi: 10.1080/21645515.2023.2300881. Epub 2024 Jan 12. PMID: 38214336 Free PMC article.
A Universal Antigen-Ranking Method to Design Personalized Vaccines Targeting Neoantigens against Melanoma.
Malaina I, Martínez L, Montoya JM, Alonso S, Boyano MD, Asumendi A, Izu R, Sanchez-Diez A, Cancho-Galan G, M de la Fuente I. Life (Basel). 2023 Jan 5;13(1):155. doi: 10.3390/life13010155. PMID: 36676104 Free PMC article.
Relationship between HLA-DPA1 genetic polymorphism and anembryonic pregnancy.
Wang Z, Lu X, Yao X, Liu X, Zhao L, Chang S, Zhang T, Niu B, Wang L. Mol Genet Genomic Med. 2020 Jan;8(1):e1046. doi: 10.1002/mgg3.1046. Epub 2019 Nov 30. PMID: 31785132 Free PMC article.
