A community resource benchmarking predictions of peptide binding to MHC-I molecules

Bjoern Peters¹, Huynh-Hoa Bui, Sune Frankild, Morten Nielson, Claus Lundegaard, Emrah Kostem, Derek Basch, Kasper Lamberth, Mikkel Harndahl, Ward Fleri, Stephen S Wilson, John Sidney, Ole Lund, Soren Buus, Alessandro Sette

PMID: 16789818
PMCID: PMC1475712
DOI: 10.1371/journal.pcbi.0020065

Bjoern Peters et al. PLoS Comput Biol. 2006.

PLoS Comput Biol. 2006 Jun 9;2(6):e65.

doi: 10.1371/journal.pcbi.0020065. Epub 2006 Jun 9.

Affiliation

¹ La Jolla Institute for Allergy and Immunology, San Diego, California, USA. bpeters@liai.org

Abstract

Recognition of peptides bound to major histocompatibility complex (MHC) class I molecules by T lymphocytes is an essential part of immune surveillance. Each MHC allele has a characteristic peptide binding preference, which can be captured in prediction algorithms, allowing for the rapid scan of entire pathogen proteomes for peptides likely to bind MHC. Here we make public a large set of 48,828 quantitative peptide-binding affinity measurements relating to 48 different mouse, human, macaque, and chimpanzee MHC class I alleles. We use these data to establish a set of benchmark predictions with one neural network method and two matrix-based prediction methods extensively utilized in our groups. In general, the neural network outperforms the matrix-based predictions, mainly due to its ability to generalize even on a small amount of data. We also retrieved predictions from tools publicly available on the internet. While differences in the data used to generate these predictions hamper direct comparisons, we do conclude that tools based on combinatorial peptide libraries perform remarkably well. The transparent prediction evaluation on this dataset provides tool developers with a benchmark for comparison of newly developed prediction methods. In addition, to generate and evaluate our own prediction methods, we have established an easily extensible web-based prediction framework that allows automated side-by-side comparisons of prediction methods implemented by experts. This is an advance over the current practice of tool developers having to generate reference predictions themselves, which can lead to underestimating the performance of prediction methods they are not as familiar with as their own. The overall goal of this effort is to provide a transparent prediction evaluation allowing bioinformaticians to identify promising features of prediction methods and providing guidance to immunologists regarding the reliability of prediction tools.

Conflict of interest statement

Competing interests. The authors have declared that no competing interests exist.

Figures

**Figure 1. Comparability of the Binding Affinities between Assays**
(A) Scatter plot comparing measured affinities for peptides to MHC recorded in the Buus (y-axis) and Sette (x-axis) assay systems. (B) The agreement between experimental classifications of peptides as binders/nonbinders at different affinity thresholds (x-axis) is measured by the Matthews correlation coefficient (y-axis). The dashed line indicates the IC₅₀ = 500 nM cutoff commonly used for classifying peptides into binders and nonbinders, which is used in the ROC analysis.
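
As an illustration of the agreement measure in panel (B), the following Python sketch computes the Matthews correlation coefficient between binder/nonbinder calls derived from two assays at one affinity threshold. The function and argument names are hypothetical, and treating an IC₅₀ strictly below the threshold as a binder is an assumption of this example; it is not the authors' code.

```python
import math


def mcc_at_threshold(ic50_assay_a, ic50_assay_b, threshold_nm):
    """Matthews correlation between binder calls of two assays.

    Both inputs are IC50 values in nM for the same peptides, in the same
    order. A peptide is called a binder when its IC50 is below the
    threshold (assumed convention for this sketch).
    """
    tp = tn = fp = fn = 0
    for a, b in zip(ic50_assay_a, ic50_assay_b):
        binder_a = a < threshold_nm
        binder_b = b < threshold_nm
        if binder_a and binder_b:
            tp += 1
        elif not binder_a and not binder_b:
            tn += 1
        elif binder_a:
            fn += 1  # assay A says binder, assay B says nonbinder
        else:
            fp += 1  # assay B says binder, assay A says nonbinder
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when a row or column of the table is empty.
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

Evaluating this function over a range of `threshold_nm` values would give a curve of the kind described for panel (B). Which assay is treated as the reference does not matter, because the coefficient is symmetric in the two classifications.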

**Figure 2. ARB, SMM, and ANN Predictions for HLA-A*0201**
The first three panels depict scatter plots of the predicted binding scores (x-axis) against the measured (y-axis) binding affinities of 3,089 9-mer peptides to HLA-A*0201. The predictions were obtained in five-fold cross-validation using the ARB/SMM/ANN prediction methods, respectively. In each plot, a linear regression on a logarithmic scale was performed, and the corresponding regression equation and r² values are given. The bottom right panel contains an ROC analysis of the same data, evaluating how well the three methods can classify peptides into binders (IC₅₀ < 500 nM) and nonbinders. The AUC, which evaluates prediction quality, is given for each method.
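
The ROC analysis described here can be sketched in Python as follows. The AUC equals the probability that a randomly chosen binder is ranked ahead of a randomly chosen nonbinder, and the code computes it directly by counting pairs. The function name, its parameters, and the `higher_is_binder` flag (for tools whose scores rise with binding strength instead of predicting IC₅₀) are assumptions of this example, not part of the published methods.

```python
def roc_auc(measured_ic50, predicted, cutoff_nm=500.0, higher_is_binder=False):
    """AUC for separating binders (measured IC50 < cutoff) from nonbinders.

    `predicted` holds one score per peptide. By default a lower score
    means stronger predicted binding (as for a predicted IC50); set
    `higher_is_binder=True` for scales where high scores mean binders.
    """
    binders = [p for m, p in zip(measured_ic50, predicted) if m < cutoff_nm]
    nonbinders = [p for m, p in zip(measured_ic50, predicted) if m >= cutoff_nm]
    if not binders or not nonbinders:
        raise ValueError("need at least one binder and one nonbinder")

    sign = 1.0 if higher_is_binder else -1.0
    wins = 0.0
    for b in binders:
        for n in nonbinders:
            if sign * b > sign * n:
                wins += 1.0   # binder ranked ahead of nonbinder
            elif b == n:
                wins += 0.5   # ties count as half
    return wins / (len(binders) * len(nonbinders))
```

The pairwise loop is quadratic in the number of peptides, which remains manageable for a few thousand measurements; a rank-based implementation would be preferable for much larger sets.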

**Figure 3. Prediction Performance as a Function of Training Set Size**
For all datasets for which predictions with all three methods could be made, the AUC values obtained with the three prediction methods are included in the graph (y-axis). The x-axis gives the number of peptide affinities in each training set.

**Figure 4. Syfpeithi and Bimas Predictions for HLA-A*0201**
The top two panels contain scatter plots of the predicted binding scores (x-axis) against the measured binding affinities (y-axis) for all 3,089 9-mer peptides binding to HLA-A*0201 in our database. Neither Bimas nor Syfpeithi predicts IC₅₀ values; both have output scales in which high scores indicate good binding candidates. Therefore, the regression curves are inverted. The bottom panel contains an ROC analysis of the same data with the classification cutoff of 500 nM.

**Figure 5. Scheme to Integrate Prediction Methods**
Shown is a prediction framework providing a common interface to different prediction methods to generate new tools and retrieve predictions from them. A prediction method has to accept a set of peptides with measured affinities with which it can train a new prediction tool. It returns the URI of the new tool to the evaluation server. Using the URI, the evaluation server can check the state of the new tool to see if training is still ongoing or if an error occurred during training. Once training is completed, the tool has to accept a set of peptide sequences and return predicted affinities for them. The format for the data exchanged in each of these steps is defined in an XML schema definition (.xsd file), available at http://mhcbindingpredictions.immuneepitope.org.
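
The train, check-state, predict cycle described in this caption can be sketched in Python as below. This is an in-process toy under stated assumptions: the class names, the state names, the `tool/N` URI format, and the trivial "predict the geometric mean of the training affinities" model are all hypothetical. The real framework exchanges XML messages defined by the published .xsd file, which this sketch does not model.

```python
import math
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Tuple


class ToolState(Enum):          # hypothetical state names
    TRAINING = "training"
    READY = "ready"
    ERROR = "error"


@dataclass
class Tool:
    state: ToolState = ToolState.TRAINING
    mean_log_ic50: float = 0.0


class ToyPredictionMethod:
    """Hypothetical prediction method exposing the three steps."""

    def __init__(self) -> None:
        self._tools: Dict[str, Tool] = {}

    def train(self, data: List[Tuple[str, float]]) -> str:
        """Accept (peptide, IC50 in nM) pairs and return the new tool's URI."""
        uri = f"tool/{len(self._tools) + 1}"   # made-up URI format
        tool = Tool()
        self._tools[uri] = tool
        try:
            logs = [math.log10(ic50) for _, ic50 in data]
            tool.mean_log_ic50 = sum(logs) / len(logs)
            tool.state = ToolState.READY
        except (ValueError, ZeroDivisionError):
            # Empty training set or non-positive affinity.
            tool.state = ToolState.ERROR
        return uri

    def state(self, uri: str) -> ToolState:
        """Let the evaluation server poll whether training has finished."""
        return self._tools[uri].state

    def predict(self, uri: str, peptides: List[str]) -> List[float]:
        """Return one predicted IC50 (nM) per peptide sequence."""
        tool = self._tools[uri]
        if tool.state is not ToolState.READY:
            raise RuntimeError(f"tool {uri} is not ready: {tool.state.value}")
        return [10 ** tool.mean_log_ic50 for _ in peptides]
```

An evaluation server in this sketch would call `train`, poll `state` with the returned URI until it is no longer `TRAINING`, and then call `predict` on held-out peptides. Keeping the three steps separate is what would let slow-training methods run asynchronously.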
