PLoS Comput Biol. 2006 Jun 9;2(6):e65.
doi: 10.1371/journal.pcbi.0020065. Epub 2006 Jun 9.

A community resource benchmarking predictions of peptide binding to MHC-I molecules

Bjoern Peters et al. PLoS Comput Biol.

Abstract

Recognition of peptides bound to major histocompatibility complex (MHC) class I molecules by T lymphocytes is an essential part of immune surveillance. Each MHC allele has a characteristic peptide binding preference, which can be captured in prediction algorithms, allowing for the rapid scan of entire pathogen proteomes for peptides likely to bind MHC. Here we make public a large set of 48,828 quantitative peptide-binding affinity measurements relating to 48 different mouse, human, macaque, and chimpanzee MHC class I alleles. We use this data to establish a set of benchmark predictions with one neural network method and two matrix-based prediction methods extensively utilized in our groups. In general, the neural network outperforms the matrix-based predictions, mainly due to its ability to generalize even on a small amount of data. We also retrieved predictions from tools publicly available on the internet. While differences in the data used to generate these predictions hamper direct comparisons, we do conclude that tools based on combinatorial peptide libraries perform remarkably well. The transparent prediction evaluation on this dataset provides tool developers with a benchmark for comparison of newly developed prediction methods. In addition, to generate and evaluate our own prediction methods, we have established an easily extensible web-based prediction framework that allows automated side-by-side comparisons of prediction methods implemented by experts. This is an advance over the current practice of tool developers having to generate reference predictions themselves, which can lead to underestimating the performance of prediction methods they are not as familiar with as their own. The overall goal of this effort is to provide a transparent prediction evaluation allowing bioinformaticians to identify promising features of prediction methods and providing guidance to immunologists regarding the reliability of prediction tools.

Conflict of interest statement

Competing interests. The authors have declared that no competing interests exist.

Figures

Figure 1. Comparability of the Binding Affinities between Assays
(A) Scatter plot comparing measured affinities for peptides to MHC recorded in the Buus (y-axis) and Sette (x-axis) assay systems. (B) The agreement between experimental classifications of peptides as binders/nonbinders at different affinity thresholds (x-axis) is measured by the Matthews correlation coefficient (y-axis). The dashed line indicates the IC50 = 500 nM cutoff commonly used for classifying peptides into binders and nonbinders, which is used in the ROC analysis.
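
The agreement measure in panel (B) can be illustrated with a minimal sketch. It assumes two parallel lists of IC50 values (in nM) for the same peptides, one per assay, and treats a peptide as a binder when its IC50 is below the threshold. The function name and the example values are hypothetical and are not taken from the paper.

```python
import math

def mcc_at_threshold(ic50_a, ic50_b, threshold_nm=500.0):
    """Matthews correlation between the binder calls of two assays at one IC50 cutoff."""
    tp = tn = fp = fn = 0
    for a, b in zip(ic50_a, ic50_b):
        call_a = a < threshold_nm  # assay A says "binder"
        call_b = b < threshold_nm  # assay B says "binder"
        if call_a and call_b:
            tp += 1
        elif not call_a and not call_b:
            tn += 1
        elif call_a:
            fn += 1  # binder in A only
        else:
            fp += 1  # binder in B only
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when one class is empty and the MCC is undefined.
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Illustrative values only: sweep a few thresholds, as on the x-axis of panel (B).
assay_a = [10.0, 100.0, 1000.0, 5000.0]
assay_b = [20.0, 600.0, 800.0, 9000.0]
for cutoff in (50.0, 500.0, 5000.0):
    print(cutoff, round(mcc_at_threshold(assay_a, assay_b, cutoff), 3))
```

Sweeping the cutoff in this way shows how sensitive the binder/nonbinder agreement is to the choice of threshold.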
Figure 2. ARB, SMM, and ANN Predictions for HLA-A*0201
The first three panels depict scatter plots of the predicted binding scores (x-axis) against the measured (y-axis) binding affinities of 3,089 9-mer peptides to HLA-A*0201. The predictions were obtained in five-fold cross-validation using the ARB/SMM/ANN prediction methods, respectively. In each plot, a linear regression on a logarithmic scale was performed, and the corresponding regression equation and r² values are given. The bottom right panel contains an ROC analysis of the same data, evaluating how well the three methods can classify peptides into binders (IC50 < 500 nM) and nonbinders. The AUC, which evaluates prediction quality, is given for each method.
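
The two quality measures named in this caption can be sketched as follows. The sketch assumes that predictions are on an IC50-like scale where lower values mean stronger binding, computes the AUC as the fraction of binder/nonbinder pairs that are ranked correctly (equivalent to the area under the ROC curve), and computes r² as the squared Pearson correlation of the log10-transformed values. Function names and example numbers are hypothetical; this is not the authors' evaluation code.

```python
import math

def auc_binders(measured_ic50, predicted_ic50, cutoff_nm=500.0):
    """AUC for separating binders (measured IC50 < cutoff) from nonbinders."""
    binders = [p for m, p in zip(measured_ic50, predicted_ic50) if m < cutoff_nm]
    nonbinders = [p for m, p in zip(measured_ic50, predicted_ic50) if m >= cutoff_nm]
    if not binders or not nonbinders:
        raise ValueError("need at least one binder and one nonbinder")
    wins = 0.0
    for b in binders:
        for n in nonbinders:
            if b < n:
                wins += 1.0   # binder correctly predicted as the stronger binder
            elif b == n:
                wins += 0.5   # ties count as half
    return wins / (len(binders) * len(nonbinders))

def log_r_squared(measured_ic50, predicted_ic50):
    """Squared Pearson correlation between log10(predicted) and log10(measured)."""
    x = [math.log10(v) for v in predicted_ic50]
    y = [math.log10(v) for v in measured_ic50]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("r squared is undefined for constant values")
    return sxy ** 2 / (sxx * syy)

# Illustrative values only.
measured = [5.0, 50.0, 400.0, 2000.0, 20000.0]
predicted = [10.0, 30.0, 900.0, 600.0, 15000.0]
print(auc_binders(measured, predicted), log_r_squared(measured, predicted))
```

The pairwise formulation is quadratic in the number of peptides, which is acceptable for a sketch; a rank-based computation would scale better on datasets of several thousand peptides.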
Figure 3. Prediction Performance as a Function of Training Set Size
For all datasets for which predictions with all three methods could be made, the AUC values obtained with the three prediction methods are included in the graph (y-axis). The x-axis gives the number of peptide affinities in each training set.
Figure 4. Syfpeithi and Bimas Predictions for HLA-A*0201
The top two panels contain scatter plots of the predicted binding scores (x-axis) against the measured binding affinities (y-axis) for all 3,089 9-mer peptides binding to HLA-A*0201 in our database. Neither Bimas nor Syfpeithi predicts IC50 values; both have output scales in which high scores indicate good binding candidates. Therefore, the regression curves are inverted. The bottom panel contains an ROC analysis of the same data with the classification cutoff of 500 nM.
Figure 5. Scheme to Integrate Prediction Methods
Shown is a prediction framework providing a common interface to different prediction methods to generate new tools and retrieve predictions from them. A prediction method has to accept a set of peptides with measured affinities with which it can train a new prediction tool. It returns the URI of the new tool to the evaluation server. Using the URI, the evaluation server can check the state of the new tool to see if training is still ongoing or if an error occurred during training. Once training is completed, the tool has to accept a set of peptide sequences and return predicted affinities for them. The format for the data exchanged in each of these steps is defined in an XML schema definition (.xsd file), available at http://mhcbindingpredictions.immuneepitope.org.
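
The train, poll, and predict steps described in this caption can be sketched in-process as follows. Everything here is an assumption made for illustration: the class and method names, the `toy://` URI scheme, and the trivial "model" (the geometric mean of the training IC50 values) are hypothetical, and the sketch does not use the XML exchange format defined by the actual framework.

```python
import math
from enum import Enum

class ToolState(Enum):
    TRAINING = "training"  # shown for completeness; this toy trains synchronously
    READY = "ready"
    ERROR = "error"

class ToyPredictionMethod:
    """Hypothetical prediction method exposing train / state / predict."""

    def __init__(self):
        self._tools = {}  # tool URI -> (state, model)

    def train(self, measured):
        """Accept {peptide: IC50 in nM}, build a tool, and return its URI."""
        uri = f"toy://tool/{len(self._tools) + 1}"
        if not measured:
            self._tools[uri] = (ToolState.ERROR, None)
        else:
            # Toy model: geometric mean of the training affinities.
            log_mean = sum(math.log10(v) for v in measured.values()) / len(measured)
            self._tools[uri] = (ToolState.READY, 10 ** log_mean)
        return uri

    def state(self, uri):
        """Let the evaluation server check whether training finished or failed."""
        return self._tools[uri][0]

    def predict(self, uri, peptides):
        """Return {peptide: predicted IC50} once the tool is ready."""
        state, model = self._tools[uri]
        if state is not ToolState.READY:
            raise RuntimeError(f"tool {uri} is not ready: {state.value}")
        return {peptide: model for peptide in peptides}

# The evaluation server's side of the exchange, with made-up peptides.
method = ToyPredictionMethod()
tool_uri = method.train({"AAAAAAAAA": 10.0, "CCCCCCCCC": 1000.0})
if method.state(tool_uri) is ToolState.READY:
    print(method.predict(tool_uri, ["DDDDDDDDD"]))
```

Returning a URI rather than the tool itself is what allows the evaluation server to poll a slow training job and to query tools hosted by different groups through the same three calls.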
