Genome Med. 2016 Mar 30;8(1):33. doi: 10.1186/s13073-016-0288-x.

NetMHCpan-3.0; improved prediction of binding to MHC class I molecules integrating information from multiple receptor and peptide length datasets

Morten Nielsen¹,², Massimo Andreatta³

Affiliations

¹ Instituto de Investigaciones Biotecnológicas, Universidad Nacional de San Martín, Buenos Aires, Argentina. mniel@cbs.dtu.dk.
² Center for Biological Sequence Analysis, Technical University of Denmark, Kgs. Lyngby, Denmark. mniel@cbs.dtu.dk.
³ Instituto de Investigaciones Biotecnológicas, Universidad Nacional de San Martín, Buenos Aires, Argentina.

PMID: 27029192
PMCID: PMC4812631
DOI: 10.1186/s13073-016-0288-x


Morten Nielsen et al. Genome Med. 2016.


Abstract

Background: Binding of peptides to MHC class I molecules (MHC-I) is essential for antigen presentation to cytotoxic T-cells.

Results: Here, we demonstrate how a simple alignment step allowing insertions and deletions in a pan-specific MHC-I binding machine-learning model enables combining information across both multiple MHC molecules and peptide lengths. This pan-allele/pan-length algorithm significantly outperforms state-of-the-art methods and captures differences in the length profile of binders to different MHC molecules, leading to increased accuracy for ligand identification. Using this model, we demonstrate that percentile ranks, in contrast to affinity-based thresholds, are optimal for ligand identification due to uniform sampling of the MHC space.
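The percentile rank referred to in the Results can be sketched as follows. This is a minimal illustration, assuming that a rank is the percentage of background peptides (for example, random natural peptides scored against the same MHC molecule) whose predicted score is at least as strong as the query's. The function name `percentile_rank` and the toy scores are hypothetical and are not taken from NetMHCpan.

```python
from bisect import bisect_left

def percentile_rank(score, background_scores):
    """Percentage of background scores >= score (higher score = stronger binder)."""
    ordered = sorted(background_scores)
    # Index of the first background score that is >= the query score.
    first_ge = bisect_left(ordered, score)
    n_at_least = len(ordered) - first_ge
    return 100.0 * n_at_least / len(ordered)

# Toy background: hypothetical predicted scores for a single MHC molecule.
background = [0.05, 0.10, 0.12, 0.20, 0.25, 0.30, 0.45, 0.60, 0.75, 0.90]
print(percentile_rank(0.75, background))  # 20.0
```

Under this assumption the rank is computed separately for each MHC molecule, so a fixed rank cutoff selects the same fraction of peptides for every molecule, whereas a fixed affinity cutoff may select very different fractions.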

Conclusions: We have developed a neural network-based machine-learning algorithm leveraging information across multiple receptor specificities and ligand length scales, and demonstrated how this approach significantly improves the accuracy for prediction of peptide binding and identification of MHC ligands. The method is available at www.cbs.dtu.dk/services/NetMHCpan-3.0.


Figures

**Fig. 1**
Predictive performance on different peptide lengths for the allmer and 9mer predictive methods. The two methods were trained as described in the text. The predictive performance was measured in terms of Pearson’s correlation coefficient (PCC) and area under the ROC curve (AUC), the latter using a binding threshold of 500 nM. The allmer method significantly outperforms the 9mer approach on peptides of all lengths from 8 to 10 (binomial test excluding ties). **: p < 0.001, *: p < 0.05
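The two performance measures named in this caption can be sketched in plain Python. This is an illustration only: the measured affinities and predicted scores are made-up values, and the use of a negative log10 transform of the IC50 before computing the correlation is an assumption, not something stated in the caption. Only the 500 nM binding threshold comes from the text.

```python
from math import log10, sqrt

def pearson(xs, ys):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)

def auc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for l, s in zip(labels, scores) if l]
    neg = [s for l, s in zip(labels, scores) if not l]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical data: measured IC50 values (nM) and predicted scores (higher = stronger).
measured_nm = [20.0, 150.0, 400.0, 900.0, 5000.0, 30000.0]
predicted = [0.80, 0.70, 0.40, 0.50, 0.20, 0.10]

# Binders are peptides with measured affinity below the 500 nM threshold.
labels = [ic50 < 500.0 for ic50 in measured_nm]
targets = [-log10(ic50) for ic50 in measured_nm]  # assumed transform

pcc = pearson(targets, predicted)
roc_auc = auc(labels, predicted)
print(pcc, roc_auc)
```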

**Fig. 2**
Predictive performance on different peptide lengths for the allmer and allmer-allele predictive methods. The two methods were trained as described in the text. The predictive performance was measured in terms of Pearson’s correlation coefficient (PCC) and area under the ROC curve (AUC), the latter using a binding threshold of 500 nM. The allmer method significantly outperforms the allmer-allele approach for peptides of length 9 and 10 (binomial test excluding ties). **: p < 0.001, *: p < 0.05
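A binomial test excluding ties, as named in the caption, can be sketched as a two-sided sign test on paired performance values. The pairing unit (one value per MHC molecule) and the example numbers are assumptions made for illustration; the caption does not specify them.

```python
from math import comb

def sign_test(perf_a, perf_b):
    """Two-sided binomial test on paired performance values, ties excluded."""
    wins_a = sum(a > b for a, b in zip(perf_a, perf_b))
    wins_b = sum(b > a for a, b in zip(perf_a, perf_b))
    n = wins_a + wins_b  # tied pairs are dropped
    if n == 0:
        return 1.0
    k = max(wins_a, wins_b)
    # P(X >= k) under Binomial(n, 0.5), doubled for a two-sided test.
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)

# Hypothetical per-molecule AUC values for two methods (one tie in position 3).
method_a = [0.90, 0.85, 0.80, 0.75, 0.88, 0.91, 0.70, 0.83]
method_b = [0.88, 0.80, 0.80, 0.70, 0.85, 0.89, 0.72, 0.80]
p_value = sign_test(method_a, method_b)
print(p_value)  # 0.125
```

With six wins against one and one tie dropped, the toy example does not reach significance, which shows why such a test needs a reasonably large number of non-tied pairs.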

**Fig. 3**
Length preference for the allmer and 9mer prediction methods compared to the length preference in the SYFPEITHI data. Length profiles for the allmer and 9mer methods were estimated as described in the text. The SYFPEITHI length preference was estimated as the average over the allele-specific length preference of 24 MHC molecules characterized by 20 or more ligand data points

**Fig. 4**
Comparison of the predicted length profile for alleles characterized by no or limited peptide data of length different from nine amino acids. The distribution of predicted binders is shown for three alleles, each characterized by relatively large data sets (>500 data points) with more than 99 % 9mers. Length profiles were estimated from the top 1 % of 1,000,000 random natural 8–11mer peptides using the allmer-allele (the method trained on allmer data in an allele-specific manner) and the allmer (the pan-specific method trained on allmer data) methods, respectively
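The length-profile estimate described in this caption (the length distribution among the top-scoring fraction of a peptide set) can be sketched as below. The function name, the placeholder peptides, and their scores are hypothetical; the example uses a top fraction of 0.5 on eight peptides only so the result is easy to check by hand, whereas the caption uses the top 1 % of 1,000,000 peptides.

```python
from collections import Counter

def length_profile(scored_peptides, top_fraction=0.01):
    """Fraction of each peptide length among the top-scoring peptides."""
    ranked = sorted(scored_peptides, key=lambda item: item[1], reverse=True)
    n_top = max(1, int(len(ranked) * top_fraction))
    counts = Counter(len(peptide) for peptide, _ in ranked[:n_top])
    return {length: counts[length] / n_top for length in sorted(counts)}

# Hypothetical (peptide, predicted score) pairs covering lengths 8 to 11.
scored = [
    ("A" * 8, 0.2), ("A" * 9, 0.9), ("C" * 9, 0.8), ("A" * 10, 0.7),
    ("A" * 11, 0.1), ("D" * 9, 0.6), ("E" * 8, 0.3), ("F" * 10, 0.4),
]
profile = length_profile(scored, top_fraction=0.5)
print(profile)  # {9: 0.75, 10: 0.25}
```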

**Fig. 5**
Rank analysis on the SYFPEITHI ligand benchmark. Binding to the restriction element was predicted for all 8–11mer peptides within the source proteins from the SYFPEITHI data set using the allmer and 9mer prediction methods, respectively. The percentage of identified ligands is plotted as a function of the percentage of top predicted binders from each source protein-ligand-MHC combination

**Fig. 6**
ROC curve analyses for the SYFPEITHI benchmark dataset. Binding to the restriction element was predicted for all unique 8–11mer peptides within the source proteins from the SYFPEITHI benchmark using the allmer method. Binding values were reported as binding affinity and percentile rank values as described in the text. ROC curves were calculated for each prediction value taking ligands as positives and all other peptides as negatives. The inset plot shows the information divergence value (ID) as a function of the percentage of peptides selected. The ID was calculated from the proportion of peptides with predicted restriction to each of the MHC molecules in the benchmark compared to the proportion expected by sampling at random
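The caption does not give a formula for the information divergence (ID). One plausible reading, shown here purely as an assumption, is the Kullback-Leibler divergence between the observed proportion of selected peptides assigned to each MHC molecule and the proportion expected under random sampling. The function name, the molecule names used as dictionary keys, the counts, and the choice of a uniform expectation are all illustrative.

```python
from math import log2

def information_divergence(observed_counts, expected_props):
    """Kullback-Leibler divergence (in bits) of observed from expected proportions."""
    total = sum(observed_counts.values())
    divergence = 0.0
    for mhc, expected in expected_props.items():
        observed = observed_counts.get(mhc, 0) / total
        if observed > 0:  # a term with zero observed proportion contributes 0
            divergence += observed * log2(observed / expected)
    return divergence

# Four molecules, assumed equally likely under random sampling.
expected = {"HLA-A*01:01": 0.25, "HLA-A*02:01": 0.25,
            "HLA-B*07:02": 0.25, "HLA-B*27:05": 0.25}

balanced = {"HLA-A*01:01": 10, "HLA-A*02:01": 10,
            "HLA-B*07:02": 10, "HLA-B*27:05": 10}
skewed = {"HLA-A*02:01": 40}

print(information_divergence(balanced, expected))  # 0.0
print(information_divergence(skewed, expected))    # 2.0
```

Under this reading, a value near zero means the selected peptides are spread across MHC molecules as random sampling would predict, and larger values mean the selection is dominated by a few molecules.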


