Hydrophobicity identifies false positives and false negatives in peptide-MHC binding

Arnav Solanki¹, Marc Riedel¹, James Cornette², Julia Udell¹,³, George Vasmatzis³

Affiliations

¹ Department of Electrical and Computer Engineering, University of Minnesota, Minneapolis, MN, United States.
² Department of Mathematics, Iowa State University, Ames, IA, United States.
³ Biomarker Discovery Group, Mayo Clinic, Center for Individualized Medicine, Rochester, MN, United States.

PMID: 36419888
PMCID: PMC9677119
DOI: 10.3389/fonc.2022.1034810

Arnav Solanki et al. Front Oncol. 2022 Nov 7:12:1034810. eCollection 2022.

Abstract

Major Histocompatibility Complex (MHC) Class I molecules allow cells to present foreign and endogenous peptides to T cells so that cells infected by pathogens can be identified and killed. Neural network tools such as NetMHC-4.0 and NetMHCpan-4.1 are used to predict whether peptides will bind to variants of MHC molecules. These tools are trained on data gathered from binding affinity and eluted ligand experiments. However, they do not track hydrophobicity, a significant biochemical factor in peptide binding, in their predictions. A previous study concluded that the peptides predicted to bind to HLA-A*0201 by NetMHC-4.0 were much more hydrophobic than expected. This paper expands that study to HLA-B*2705 and HLA-B*0801, which prefer binding hydrophilic and balanced peptides, respectively. The correlation between the hydrophobicity of 9-mer peptides and their predicted binding strengths to these HLAs was investigated. Two studies were performed: one using the data on which the two neural networks were trained, and the other using a sample of the human proteome. NetMHC-4.0 was found to have a statistically significant bias towards predicting highly hydrophobic peptides as strong binders to HLA-A*0201 and HLA-B*2705 in both studies. Machine learning metrics were used to identify the causes of this bias: hydrophobic false positives and hydrophilic false negatives. These results suggest that retraining the neural networks with biochemical attributes such as hydrophobicity, and with better training data, could increase the accuracy of their predictions. This would increase their impact in applications such as vaccine design and neoantigen identification.
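As an illustration of the quantity the abstract refers to, the following is a minimal sketch of scoring the hydrophobicity of a 9-mer peptide. It assumes the Kyte-Doolittle hydropathy scale and a simple per-residue average; the abstract does not state which scale or aggregation the authors used, and the function name `mean_hydrophobicity` and the example peptides are hypothetical.

```python
# Kyte-Doolittle hydropathy values per amino acid.
# ASSUMPTION: the abstract does not name a hydrophobicity scale; this one is
# used purely for illustration.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydrophobicity(peptide: str) -> float:
    """Return the average hydropathy of a peptide (higher = more hydrophobic)."""
    peptide = peptide.upper()
    if not peptide:
        raise ValueError("peptide must not be empty")
    unknown = set(peptide) - set(KYTE_DOOLITTLE)
    if unknown:
        raise ValueError(f"unknown residues: {sorted(unknown)}")
    return sum(KYTE_DOOLITTLE[aa] for aa in peptide) / len(peptide)


if __name__ == "__main__":
    # Hypothetical 9-mers chosen only to contrast the two extremes.
    for nine_mer in ("LLVVIIAFL", "KRDEKRDEK"):
        print(nine_mer, round(mean_hydrophobicity(nine_mer), 3))
```

Averaging over residues keeps scores comparable across peptides; for fixed-length 9-mers a plain sum would rank peptides identically.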

Keywords: MHC class I; hydrophobicity; machine learning; neural networks; peptide.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

**Figure 1**
The cumulative distribution of the experimental training scores (blue), NetMHC-4.0 predicted scores (red), and NetMHCpan-4.1 predicted scores (yellow) for peptides in the training dataset for HLAs A2, B27, and B8. The strong binder thresholds for NetMHC-4.0 and NetMHCpan-4.1 are shown as dashed lines of the corresponding colors. For B27, these were 0.551 and 0.478, and for B8 these were 0.495 and 0.301 respectively. Each plot of scores was independently sorted. Consequently, the order of peptides is not conserved across the 3 plots in each subfigure. Note that the A2 results can be accessed from our previous study (15). For A2, the NetMHC-4.0 and NetMHCpan-4.1 thresholds were 0.659 and 0.419 respectively.
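The strong-binder thresholds quoted in the caption can be used to sort predictions into the false-positive and false-negative categories named in the abstract. The sketch below is an illustration only: it assumes that a score at or above the threshold counts as a strong binder and that the same threshold is applied to the experimental score, neither of which is stated in this record. The names `STRONG_BINDER_THRESHOLDS` and `label_prediction` are hypothetical.

```python
# Strong-binder thresholds as given in the Figure 1 caption.
STRONG_BINDER_THRESHOLDS = {
    "A2": {"NetMHC-4.0": 0.659, "NetMHCpan-4.1": 0.419},
    "B27": {"NetMHC-4.0": 0.551, "NetMHCpan-4.1": 0.478},
    "B8": {"NetMHC-4.0": 0.495, "NetMHCpan-4.1": 0.301},
}


def label_prediction(experimental: float, predicted: float, threshold: float) -> str:
    """Classify one peptide by comparing experimental and predicted scores.

    ASSUMPTION: score >= threshold means "strong binder", and the same
    threshold is applied to both scores.
    """
    is_binder = experimental >= threshold
    called_binder = predicted >= threshold
    if called_binder and is_binder:
        return "true positive"
    if called_binder and not is_binder:
        return "false positive"
    if not called_binder and is_binder:
        return "false negative"
    return "true negative"


if __name__ == "__main__":
    threshold = STRONG_BINDER_THRESHOLDS["B27"]["NetMHC-4.0"]
    # Hypothetical (experimental, predicted) score pairs, not data from the study.
    for experimental, predicted in [(0.70, 0.60), (0.20, 0.58), (0.80, 0.30)]:
        print(experimental, predicted, label_prediction(experimental, predicted, threshold))
```

Grouping peptides by these labels and comparing the hydrophobicity of each group is one way to look for the pattern the abstract describes, hydrophobic false positives and hydrophilic false negatives.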

**Figure 2**
The cumulative distribution of NetMHC-4.0 predicted scores (red) and NetMHCpan-4.1 predicted scores (yellow) for peptides in the human proteome dataset for HLAs A2, B27, and B8. The strong binder thresholds for NetMHC-4.0 and NetMHCpan-4.1 are shown as dashed lines of the corresponding colors. These thresholds are the same as those in **Figure 1**. Each plot of scores was independently sorted. Consequently, the order of peptides is not conserved across the 2 plots in each subfigure. Note that the A2 results can be accessed from our previous study (15).

**Figure 3**
Violin plots of the hydrophobicity of the sets of strong binders predicted by NetMHC-4.0 and NetMHCpan-4.1 on the training dataset for A2, B27, and B8. The x-axis represents the hydrophobicity of a 9-mer, and the y-axis represents the frequency. Note that the A2 results can be accessed from our previous study (15). The mean and two quartiles are also depicted in each distribution.

**Figure 4**
Violin plots of the hydrophobicity of the sets of strong binders predicted by NetMHC-4.0 and NetMHCpan-4.1 on the human proteome dataset for A2, B27, and B8. The x-axis represents the hydrophobicity of a 9-mer, and the y-axis represents the frequency. The distributions of all sampled peptides (blue), strong binders predicted by NetMHC-4.0 (red), and those predicted by NetMHCpan-4.1 (yellow) are shown. The mean and two quartiles are also depicted in each distribution. Note that the A2 results can be accessed from our previous study (15).

References

1. Neefjes J, Jongsma ML, Paul P, Bakke O. Towards a systems understanding of MHC class I and MHC class II antigen presentation. Nat Rev Immunol (2011) 11:823–36. doi: 10.1038/nri3084
2. Gourraud PA, Khankhanian P, Cereb N, Yang SY, Feolo M, Maiers M, et al. HLA diversity in the 1000 genomes dataset. PLoS One (2014) 9:e97282. doi: 10.1371/journal.pone.0097282
3. Andreatta M, Nielsen M. Gapped sequence alignment using artificial neural networks: application to the MHC class I system. Bioinformatics (2016) 32:511–7. doi: 10.1093/bioinformatics/btv639
4. Reynisson B, Alvarez B, Paul S, Peters B, Nielsen M. NetMHCpan-4.1 and NetMHCIIpan-4.0: improved predictions of MHC antigen presentation by concurrent motif deconvolution and integration of MS MHC eluted ligand data. Nucleic Acids Res (2020) 48:W449–54. doi: 10.1093/nar/gkaa379
5. McGranahan N, Rosenthal R, Hiley CT, Rowan AJ, Watkins TB, Wilson GA, et al. Allele-specific HLA loss and immune escape in lung cancer evolution. Cell (2017) 171:1259–71. doi: 10.1016/j.cell.2017.10.001
