2022 Nov 7:12:1034810.
doi: 10.3389/fonc.2022.1034810. eCollection 2022.

Hydrophobicity identifies false positives and false negatives in peptide-MHC binding

Arnav Solanki et al. Front Oncol.

Abstract

Major Histocompatibility Complex (MHC) Class I molecules allow cells to present foreign and endogenous peptides to T cells so that cells infected by pathogens can be identified and killed. Neural network tools such as NetMHC-4.0 and NetMHCpan-4.1 are used to predict whether peptides will bind to variants of MHC molecules. These tools are trained on data gathered from binding affinity and eluted ligand experiments. However, they do not track hydrophobicity, a significant biochemical factor relevant to peptide binding, in their predictions. A previous study concluded that the peptides predicted to bind to HLA-A*0201 by NetMHC-4.0 were much more hydrophobic than expected. This paper expands that study by also focusing on HLA-B*2705 and HLA-B*0801, which prefer binding hydrophilic and balanced peptides, respectively. The correlation between the hydrophobicity of 9-mer peptides and their predicted binding strengths to these HLAs was investigated. Two studies were performed, one using the data that the two neural networks were trained on, and the other using a sample of the human proteome. NetMHC-4.0 was found to have a statistically significant bias towards predicting highly hydrophobic peptides as strong binders to HLA-A*0201 and HLA-B*2705 in both studies. Machine learning metrics were used to identify the causes of this bias: hydrophobic false positives and hydrophilic false negatives. These results suggest that retraining the neural networks with biochemical attributes such as hydrophobicity and with better training data could increase the accuracy of their predictions. This would increase their impact in applications such as vaccine design and neoantigen identification.
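As a rough illustration of the analysis described in the abstract, the Python sketch below scores the hydrophobicity of a 9-mer and separates false positives from false negatives. It rests on assumptions not stated in the abstract: the Kyte-Doolittle hydropathy scale stands in for whatever scale the authors used, hydrophobicity is taken as the mean over residues, and the function names, record layout, and inclusive thresholds are hypothetical.

```python
from statistics import mean

# Kyte-Doolittle hydropathy index per amino acid.
# Assumption: the abstract does not name the scale used in the paper.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobicity(peptide: str) -> float:
    """Mean hydropathy of a peptide (hypothetical definition for this sketch)."""
    return mean(KYTE_DOOLITTLE[aa] for aa in peptide.upper())


def split_errors(records, exp_threshold, pred_threshold):
    """Separate false positives from false negatives.

    records: iterable of (peptide, experimental_score, predicted_score).
    Assumption: a score at or above its threshold counts as a strong binder.
    """
    false_pos, false_neg = [], []
    for peptide, exp_score, pred_score in records:
        is_binder = exp_score >= exp_threshold
        predicted_binder = pred_score >= pred_threshold
        if predicted_binder and not is_binder:
            false_pos.append(peptide)
        elif is_binder and not predicted_binder:
            false_neg.append(peptide)
    return false_pos, false_neg
```

Comparing `mean(hydrophobicity(p) for p in false_pos)` with the same quantity for `false_neg` would then show whether the errors are skewed toward hydrophobic false positives and hydrophilic false negatives, which is the pattern the abstract reports.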

Keywords: MHC class I; hydrophobicity; machine learning; neural networks; peptide.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

Figure 1
The cumulative distribution of the experimental training scores (blue), NetMHC-4.0 predicted scores (red), and NetMHCpan-4.1 predicted scores (yellow) for peptides in the training dataset for HLAs A2, B27, and B8. The strong binder thresholds for NetMHC-4.0 and NetMHCpan-4.1 are shown as dashed lines of the corresponding colors. For B27, these were 0.551 and 0.478, and for B8 these were 0.495 and 0.301 respectively. Each plot of scores was independently sorted. Consequently, the order of peptides is not conserved across the 3 plots in each subfigure. Note that the A2 results can be accessed from our previous study (15). For A2, the NetMHC-4.0 and NetMHCpan-4.1 thresholds were 0.659 and 0.419 respectively.
Figure 2
The cumulative distribution of NetMHC-4.0 predicted scores (red) and NetMHCpan-4.1 predicted scores (yellow) for peptides in the human proteome dataset for HLAs A2, B27, and B8. The strong binder thresholds for NetMHC-4.0 and NetMHCpan-4.1 are shown as dashed lines of the corresponding colors. These thresholds are the same as those in Figure 1. Each plot of scores was independently sorted. Consequently, the order of peptides is not conserved across the 2 plots in each subfigure. Note that the A2 results can be accessed from our previous study (15).
Figure 3
Violin plots of the hydrophobicity of the sets of strong binders predicted by NetMHC-4.0 and NetMHCpan-4.1 on the training dataset for A2, B27, and B8. The x-axis represents the hydrophobicity of a 9-mer, and the y-axis represents the frequency. Note that the A2 results can be accessed from our previous study (15). The mean and two quartiles are also depicted in each distribution.
Figure 4
Violin plots of the hydrophobicity of the sets of strong binders predicted by NetMHC-4.0 and NetMHCpan-4.1 on the human proteome dataset for A2, B27, and B8. The x-axis represents the hydrophobicity of a 9-mer, and the y-axis represents the frequency. The distributions of all sampled peptides (blue), strong binders predicted by NetMHC-4.0 (red), and those predicted by NetMHCpan-4.1 (yellow) are shown. The mean and two quartiles are also depicted in each distribution. Note that the A2 results can be accessed from our previous study (15).
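
The strong binder thresholds quoted in the Figure 1 caption can be collected into a lookup table, as in the Python sketch below. The threshold values come from the caption; the dictionary layout, the function name, and the convention that a score at or above the threshold marks a strong binder (higher score meaning stronger predicted binding) are assumptions made for illustration.

```python
# Strong binder thresholds as quoted in the Figure 1 caption.
STRONG_BINDER_THRESHOLDS = {
    "A2": {"NetMHC-4.0": 0.659, "NetMHCpan-4.1": 0.419},
    "B27": {"NetMHC-4.0": 0.551, "NetMHCpan-4.1": 0.478},
    "B8": {"NetMHC-4.0": 0.495, "NetMHCpan-4.1": 0.301},
}


def is_strong_binder(score: float, hla: str, tool: str) -> bool:
    """Return True if a predicted score meets the tool's threshold for this HLA.

    Assumption: higher scores mean stronger binding and the threshold is inclusive.
    """
    return score >= STRONG_BINDER_THRESHOLDS[hla][tool]
```

Because the thresholds differ by tool and by HLA, the same score can be classified differently: under these assumptions a score of 0.5 would count as a strong binder for B8 but not for B27 with NetMHC-4.0.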

References

    1. Neefjes J, Jongsma ML, Paul P, Bakke O. Towards a systems understanding of MHC class I and MHC class II antigen presentation. Nat Rev Immunol (2011) 11:823–36. doi: 10.1038/nri3084
    2. Gourraud PA, Khankhanian P, Cereb N, Yang SY, Feolo M, Maiers M, et al. HLA diversity in the 1000 genomes dataset. PLoS One (2014) 9:e97282. doi: 10.1371/journal.pone.0097282
    3. Andreatta M, Nielsen M. Gapped sequence alignment using artificial neural networks: application to the MHC class I system. Bioinformatics (2016) 32:511–7. doi: 10.1093/bioinformatics/btv639
    4. Reynisson B, Alvarez B, Paul S, Peters B, Nielsen M. NetMHCpan-4.1 and NetMHCIIpan-4.0: improved predictions of MHC antigen presentation by concurrent motif deconvolution and integration of MS MHC eluted ligand data. Nucleic Acids Res (2020) 48:W449–54. doi: 10.1093/nar/gkaa379
    5. McGranahan N, Rosenthal R, Hiley CT, Rowan AJ, Watkins TB, Wilson GA, et al. Allele-specific HLA loss and immune escape in lung cancer evolution. Cell (2017) 171:1259–71. doi: 10.1016/j.cell.2017.10.001