Bioinformatics. 2016 Feb 15;32(4):511-7.
doi: 10.1093/bioinformatics/btv639. Epub 2015 Oct 29.

Gapped sequence alignment using artificial neural networks: application to the MHC class I system

Massimo Andreatta et al. Bioinformatics. 2016.

Abstract

Motivation: Many biological processes are guided by receptor interactions with linear ligands of variable length. One such receptor is the MHC class I molecule. The length preferences vary depending on the MHC allele, but are generally limited to peptides of length 8-11 amino acids. On this relatively simple system, we developed a sequence alignment method based on artificial neural networks that allows insertions and deletions in the alignment.

Results: We show that prediction methods based on alignments that include insertions and deletions have significantly higher performance than methods trained on peptides of single lengths. We also illustrate how the location of deletions can aid the interpretation of the modes of binding of the peptide-MHC, as in the case of long peptides bulging out of the MHC groove or protruding at either terminus. Finally, we demonstrate that the method can learn the length profile of different MHC molecules, and we quantify the reduction in the experimental effort required to identify potential epitopes using our prediction algorithm.

Availability and implementation: The NetMHC-4.0 method for the prediction of peptide-MHC class I binding affinity using gapped sequence alignment is publicly available at: http://www.cbs.dtu.dk/services/NetMHC-4.0.

Figures

Fig. 1.
Examples of insertion and deletion applied to sequences whose length differs from nine. (a) Insertion: the wildcard amino acid X (encoded as a vector of zeros) is inserted in each possible position to complete the peptide to a 9mer core. The sequence with the insertion that returns the highest predicted score (in this case an insertion at P4) is taken as the optimal binding core. (b) Deletion: long peptides are reconciled to a 9mer amino acid core either by an extension at the termini (first and last peptides in the example), or by deleting amino acids within the sequence. In this example a single deletion at P6 in the 10mer was found to be optimal.
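The candidate enumeration described in this caption can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the NetMHC-4.0 implementation: the names `candidate_cores`, `best_core` and `toy_score` are hypothetical, the gap is assumed to be one contiguous stretch, and `toy_score` is a hand-written stand-in for the trained neural network (it simply rewards L at P2 and L or V at P9).

```python
from typing import Callable, List, Tuple

CORE_LEN = 9
WILDCARD = "X"  # the caption encodes X as an all-zero input vector


def candidate_cores(peptide: str) -> List[Tuple[str, str]]:
    """Enumerate every 9mer core reachable by one contiguous insertion or deletion."""
    n = len(peptide)
    if n == CORE_LEN:
        return [(peptide, "no gap")]
    if n < CORE_LEN:
        # Insertion: place a run of wildcards at each possible position.
        gap = CORE_LEN - n
        return [
            (peptide[:i] + WILDCARD * gap + peptide[i:], f"insertion at P{i + 1}")
            for i in range(n + 1)
        ]
    # Deletion: drop a contiguous stretch; dropping it at either end
    # corresponds to the peptide extending beyond that terminus.
    gap = n - CORE_LEN
    cores = []
    for i in range(CORE_LEN + 1):
        if i == 0:
            label = "N-terminal extension"
        elif i == CORE_LEN:
            label = "C-terminal extension"
        else:
            label = f"deletion at P{i + 1}"
        cores.append((peptide[:i] + peptide[i + gap:], label))
    return cores


def best_core(peptide: str, score: Callable[[str], float]) -> Tuple[str, str]:
    """Return the (core, label) pair with the highest predicted score."""
    return max(candidate_cores(peptide), key=lambda pair: score(pair[0]))


def toy_score(core: str) -> float:
    """Hypothetical scorer standing in for the trained network."""
    return float(core[1] == "L") + float(core[8] in "LV")


if __name__ == "__main__":
    print(best_core("MLLSVPLLLG", toy_score))
```

In the real method the score would come from the trained network, so the selected core depends on the allele-specific model rather than on fixed anchor rules like the ones used here.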
Fig. 2.
Difference in PCC (Pearson Correlation Coefficient) between networks trained on data for all peptide lengths (allmer) and networks trained on single lengths (nmer). Points above the baseline indicate alleles for which networks trained on all lengths give higher performance, and the deviation from the baseline shows the extent of the difference in terms of PCC. For 8mer peptides, allmer networks have higher performance in 32/38 alleles (p = 1 × 10^-5), for 9mers in 85/118 alleles (p = 9 × 10^-7), for 10mers in 60/63 alleles (p = 4 × 10^-15), for 11mers in 36/37 alleles (p = 3 × 10^-10).
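The caption does not name the statistical test behind these p-values. As a hedged sketch, assuming a one-tailed binomial (sign) test against a null hypothesis in which either network type is equally likely to win on a given allele, the win counts can be turned into p-values as follows. The function name `sign_test_p` is hypothetical.

```python
from math import comb


def sign_test_p(wins: int, n: int) -> float:
    """One-tailed binomial p-value: P(X >= wins) for X ~ Binomial(n, 0.5).

    Assumption: this is the test behind the caption; the caption itself
    does not say which test was used.
    """
    tail = sum(comb(n, k) for k in range(wins, n + 1))
    return tail / 2 ** n


if __name__ == "__main__":
    # Win counts taken from the caption (allmer better than nmer).
    for label, wins, n in [("8mer", 32, 38), ("9mer", 85, 118),
                           ("10mer", 60, 63), ("11mer", 36, 37)]:
        print(f"{label}: {wins}/{n} alleles, p = {sign_test_p(wins, n):.1e}")
```

Under this assumption the 8mer count of 32/38 gives a p-value of about 1.2 × 10^-5, in line with the value quoted in the caption.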
Fig. 3.
Difference in PCC between networks trained on data for all peptide lengths (allmer) and networks trained only on 9-mers with the L-mer approximation (Lmer). Points above the baseline indicate alleles for which networks trained on all lengths give higher performance, and the deviation from the baseline shows the extent of the difference in terms of PCC. For 8mer peptides, allmer networks have higher performance in 28/38 alleles (p = 0.003), for 10mers in 57/63 alleles (p = 8 × 10^-12), for 11mers in 24/37 alleles (p = 0.05).
Fig. 4.
Peptide length distributions predicted by NetMHC-4.0. (a) Length distribution for networks trained on peptides of all lengths and for the L-mer approximation networks, compared with the length distribution of ligands in the SYFPEITHI database. The allmer and L-mer profiles were calculated by running 400 000 random natural peptides through the predictors and calculating the relative number of peptides of different lengths among the top 1% predicted binders. (b) Predicted length distributions for selected alleles. For H-2-Kb the networks learn a preference for 8mers and 9mers, HLA-A*02:01 has a slight preference for 9mers over 10mers, HLA-B*07:02 favors 10mers and to a lesser extent 9mers, HLA-B*35:01 and HLA-C*04:01 have a strong preference for 9mer peptides.
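The length-profile calculation in panel (a) amounts to ranking a large peptide pool by predicted score and counting lengths among the top 1%. The sketch below illustrates that procedure under loud assumptions: `length_profile` and `toy_predict` are hypothetical names, the peptides are uniformly random sequences rather than peptides sampled from natural proteins, and `toy_predict` is a placeholder that favors 9mers instead of a trained predictor.

```python
import random
from collections import Counter
from typing import Callable, Dict, Iterable

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def length_profile(peptides: Iterable[str],
                   predict: Callable[[str], float],
                   top_fraction: float = 0.01) -> Dict[int, float]:
    """Fraction of each peptide length among the top-scoring predictions."""
    ranked = sorted(peptides, key=predict, reverse=True)  # higher score = stronger binder
    n_top = max(1, int(len(ranked) * top_fraction))
    counts = Counter(len(p) for p in ranked[:n_top])
    return {length: counts[length] / n_top for length in sorted(counts)}


def toy_predict(peptide: str) -> float:
    """Hypothetical scorer: prefers lengths close to nine."""
    return -abs(len(peptide) - 9)


if __name__ == "__main__":
    rng = random.Random(0)
    # 1000 random peptides for each length 8 to 11 (the caption uses 400 000 natural peptides).
    pool = ["".join(rng.choice(AMINO_ACIDS) for _ in range(k))
            for k in range(8, 12) for _ in range(1000)]
    print(length_profile(pool, toy_predict))
```

With a real predictor the resulting dictionary would be the per-allele length profile that the caption compares against the SYFPEITHI ligand length distribution.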
Fig. 5.
3D structures for two MHC class I molecules with bound peptides longer than 9 amino acids (PDB references 2CLR and 4JQX). (a) The 10mer peptide MLLSVPLLLG bound to HLA-A*02:01 extends at the C terminus with a glycine (G) amino acid. The residues at the anchor positions P2 (L) and P9 (L) are highlighted. (b) The 12mer EECDSELEIKRY bound to HLA-B*44:03 has anchors at its second (E) and last (Y) positions and bulges out from the middle of the MHC binding groove
Fig. 6.
Number of peptides per protein that should be tested to identify known ligands in the SYFPEITHI dataset. Antigenic proteins were digested into all possible peptides of length 8–11 as described in the text, which were then ranked by NetMHC-4.0 predicted affinity. The plot depicts the maximum number of peptides that would have to be tested for each protein before detecting the known ligand in the ranked list. The inset graph is a zoomed-out version of the curves of the main graph, showing eventual convergence to 100% identified ligands
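The quantity plotted here, the position of a known ligand in the ranked list of all 8 to 11mer peptides from its source protein, can be sketched as follows. This is an illustrative outline only: `digest`, `rank_of_ligand` and `toy_predict` are hypothetical names, the example protein and ligand are made up, and `toy_predict` replaces the NetMHC-4.0 predicted affinity with an arbitrary rule.

```python
from typing import Callable, List


def digest(protein: str, min_len: int = 8, max_len: int = 11) -> List[str]:
    """All overlapping peptides of length min_len..max_len from a protein."""
    return [protein[i:i + k]
            for k in range(min_len, max_len + 1)
            for i in range(len(protein) - k + 1)]


def rank_of_ligand(protein: str, ligand: str,
                   predict: Callable[[str], float]) -> int:
    """1-based rank of the known ligand, i.e. peptides to test before finding it.

    Raises ValueError if the ligand is not a peptide of the protein.
    """
    unique = list(dict.fromkeys(digest(protein)))  # drop duplicates, keep order
    ranked = sorted(unique, key=predict, reverse=True)  # higher = stronger binder
    return ranked.index(ligand) + 1


def toy_predict(peptide: str) -> float:
    """Hypothetical scorer: rewards 9mers with L at P2 and V at the C terminus."""
    return (len(peptide) == 9) + (peptide[1] == "L") + (peptide[-1] == "V")


if __name__ == "__main__":
    protein = "MKTAYIAKQRGLFDEKSHVSRQLEERLGLIEVQ"  # made-up sequence
    ligand = "GLFDEKSHV"                           # made-up ligand
    print(rank_of_ligand(protein, ligand, toy_predict))
```

Repeating this over every protein in a ligand dataset and plotting the fraction of ligands found within a given rank would give curves of the kind the caption describes.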

