Bioinformatics. 2016 Feb 15;32(4):511-7.

doi: 10.1093/bioinformatics/btv639. Epub 2015 Oct 29.

Gapped sequence alignment using artificial neural networks: application to the MHC class I system

Massimo Andreatta¹, Morten Nielsen²

Affiliations

¹ Instituto de Investigaciones Biotecnológicas, Universidad Nacional de San Martín, Buenos Aires, Argentina.
² Instituto de Investigaciones Biotecnológicas, Universidad Nacional de San Martín, Buenos Aires, Argentina and Center for Biological Sequence Analysis, Technical University of Denmark, Kgs. Lyngby, Denmark.

PMID: 26515819
PMCID: PMC6402319
DOI: 10.1093/bioinformatics/btv639

Massimo Andreatta et al. Bioinformatics. 2016.

Abstract

Motivation: Many biological processes are guided by receptor interactions with linear ligands of variable length. One such receptor is the MHC class I molecule. The length preferences vary depending on the MHC allele, but are generally limited to peptides of length 8-11 amino acids. On this relatively simple system, we developed a sequence alignment method based on artificial neural networks that allows insertions and deletions in the alignment.

Results: We show that prediction methods based on alignments that include insertions and deletions have significantly higher performance than methods trained on peptides of single lengths. Also, we illustrate how the location of deletions can aid the interpretation of the modes of binding of the peptide-MHC, as in the case of long peptides bulging out of the MHC groove or protruding at either terminus. Finally, we demonstrate that the method can learn the length profile of different MHC molecules, and quantify the reduction in experimental effort required to identify potential epitopes using our prediction algorithm.

Availability and implementation: The NetMHC-4.0 method for the prediction of peptide-MHC class I binding affinity using gapped sequence alignment is publicly available at: http://www.cbs.dtu.dk/services/NetMHC-4.0.

Figures

**Fig. 1.**
Examples of insertion and deletion applied to sequences of length different from nine. (a) Insertion: the wildcard amino acid X (encoded as a vector of zeros) is inserted in each possible position to complete the peptide to a 9mer core. The sequence with the insertion that returns the highest predicted score (in this case an insertion at P4) is taken as the optimal binding core. (b) Deletion: long peptides are reconciled to a 9mer amino acid core either by an extension at the termini (first and last peptides in the example), or by deleting amino acids within the sequence. In this example a single deletion at P6 in the 10mer was found to be optimal.
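The candidate enumeration described in this caption can be sketched in a few lines of Python. This is a minimal illustration, not the NetMHC-4.0 implementation: the names `candidate_cores`, `best_core` and `toy_score` are hypothetical, gaps longer than one residue are assumed to be contiguous, and the scorer is a toy stand-in for the trained network that merely rewards L at P2 and L or V at P9.

```python
from typing import Callable, List, Tuple

CORE_LEN = 9  # length of the binding core used in the caption


def candidate_cores(peptide: str) -> List[Tuple[str, str]]:
    """Return (core, label) pairs that reconcile a peptide to a 9mer core."""
    n = len(peptide)
    if n == CORE_LEN:
        return [(peptide, "ungapped")]
    if n < CORE_LEN:
        # Insertion: try the wildcard X at every possible position.
        pad = "X" * (CORE_LEN - n)
        return [(peptide[:i] + pad + peptide[i:], f"insertion at P{i + 1}")
                for i in range(n + 1)]
    # Deletion: drop a contiguous stretch (assumption) starting at index i.
    gap = n - CORE_LEN
    cores = []
    for i in range(CORE_LEN + 1):
        core = peptide[:i] + peptide[i + gap:]
        if i == 0:
            label = "N-terminal extension"   # leading residues overhang
        elif i == CORE_LEN:
            label = "C-terminal extension"   # trailing residues overhang
        else:
            label = f"deletion at P{i + 1}"
        cores.append((core, label))
    return cores


def toy_score(core: str) -> float:
    """Toy scorer (assumption): rewards L at P2 and L or V at P9."""
    return float(core[1] == "L") + float(core[8] in "LV")


def best_core(peptide: str, score: Callable[[str], float]) -> Tuple[str, str]:
    """Pick the candidate core with the highest score (first one on ties)."""
    return max(candidate_cores(peptide), key=lambda item: score(item[0]))


print(best_core("MLLSVPLLLG", toy_score))
```

With this toy scorer, the 10mer MLLSVPLLLG resolves to the core MLLSVPLLL with the final glycine left as a C-terminal extension. A real predictor would replace `toy_score` with the network output for each candidate core.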

**Fig. 2.**
Difference in PCC (Pearson Correlation Coefficient) between networks trained on data for all peptide lengths (allmer) and networks trained on single lengths (nmer). Points above the baseline indicate alleles for which networks trained on all lengths give higher performance, and the deviation from the baseline shows the extent of the difference in terms of PCC. For 8mer peptides, allmer networks have higher performance in 32/38 alleles (p = 1 × 10⁻⁵), for 9mers in 85/118 alleles (p = 9 × 10⁻⁷), for 10mers in 60/63 alleles (p = 4 × 10⁻¹⁵), and for 11mers in 36/37 alleles (p = 3 × 10⁻¹⁰).
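The caption does not name the statistical test behind these p-values. The figures for 8mers, 10mers and 11mers appear consistent with a one-tailed binomial (sign) test against a 50% chance of either network type winning on a given allele, so that test is assumed in the sketch below. The function name `binomial_tail` is illustrative.

```python
from math import comb


def binomial_tail(wins: int, trials: int) -> float:
    """P(X >= wins) for X ~ Binomial(trials, 0.5): one-tailed sign test (assumed)."""
    favourable = sum(comb(trials, k) for k in range(wins, trials + 1))
    return favourable / 2 ** trials


# Win counts taken from the caption: alleles where allmer beats nmer.
for label, wins, total in [("8mer", 32, 38), ("9mer", 85, 118),
                           ("10mer", 60, 63), ("11mer", 36, 37)]:
    print(f"{label}: {wins}/{total} alleles, p = {binomial_tail(wins, total):.1e}")
```

Exact integer arithmetic with `math.comb` avoids the underflow that can affect naive floating-point evaluation of very small tail probabilities.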

**Fig. 3.**
Difference in PCC between networks trained on data for all peptide lengths (allmer) and networks trained only on 9-mers with the L-mer approximation (Lmer). Points above the baseline indicate alleles for which networks trained on all lengths give higher performance, and the deviation from the baseline shows the extent of the difference in terms of PCC. For 8mer peptides, allmer networks have higher performance in 28/38 alleles (p = 0.003), for 10mers in 57/63 alleles (p = 8 × 10⁻¹²), and for 11mers in 24/37 alleles (p = 0.05).

**Fig. 4.**
Peptide length distributions predicted by NetMHC-4.0. (a) Length distribution for networks trained on peptides of all lengths and for the L-mer approximation networks, compared with the length distribution of ligands in the SYFPEITHI database. The allmer and L-mer profiles were calculated by running 400 000 random natural peptides through the predictors and calculating the relative number of peptides of different lengths among the top 1% predicted binders. (b) Predicted length distributions for selected alleles. For H-2-Kb the networks learn a preference for 8mers and 9mers; HLA-A*02:01 has a slight preference for 9mers over 10mers; HLA-B*07:02 favors 10mers and, to a lesser extent, 9mers; HLA-B*35:01 and HLA-C*04:01 have a strong preference for 9mer peptides.
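The length profile in panel (a) amounts to ranking scored peptides, keeping the top 1%, and tabulating their lengths. A minimal sketch follows, assuming the predictions are already available as (peptide, score) pairs with higher scores meaning stronger predicted binding. The function name `length_profile` and the small example data are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def length_profile(scored: List[Tuple[str, float]],
                   top_fraction: float = 0.01) -> Dict[int, float]:
    """Fraction of each peptide length among the top-scoring peptides."""
    ranked = sorted(scored, key=lambda item: item[1], reverse=True)
    n_top = max(1, round(len(ranked) * top_fraction))
    counts = Counter(len(peptide) for peptide, _ in ranked[:n_top])
    return {length: counts[length] / n_top for length in sorted(counts)}


# Made-up scores for illustration only: 200 peptides, so the top 1% is 2 peptides.
example = [("A" * 9, 0.9), ("A" * 10, 0.8)] + [("A" * 8, 0.1)] * 198
print(length_profile(example))
```

In the setting of the caption, `scored` would hold the 400 000 random natural peptides with their predicted affinities for one allele.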

**Fig. 5.**
3D structures for two MHC class I molecules with bound peptides longer than 9 amino acids (PDB references 2CLR and 4JQX). (a) The 10mer peptide MLLSVPLLLG bound to HLA-A*02:01 extends at the C terminus with a glycine (G) amino acid. The residues at the anchor positions P2 (L) and P9 (L) are highlighted. (b) The 12mer EECDSELEIKRY bound to HLA-B*44:03 has anchors at its second (E) and last (Y) positions and bulges out from the middle of the MHC binding groove.

**Fig. 6.**
Number of peptides per protein that should be tested to identify known ligands in the SYFPEITHI dataset. Antigenic proteins were digested into all possible peptides of length 8–11 as described in the text, which were then ranked by NetMHC-4.0 predicted affinity. The plot depicts the maximum number of peptides that would have to be tested for each protein before detecting the known ligand in the ranked list. The inset graph is a zoomed-out version of the curves of the main graph, showing eventual convergence to 100% identified ligands.
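The quantity plotted here can be sketched as follows: digest a protein into all overlapping 8-11mers, score them, and count how many peptides rank at or above the known ligand. This is an illustrative sketch only. The names `digest` and `peptides_to_test` are hypothetical, the scorer is passed in as a plain function standing in for the predictor, and ties are resolved pessimistically (an assumption made to match the "maximum number" wording).

```python
from typing import Callable, List


def digest(protein: str, min_len: int = 8, max_len: int = 11) -> List[str]:
    """All overlapping peptides of length min_len..max_len, duplicates removed."""
    peptides = [protein[start:start + length]
                for length in range(min_len, max_len + 1)
                for start in range(len(protein) - length + 1)]
    return list(dict.fromkeys(peptides))  # dedupe while keeping order


def peptides_to_test(protein: str, ligand: str,
                     score: Callable[[str], float]) -> int:
    """Worst-case number of peptides tested before reaching the known ligand."""
    peptides = digest(protein)
    if ligand not in peptides:
        raise ValueError("ligand is not an 8-11mer of this protein")
    target = score(ligand)
    # Every peptide scoring at least as high as the ligand may be tested first.
    return sum(1 for peptide in peptides if score(peptide) >= target)


# Toy protein and toy scorer (both made up): the scorer only prefers 9mers.
protein = "ACDEFGHIKLMNPQRSTVWY"
print(len(digest(protein)))
print(peptides_to_test(protein, "DEFGHIKLM", lambda p: -abs(len(p) - 9)))
```

Because the toy scorer cannot distinguish between 9mers, the ligand ties with every other 9mer in the protein. A more discriminating scorer would lower the count.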

Grants and funding

HHSN272201200010C/PHS HHS/United States
