Genome Med. 2018 Nov 16;10(1):84.

doi: 10.1186/s13073-018-0594-6.

Footprints of antigen processing boost MHC class II natural ligand predictions

Carolina Barra¹, Bruno Alvarez¹, Sinu Paul², Alessandro Sette², Bjoern Peters², Massimo Andreatta¹, Søren Buus³, Morten Nielsen⁴,⁵

Affiliations

¹ Instituto de Investigaciones Biotecnológicas, Universidad Nacional de San Martín, CP1650, San Martín, Argentina.
² Division of Vaccine Discovery, La Jolla Institute for Allergy and Immunology, 9420 Athena Circle, La Jolla, CA, 92037, USA.
³ Department of Immunology and Microbiology, Faculty of Health Sciences, University of Copenhagen, Copenhagen, Denmark.
⁴ Instituto de Investigaciones Biotecnológicas, Universidad Nacional de San Martín, CP1650, San Martín, Argentina. mniel@bioinformatics.dtu.dk.
⁵ Department of Bio and Health Informatics, Technical University of Denmark, DK-2800, Kgs. Lyngby, Denmark. mniel@bioinformatics.dtu.dk.

PMID: 30446001
PMCID: PMC6240193
DOI: 10.1186/s13073-018-0594-6

Carolina Barra et al. Genome Med. 2018.

Abstract

Background: Major histocompatibility complex class II (MHC-II) molecules present peptide fragments to T cells for immune recognition. Current predictors for peptide to MHC-II binding are trained on binding affinity data, generated in vitro and therefore lacking information about antigen processing.

Methods: We generate prediction models of peptide to MHC-II binding trained with naturally eluted ligands derived from mass spectrometry in addition to peptide binding affinity data sets.

Results: We show that integrated prediction models incorporate identifiable rules of antigen processing. In fact, we observed detectable signals of protease cleavage at defined positions of the ligands. We also hypothesize a role of the length of the terminal ligand protrusions for trimming the peptide to the MHC presented ligand.

Conclusions: The results of integrating binding affinity and eluted ligand data in a combined model demonstrate improved performance for the prediction of MHC-II ligands and T cell epitopes and foreshadow a new generation of improved peptide to MHC-II prediction tools accounting for the plurality of factors that determine natural presentation of antigens.

Keywords: Antigen processing; Binding predictions; Eluted ligands; MHC-II; Machine learning; Mass spectrometry; Neural networks; T cell epitope.

Conflict of interest statement

Ethics approval and consent to participate

Not applicable

Consent for publication

Not applicable

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Figures

**Fig. 1**
GibbsCluster output for the five eluted ligand data sets employed in this work. For each set, the Kullback-Leibler distance (KLD) histogram (black bars) is displayed, which indicates the information content present in all clustering solutions (in this case, groups of one to three clusters) together with the motif logo(s) corresponding to the maximum KLD solution. The upper row gives the results for the DR15/51 data sets; the lower row for the DR1 data sets. Note that DR15 Ph was obtained from a cell line which expresses two HLA-DR molecules, HLA-DRB1*15:01 and HLA-DRB5*01:01 (DR15/51)

**Fig. 2**
Binding preferences learned by the single NNAlign [29] models trained on binding affinity (BA) or eluted ligand (EL) data. In the top row, motifs for the DRB1*01:01 allele are shown, with overlined logo plots (right) corresponding to models trained on EL data, and the non-overlined logo (left) corresponding to the BA-trained model. Similarly, binding motifs for DRB1*15:01 and DRB5*01:01 are displayed in the middle and bottom rows, respectively, with overlined logos (right) also indicating the EL-trained model preferences, and the non-overlined logo plot (left) indicating the BA preference. Logos were constructed from the predicted binding cores in the top 1% scoring predictions of 900,000 random natural peptides for BA and from the top 0.1% scoring predictions for EL

**Fig. 3**
Peptide length preferences learned by the six models trained on binding affinity (BA) and eluted ligand (EL) combined data. For each model, green traces represent the length histogram of the top 1% scoring predictions for the BA output neuron, on a prediction data set composed of one million random peptides; red traces refer to the length histogram of the top 0.1% scoring predictions for the EL output neuron, on the same prediction set; black traces indicate the length distribution of the raw MS data
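
The length profiles described in this caption can be illustrated with a minimal sketch. It is not the authors' code: the function name `top_fraction_length_histogram` and the `(peptide, score)` input format are assumptions made for illustration, and the scores would come from whichever output neuron (BA or EL) is being inspected.

```python
from collections import Counter

def top_fraction_length_histogram(scored_peptides, fraction):
    """Return the relative length frequencies of the top-scoring peptides.

    scored_peptides: list of (peptide, score) pairs (hypothetical format).
    fraction: share of peptides to keep, e.g. 0.01 for the top 1%.
    """
    # Rank peptides from highest to lowest prediction score.
    ranked = sorted(scored_peptides, key=lambda ps: ps[1], reverse=True)
    # Keep at least one peptide so that tiny inputs still give a histogram.
    n_top = max(1, int(len(ranked) * fraction))
    counts = Counter(len(pep) for pep, _ in ranked[:n_top])
    # Normalize counts to frequencies, ordered by peptide length.
    return {length: c / n_top for length, c in sorted(counts.items())}
```

Calling the function once with `fraction=0.01` on BA scores and once with `fraction=0.001` on EL scores would give the two kinds of trace the caption compares against the raw MS length distribution.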

**Fig. 4**
Processing signals found at N and C terminus positions in the DR15 Pm data set (located at upstream and downstream regions, respectively), grouped by peptide flanking region (PFR) length. For the upstream part of the ligands (top row), the processing signal is always centered at the N terminus, extending three positions beyond the cleavage site (upstream “context,” symbolized as blue bars) and one to six positions towards the binding core, depending on the PFR length (orange bars). For the downstream region (bottom row), the disposition of elements is mirrored: the proposed processing signal is centered at the C terminus and extends three positions beyond the cleavage site (downstream “context” region, pink bars) and one to six positions towards the binding core (green bars), depending on the PFR length. Amino acid background frequencies were calculated using the antigenic source proteins of all the ligands present in the data set. Motifs were generated using Seq2logo, as described in the “Methods” section

**Fig. 5**
Processing signals located at N and C terminal regions in the DR15 Pm data set. For each region, all ligands with PFR length lower than 3 were discarded. Then, the logos were constructed as described in the text by selecting the closest three PFR and context residues neighboring the N and C termini. For additional details on processing signal construction, refer to Fig. 4
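
The window selection described in this caption can be sketched as follows. This is an illustrative reading of the caption, not the authors' implementation: the function name `terminal_windows`, the `core_offset` argument (position of the predicted binding core within the ligand), the nine-residue core length, and the choice to skip ligands that lack enough context residues in the source protein are all assumptions.

```python
def terminal_windows(protein, ligand, core_offset, core_len=9, flank=3):
    """Return (n_window, c_window) for one ligand; each may be None.

    n_window: `flank` context residues upstream of the ligand followed by
              the first `flank` ligand residues (N-terminal PFR side).
    c_window: the last `flank` ligand residues followed by `flank` context
              residues downstream of the ligand.
    """
    start = protein.find(ligand)
    if start < 0:
        return None, None  # ligand not found in the source protein
    end = start + len(ligand)

    # PFR lengths on each side of the (assumed) binding core.
    n_pfr = core_offset
    c_pfr = len(ligand) - core_offset - core_len

    n_window = c_window = None
    # Discard the N-terminal side if the PFR is shorter than `flank`
    # or the protein has too few upstream context residues.
    if n_pfr >= flank and start >= flank:
        n_window = protein[start - flank:start] + ligand[:flank]
    # Same rule, mirrored, for the C-terminal side.
    if c_pfr >= flank and end + flank <= len(protein):
        c_window = ligand[-flank:] + protein[end:end + flank]
    return n_window, c_window
```

Collecting these six-residue windows over all ligands in a data set would give the alignments from which position-specific logos could be built.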

**Fig. 6**
Correlation between processing signals found in the six different data sets employed in this work, for upstream and downstream regions. Each matrix entry displays the Pearson correlation coefficient (PCC) value of two data sets under study. A PCC value of one corresponds to a maximum correlation, while a PCC value of zero means no correlation. Processing signals used in this figure were generated as explained in Fig. 5. All observed PCC values are statistically different from random (P < 0.001, exact permutation test)
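
A comparison of two processing signals of this kind can be sketched in a few lines. The caption reports an exact permutation test; the sketch below swaps in a Monte Carlo (random shuffling) permutation test for simplicity, and assumes that each processing signal has been flattened into a list of numbers (for example, per-position amino acid scores). The function names and the one-sided test direction are illustrative choices, not details taken from the paper.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def permutation_p_value(x, y, n_perm=10000, seed=0):
    """Return (observed PCC, one-sided Monte Carlo permutation P value)."""
    rng = random.Random(seed)
    observed = pearson(x, y)
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        # Shuffling y breaks any pairing between the two signals.
        rng.shuffle(shuffled)
        if pearson(x, shuffled) >= observed:
            hits += 1
    # Add-one correction so the estimated P value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)
```

With `n_perm=10000`, the smallest P value this estimate can report is roughly 0.0001, so the number of permutations bounds the significance level that can be claimed.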

**Fig. 7**
Predictive performance on a panel of CD4+ T cell epitopes. The boxplots represent the distribution of AUC values over all epitope evaluation data sets restricted to a given allele, comparing the different models. Middle lines in boxes correspond to median values. The height of the box represents 50% of the data. Whiskers extend to 1.5 times the interquartile range (IQR), and dots represent outliers beyond 1.5 times the IQR. P values were calculated with the Wilcoxon test. ns: P > 0.05, *P ≤ 0.05, **P ≤ 0.01, ***P ≤ 0.001, ****P ≤ 0.0001. In both benchmarks, an AUC value was calculated for each epitope/source protein pair by considering peptides identical to the epitope as positives and all other peptides as negatives, excluding peptides with an overlap of at least nine amino acids with the epitope. (a) Comparison of the combined models developed in this study with context information (EL + context) and without context (EL) to current state-of-the-art prediction methods trained on binding affinity data only (NetMHCII-2.3 and NetMHCIIpan-3.2). (b) Comparison of EL + context and EL in a benchmark where the epitope evaluation set was constructed using the evaluation strategy accounting for ligand preference described in the text
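
The per-epitope AUC described in this caption can be sketched as below. Several details are assumptions rather than facts from the paper: candidate peptides are generated as all windows of the epitope's length in the source protein, overlap is counted as shared sequence positions with the first occurrence of the epitope, and `score_fn` stands in for any predictor. The AUC is computed as the fraction of positive/negative pairs ranked correctly, with ties counting one half.

```python
def epitope_auc(protein, epitope, score_fn, min_overlap=9):
    """AUC for one epitope/source protein pair.

    Positives: windows identical to the epitope.
    Negatives: all other windows, except those sharing at least
    `min_overlap` positions with the epitope, which are excluded.
    """
    length = len(epitope)
    start = protein.find(epitope)
    if start < 0:
        raise ValueError("epitope not found in source protein")

    pos_scores, neg_scores = [], []
    for i in range(len(protein) - length + 1):
        peptide = protein[i:i + length]
        if peptide == epitope:
            pos_scores.append(score_fn(peptide))
            continue
        # Number of protein positions shared by this window and the epitope.
        overlap = max(0, min(i + length, start + length) - max(i, start))
        if overlap >= min_overlap:
            continue  # too similar to the epitope to count as a negative
        neg_scores.append(score_fn(peptide))

    if not neg_scores:
        raise ValueError("no negative peptides left after overlap filtering")
    # Pairwise (Mann-Whitney) formulation of the AUC.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))
```

Excluding heavily overlapping windows avoids penalizing a model for scoring near-identical neighbors of the epitope highly, since those peptides are likely to share the same binding core.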

Cited by

T Cell Epitope Predictions.
Peters B, Nielsen M, Sette A. Annu Rev Immunol. 2020 Apr 26;38:123-145. doi: 10.1146/annurev-immunol-082119-124838. Epub 2020 Feb 11. PMID: 32045313. Review.
Machine learning reveals limited contribution of trans-only encoded variants to the HLA-DQ immunopeptidome.
Nilsson JB, Kaabinejadian S, Yari H, Peters B, Barra C, Gragert L, Hildebrand W, Nielsen M. Commun Biol. 2023 Apr 21;6(1):442. doi: 10.1038/s42003-023-04749-7. PMID: 37085710.
A Systematic, Unbiased Mapping of CD8⁺ and CD4⁺ T Cell Epitopes in Yellow Fever Vaccinees.
Stryhn A, Kongsgaard M, Rasmussen M, Harndahl MN, Østerbye T, Bassi MR, Thybo S, Gabriel M, Hansen MB, Nielsen M, Christensen JP, Randrup Thomsen A, Buus S. Front Immunol. 2020 Aug 31;11:1836. doi: 10.3389/fimmu.2020.01836. eCollection 2020. PMID: 32983097.
Cancer Neoantigens: Challenges and Future Directions for Prediction, Prioritization, and Validation.
Borden ES, Buetow KH, Wilson MA, Hastings KT. Front Oncol. 2022 Mar 3;12:836821. doi: 10.3389/fonc.2022.836821. eCollection 2022. PMID: 35311072. Review.
Comprehensive analysis of T cell immunodominance and immunoprevalence of SARS-CoV-2 epitopes in COVID-19 cases.
Tarke A, Sidney J, Kidd CK, Dan JM, Ramirez SI, Yu ED, Mateus J, da Silva Antunes R, Moore E, Rubiro P, Methot N, Phillips E, Mallal S, Frazier A, Rawlings SA, Greenbaum JA, Peters B, Smith DM, Crotty S, Weiskopf D, Grifoni A, Sette A. Cell Rep Med. 2021 Feb 16;2(2):100204. doi: 10.1016/j.xcrm.2021.100204. Epub 2021 Jan 26. PMID: 33521695.

References

1. Rudolph MG, Stanfield RL, Wilson IA. How TCRs bind MHCs, peptides, and coreceptors. Annu Rev Immunol. 2006;24:419–466. doi: 10.1146/annurev.immunol.23.021704.115658.
2. Kim A, Hartman IZ, Poore B, Boronina T, Cole RN, Song N, et al. Divergent paths for the selection of immunodominant epitopes from distinct antigenic sources. Nat Commun. 2014;5:5369. doi: 10.1038/ncomms6369.
3. Sette A, Adorini L, Colon SM, Buus S, Grey HM. Capacity of intact proteins to bind to MHC class II molecules. J Immunol. 1989;143:1265–1267.
4. Andreatta M, Jurtz VI, Kaever T, Sette A, Peters B, Nielsen M. Machine learning reveals a non-canonical mode of peptide binding to MHC class II molecules. Immunology. 2017;152:255–264. doi: 10.1111/imm.12763.
5. Lovitch SB, Pu Z, Unanue ER. Amino-terminal flanking residues determine the conformation of a peptide-class II MHC complex. J Immunol. 2006;176:2958–2968. doi: 10.4049/jimmunol.176.5.2958.

Grants and funding

HHSN272201200010C/AI/NIAID NIH HHS/United States
