Genome Med. 2018 Nov 16;10(1):84.
doi: 10.1186/s13073-018-0594-6.

Footprints of antigen processing boost MHC class II natural ligand predictions

Carolina Barra et al. Genome Med. 2018.

Abstract

Background: Major histocompatibility complex class II (MHC-II) molecules present peptide fragments to T cells for immune recognition. Current predictors for peptide to MHC-II binding are trained on binding affinity data, generated in vitro and therefore lacking information about antigen processing.

Methods: We generate prediction models of peptide to MHC-II binding trained with naturally eluted ligands derived from mass spectrometry in addition to peptide binding affinity data sets.

Results: We show that integrated prediction models incorporate identifiable rules of antigen processing. In fact, we observed detectable signals of protease cleavage at defined positions of the ligands. We also hypothesize that the length of the terminal ligand protrusions plays a role in trimming the peptide to the MHC-presented ligand.

Conclusions: The results of integrating binding affinity and eluted ligand data in a combined model demonstrate improved performance for the prediction of MHC-II ligands and T cell epitopes and foreshadow a new generation of improved peptide to MHC-II prediction tools accounting for the plurality of factors that determine natural presentation of antigens.

Keywords: Antigen processing; Binding predictions; Eluted ligands; MHC-II; Machine learning; Mass spectrometry; Neural networks; T cell epitope.

Conflict of interest statement

Ethics approval and consent to participate

Not applicable

Consent for publication

Not applicable

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Figures

Fig. 1
GibbsCluster output for the five eluted ligand data sets employed in this work. For each set, the Kullback-Leibler distance (KLD) histogram (black bars) is displayed, which indicates the information content present in all clustering solutions (in this case, groups of one to three clusters) together with the motif logo(s) corresponding to the maximum KLD solution. The upper row gives the results for the DR15/51 data sets; the lower row for the DR1 data sets. Note that DR15 Ph was obtained from a cell line which expresses two HLA-DR molecules, HLA-DRB1*15:01 and HLA-DRB5*01:01 (DR15/51)
Fig. 2
Binding preferences learned by the single NNAlign [29] models trained on binding affinity (BA) or eluted ligand (EL) data. In the top row, motifs for the DRB1*01:01 allele are shown, with overlined logo plots (right) corresponding to models trained on EL data, and the non-overlined logo (left) corresponding to the BA-trained model. Similarly, binding motifs for DRB1*15:01 and DRB5*01:01 are displayed in the middle and bottom rows, respectively, with overlined logos (right) again indicating the EL-trained model preferences and the non-overlined logo plot (left) indicating the BA preference. Logos were constructed from the predicted binding cores in the top 1% scoring predictions of 900,000 random natural peptides for BA and from the top 0.1% scoring predictions for EL
Fig. 3
Peptide length preferences learned by the six models trained on binding affinity (BA) and eluted ligand (EL) combined data. For each model, green traces represent the length histogram of the top 1% scoring predictions for the BA output neuron, on a prediction data set composed of one million random peptides; red traces refer to the length histogram of the top 0.1% scoring predictions for the EL output neuron, on the same prediction set; black traces indicate the length distribution of the raw MS data
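
The length histograms described in this caption can be approximated with a few lines of Python. This is only a sketch of the general idea (rank the predictions, keep the top fraction, count peptide lengths); the function name and the input format, a list of (peptide, score) pairs, are assumptions made for illustration and are not taken from the paper.

```python
from collections import Counter


def top_fraction_length_histogram(scored_peptides, fraction):
    """Return the length distribution of the top-scoring predictions.

    scored_peptides: list of (peptide, score) pairs (hypothetical format).
    fraction: share of predictions to keep, e.g. 0.01 for the top 1%.
    """
    # Rank all predictions from highest to lowest score.
    ranked = sorted(scored_peptides, key=lambda item: item[1], reverse=True)
    # Keep at least one peptide so small inputs still give a histogram.
    n_top = max(1, int(len(ranked) * fraction))
    counts = Counter(len(peptide) for peptide, _ in ranked[:n_top])
    # Normalize counts to frequencies, ordered by peptide length.
    return {length: count / n_top for length, count in sorted(counts.items())}
```

Under these assumptions, the BA trace would use `fraction=0.01` and the EL trace `fraction=0.001`, each applied to the scores of the corresponding output neuron on the same set of random peptides.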
Fig. 4
Processing signals found at N and C terminus positions in the DR15 Pm data set (located at upstream and downstream regions, respectively), grouped by peptide flanking region (PFR) length. For the upstream part of the ligands (top row), the processing signal is always centered at the N terminal position, extending three positions beyond the cleavage site (upstream “context,” symbolized as blue bars) and one to six positions towards the binding core, depending on the PFR length (orange bars). For the downstream region (bottom row), the disposition of elements is mirrored: the proposed processing signal is centered at the C terminus and extends three positions beyond the cleavage site (downstream “context” region, pink bars) and one to six positions towards the binding core (green bars), depending on the PFR length. Amino acid background frequencies were calculated using the antigenic source proteins of all the ligands present in the data set. Motifs were generated using Seq2logo, as described in the “Methods” section
Fig. 5
Processing signals located at N and C terminal regions in the DR15 Pm data set. For each region, all ligands with PFR length lower than 3 were discarded. Then, the logos were constructed as described in the text by selecting the closest three PFR and context residues neighboring the N and C termini. For additional details on processing signal construction, refer to Fig. 4
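
As a rough illustration of the window construction described in this caption, the sketch below extracts, for one ligand, the three context residues plus the three neighboring PFR residues around each terminus, and skips a terminus whose PFR is shorter than three residues. The function name, the 0-based half-open coordinates, and the use of "X" to pad windows that run past the ends of the source protein are all assumptions made here; the paper's own handling of these details is not given in this section.

```python
def processing_windows(protein, ligand_start, ligand_end, core_start, core_end, k=3):
    """Return (n_window, c_window) for one ligand, or None for a skipped terminus.

    Coordinates are 0-based positions in `protein`; end positions are exclusive.
    n_window: k upstream context residues followed by the first k ligand residues.
    c_window: the last k ligand residues followed by k downstream context residues.
    """
    n_pfr_length = core_start - ligand_start  # residues between N terminus and core
    c_pfr_length = ligand_end - core_end      # residues between core and C terminus

    # Assumption: pad with "X" so that ligands near the protein ends keep full-width windows.
    padded = "X" * k + protein + "X" * k

    n_window = None
    if n_pfr_length >= k:
        n_window = padded[ligand_start : ligand_start + 2 * k]

    c_window = None
    if c_pfr_length >= k:
        c_window = padded[ligand_end : ligand_end + 2 * k]

    return n_window, c_window
```

Because the padded string is shifted by `k` characters, slicing it from `ligand_start` begins `k` residues upstream of the ligand N terminus, and slicing from `ligand_end` begins `k` residues before the C-terminal cleavage site. Collecting these windows over all ligands would give the aligned sequences from which a logo could be built.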
Fig. 6
Correlation between processing signals found in the six different data sets employed in this work, for upstream and downstream regions. Each matrix entry displays the Pearson correlation coefficient (PCC) value of two data sets under study. A PCC value of one corresponds to maximum correlation, while a PCC value of zero means no correlation. Processing signals used in this figure were generated as explained in Fig. 5. All observed PCC values are statistically different from random (P < 0.001, exact permutation test)
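
To make the statistic in this caption concrete, the following sketch computes a PCC between two processing signals and estimates a one-sided permutation P value. It assumes each signal has been flattened into a list of numbers (for example, one value per position and amino acid), and it uses random shuffles rather than the exact permutation test named in the caption, since enumerating all permutations is only feasible for very short vectors. Function names and defaults are illustrative.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists.

    Assumes neither list is constant (otherwise the denominator is zero).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def permutation_p_value(x, y, n_perm=10000, seed=0):
    """Return (observed PCC, Monte Carlo one-sided permutation P value)."""
    rng = random.Random(seed)
    observed = pearson(x, y)
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        # Shuffling y breaks any pairing between the two signals.
        rng.shuffle(shuffled)
        if pearson(x, shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (hits + 1) / (n_perm + 1)
```

With the add-one correction, at least 1000 shuffles are needed before an estimate can fall below 0.001.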
Fig. 7
Predictive performance on a panel of CD4+ T cell epitopes. The boxplots represent the distribution of AUC values over all epitope evaluation data sets restricted to a given allele, comparing the different models. Middle lines in boxes correspond to median values. The height of the box represents 50% of the data. Whiskers represent 1.5 times the quartile range (QR) of the data, and dots represent outliers beyond 1.5 QR. P values were calculated with the Wilcoxon test. ns P > 0.05, *P ≤ 0.05, **P ≤ 0.01, ***P ≤ 0.001, ****P ≤ 0.0001. In both benchmarks, an AUC value was calculated for each epitope/source protein pair by considering peptides identical to the epitope as positives and all other peptides as negatives, excluding peptides with an overlap of at least nine amino acids with the epitope. a Comparison of the combined models developed in this study with context information (EL + context) and without context (EL) to current state-of-the-art prediction methods trained on binding affinity data only (NetMHCII-2.3 and NetMHCIIpan-3.2). b Comparison of EL + context and EL in a benchmark where the epitope evaluation set was constructed using the evaluation strategy accounting for ligand preference described in the text
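
The per-epitope AUC described in this caption can be sketched as follows. The code assumes that candidate peptides are all overlapping windows of the source protein with the same length as the epitope, and that `score` is any callable mapping a peptide to a prediction value; both are simplifications chosen for illustration, and the function name is hypothetical.

```python
def epitope_auc(protein, epitope, score, min_overlap=9):
    """AUC for one epitope/source protein pair.

    Positives: windows identical to the epitope.
    Negatives: all other windows, except those sharing at least
    `min_overlap` residues with an epitope occurrence (these are excluded).
    """
    length = len(epitope)
    n_windows = len(protein) - length + 1
    epitope_starts = [i for i in range(n_windows) if protein[i:i + length] == epitope]
    if not epitope_starts:
        raise ValueError("epitope not found in source protein")

    positives, negatives = [], []
    for i in range(n_windows):
        peptide = protein[i:i + length]
        if peptide == epitope:
            positives.append(score(peptide))
            continue
        # Two equal-length windows at offsets i and s share length - |i - s| residues.
        overlap = max(max(0, length - abs(i - s)) for s in epitope_starts)
        if overlap >= min_overlap:
            continue  # too similar to the epitope: neither positive nor negative
        negatives.append(score(peptide))
    if not negatives:
        raise ValueError("no negative peptides left after overlap exclusion")

    # AUC as the probability that a positive outscores a negative (ties count 0.5).
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))
```

A predictor that always ranks the epitope first gets an AUC of 1.0, while one that assigns every peptide the same score gets 0.5.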

References

    1. Rudolph MG, Stanfield RL, Wilson IA. How TCRs bind MHCs, peptides, and coreceptors. Annu Rev Immunol. 2006;24:419–466. doi: 10.1146/annurev.immunol.23.021704.115658.
    2. Kim A, Hartman IZ, Poore B, Boronina T, Cole RN, Song N, et al. Divergent paths for the selection of immunodominant epitopes from distinct antigenic sources. Nat Commun. 2014;5:5369. doi: 10.1038/ncomms6369.
    3. Sette A, Adorini L, Colon SM, Buus S, Grey HM. Capacity of intact proteins to bind to MHC class II molecules. J Immunol. 1989;143:1265–1267.
    4. Andreatta M, Jurtz VI, Kaever T, Sette A, Peters B, Nielsen M. Machine learning reveals a non-canonical mode of peptide binding to MHC class II molecules. Immunology. 2017;152:255–264. doi: 10.1111/imm.12763.
    5. Lovitch SB, Pu Z, Unanue ER. Amino-terminal flanking residues determine the conformation of a peptide-class II MHC complex. J Immunol. 2006;176:2958–2968. doi: 10.4049/jimmunol.176.5.2958.