Brief Bioinform. 2024 Mar 27;25(3):bbae154.
doi: 10.1093/bib/bbae154.

TripHLApan: predicting HLA molecules binding peptides based on triple coding matrix and transfer learning

Meng Wang et al. Brief Bioinform.

Abstract

Human leukocyte antigen (HLA) recognizes foreign threats and triggers immune responses by presenting peptides to T cells. Computationally modeling the binding patterns between peptides and HLA is very important for the development of tumor vaccines. However, accurately predicting which peptides bind HLA molecules remains a major challenge. In this paper, we develop a new model, TripHLApan, for predicting HLA molecules binding peptides by integrating a triple coding matrix, BiGRU + Attention models and a transfer learning strategy. We have identified the main interaction site regions between HLA molecules and peptides, as well as the correlation between HLA encoding and binding motifs. Based on these findings, we bring the preprocessing and coding closer to the natural biological process. Moreover, because the input is based on multiple types of features and the attention module focuses on the BiGRU hidden layer, TripHLApan learns more sequence-level binding information. The transfer learning strategy ensures accurate predictions for special lengths (peptides of length 8) and keeps the model scalable as data volumes grow. Compared with the current optimal models, TripHLApan exhibits strong predictive performance in various prediction environments with different positive and negative sample ratios. In addition, we validate the superiority and scalability of TripHLApan's predictive performance using additional recent data sets, ablation experiments and its binding reconstitution ability in samples from a melanoma patient. The results show that TripHLApan is a powerful tool for predicting peptide binding to HLA-I and HLA-II molecules for the synthesis of tumor vaccines. TripHLApan is publicly available at https://github.com/CSUBioGroup/TripHLApan.git.

Keywords: HLA; pan-specific prediction model; peptide; tumor vaccine.

Figures

Figure 1
The workflow and architecture of TripHLApan. A. Network architecture of TripHLApan for HLA-peptide prediction. TripHLApan first preprocesses HLA and peptide sequences with three coding matrices, then feeds the coded matrices into BiGRU + Attention modules. After a fully connected layer at the end of each channel, the outputs of the three channels are concatenated. Finally, after three fully connected layers, a sigmoid layer outputs the predicted binding probability. B. Transfer learning pattern for the HLA-I-peptide prediction task. A model is trained on binding peptides of lengths 9–14 and is then transferred to the model for peptides of length 8. C. Different binding forms of HLA-I and HLA-II molecules to peptides. 1AKJ: a complex of HLA-A molecules and peptides. 1JK8: a complex of HLA-DQA/DQB molecules and peptides. 3LQZ: a complex of HLA-DP1A/DP1B molecules and peptides.
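The data flow described in panel A can be sketched roughly as follows. This is a minimal PyTorch illustration of a three-channel BiGRU + attention network, not the authors' implementation: the class names `AttentiveBiGRU` and `ThreeChannelBinder`, all layer sizes, the ReLU activations, the linear attention scoring, and the assumption that each coding matrix arrives as a `(batch, seq_len, features)` tensor are hypothetical choices made for the example.

```python
import torch
import torch.nn as nn


class AttentiveBiGRU(nn.Module):
    """One channel: BiGRU, attention pooling over positions, then one FC layer."""

    def __init__(self, in_dim: int, hidden: int, out_dim: int):
        super().__init__()
        self.gru = nn.GRU(in_dim, hidden, batch_first=True, bidirectional=True)
        self.score = nn.Linear(2 * hidden, 1)  # assumed scoring: one scalar per position
        self.fc = nn.Linear(2 * hidden, out_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h, _ = self.gru(x)                       # (batch, seq_len, 2 * hidden)
        w = torch.softmax(self.score(h), dim=1)  # attention weights over positions
        pooled = (w * h).sum(dim=1)              # (batch, 2 * hidden)
        return torch.relu(self.fc(pooled))


class ThreeChannelBinder(nn.Module):
    """Three channels, concatenation, three FC layers, sigmoid output."""

    def __init__(self, in_dims=(20, 20, 20), hidden=32, chan_out=16):
        super().__init__()
        # Feature sizes per coding matrix are placeholders, not values from the paper.
        self.channels = nn.ModuleList(
            [AttentiveBiGRU(d, hidden, chan_out) for d in in_dims]
        )
        self.head = nn.Sequential(
            nn.Linear(3 * chan_out, 32), nn.ReLU(),
            nn.Linear(32, 16), nn.ReLU(),
            nn.Linear(16, 1),
        )

    def forward(self, mats):
        # mats: list of three tensors, one per coding matrix
        feats = torch.cat([ch(m) for ch, m in zip(self.channels, mats)], dim=1)
        return torch.sigmoid(self.head(feats)).squeeze(-1)  # binding probability


if __name__ == "__main__":
    model = ThreeChannelBinder()
    batch = [torch.randn(4, 48, 20) for _ in range(3)]  # random stand-in inputs
    print(model(batch).shape)
```

Under the same assumptions, the transfer step in panel B would amount to training such a model on peptides of lengths 9–14 and then continuing training from those weights on length-8 peptides.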
Figure 2
Identification of shared motifs among HLA-A, B and C alleles. Panels A, B and C show, for each of three allele sequence extraction methods, the pairwise correlations between 45 allele sequences and how these correlations relate to binding motifs. I to XI denote different motif groups.
Figure 3
Comparison of TripHLApan with baseline tools. A. The number of HLA types shared between the unseen set used in this paper and the training sets of nine tools, including TripHLApan. B/C/D. AUCs/AUPRs and top-PPVs (the fraction of positive peptides within the top N) on data sets with peptides of different lengths, at positive-to-negative sample ratios of 1:5/1:1/1:10/1:50.
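The top-PPV metric defined in this caption can be computed with a few lines of Python. The sketch below is illustrative only: the function name `top_ppv` is hypothetical, and the default of setting N to the number of positives in the data set is an assumption, since the caption does not say how N is chosen.

```python
def top_ppv(scores, labels, n=None):
    """Fraction of positive peptides among the n highest-scoring peptides.

    scores: predicted binding probabilities
    labels: 1 for a true binder, 0 for a non-binder
    n:      size of the top list; defaults to the number of positives
            (an assumption, not stated in the source)
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if n is None:
        n = sum(labels)
    if n <= 0:
        raise ValueError("n must be positive")
    # Rank peptides from highest to lowest predicted score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    top = ranked[:n]
    return sum(label for _, label in top) / len(top)


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.7, 0.4, 0.2, 0.1]
    labels = [1, 0, 1, 0, 1, 0]
    print(top_ppv(scores, labels))       # two of the top three are positives
    print(top_ppv(scores, labels, n=1))  # the single best-scored peptide is a binder
```

Because negatives outnumber positives at ratios such as 1:10 or 1:50, a ranking metric of this kind shows whether true binders are concentrated at the top of the list.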
Figure 4
Comparison of TripHLApan with baseline tools on the data set whose alleles do not appear in any tool's training set and on the latest data set. A. Allele types in the unseen sets. B–D. ROC/AUPR curves and top-PPVs on all unseen sets. E–G. ROC/AUPR curves and top-PPVs of TripHLApan and the baseline tools on the TripHLApan test set. H. AUCs of TripHLApan and the baseline tools for peptides of different lengths on the TripHLApan test set.
Figure 5
Ablation experiment. ROC curves and AUCs for peptides of different lengths on the TripHLApan test set, comparing TripHLApan, its three different model strategies and the three better-performing baseline models.
Figure 6
Comparison of the average PCCs of tools on the four monoallelic samples.
Figure 7
AUC distribution of TripHLApan compared with baseline methods on 35 allele sets for HLA-II-peptide binding prediction.
