Integrating peptides' sequence and energy of contact residues information improves prediction of peptide and HLA-I binding with unknown alleles

Fei Luo¹, Yangyang Gao, Yongqiong Zhu, Juan Liu

PMID: 23815611
PMCID: PMC3654895
DOI: 10.1186/1471-2105-14-S8-S1

Fei Luo et al. BMC Bioinformatics. 2013;14(Suppl 8):S1. doi: 10.1186/1471-2105-14-S8-S1. Epub 2013 May 9.

Affiliation

¹ School of Computer, Wuhan University, Wuhan, Hubei, China.

Abstract

Background: Human leukocyte antigen class I (HLA-I) molecules are encoded by a large gene family and are highly polymorphic; the number of registered HLA-I molecules now exceeds 3000. Even slight differences in their amino acid sequences can make HLAs bind different sets of peptides. Over the past decades, many methods have been proposed to predict binding between peptides and HLA-I molecules, and they perform well, but most of the experimental data they use is limited to HLAs with a small number of alleles. As a result, they tend to achieve high prediction accuracy only on data with similar alleles. Because the peptide and the HLA jointly determine binding, both contributions need to be considered together.

Results: In this paper, we propose a method based on an artificial neural network that takes into account features of the peptide sequence and the energy of contact residues to predict peptide and HLA-I binding, even when the potential alleles of the HLAs are unknown. We validate the method with two experiments, one allele-specific and one super-type. In the first, we collect 14 HLA-A and 14 HLA-B molecules from the Bjoern Peters dataset and compare our method with ARB, SMM, NetMHC and 16 other online methods. Our method obtains the best average AUC (area under the ROC curve), 0.909. In the second, we apply leave-one-out cross-validation to MHC-peptide binding data whose alleles differ but share a common super-type. Compared with gold-standard methods such as NetMHC and NetMHCpan, our method again achieves the best average AUC, 0.847.
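The two evaluation ideas in the Results, AUC and leave-one-out cross-validation over alleles that share a super-type, can be sketched as follows. This is a minimal illustration, not the authors' code: it computes AUC with the rank-based Mann-Whitney formulation, and it assumes that "leave one out" means holding out all peptides of one allele at a time, which the abstract does not state explicitly. The record layout `(allele, peptide, label)` and both function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Sequence, Tuple

# Hypothetical record layout: (allele, peptide, label), label 1 = binder.
Record = Tuple[str, str, int]


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the Mann-Whitney U statistic.

    Over every (binder, non-binder) pair, count how often the binder gets
    the higher score; ties count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one binder and one non-binder")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def leave_one_allele_out(
    records: Sequence[Record],
) -> Iterator[Tuple[str, List[Record], List[Record]]]:
    """Yield (held-out allele, train records, test records) splits.

    Assumption: "leave one out" = hold out every record of one allele, so
    the model never sees the test allele during training.
    """
    by_allele: Dict[str, List[Record]] = defaultdict(list)
    for rec in records:
        by_allele[rec[0]].append(rec)
    for held_out in sorted(by_allele):
        train = [r for a, rs in by_allele.items() if a != held_out for r in rs]
        yield held_out, train, by_allele[held_out]


if __name__ == "__main__":
    # Toy labels and scores, not data from the paper.
    print(auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))  # 0.75
```

Holding out a whole allele rather than single peptides keeps the held-out allele's peptides out of training entirely, which is closer to the unknown-allele setting the method targets; the average AUC would then be taken over the per-allele test folds.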

Conclusions: Our method achieves satisfactory results. Whether it is tested on HLA-I molecules with a single definite gene or with a super-type gene locus, it attains better classification accuracy. In particular, when the training set is small, our method still outperforms the other methods in the comparison. We therefore conclude that by combining peptide information, the interaction information of HLA amino acid residues, and contact energy, our method can improve prediction of peptide and HLA-I binding even when no prior experimental data exist for HLAs with various alleles.

Figures

**Figure 1**
**The framework of our method**. The input data has two parts: the peptide and the HLA molecule. HLA molecules are processed by extracting the interacting amino acid residues and computing their contact energy. The results are then encoded as classification features and fed to the trained classifier for training and prediction.
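The feature construction in Figure 1 can be sketched in Python under several assumptions the caption does not state: peptide sequence features are a one-hot encoding, the contact energy of each interacting residue pair is looked up in a pairwise contact-energy table supplied by the caller (a statistical potential such as Miyazawa-Jernigan is one possible choice), and the list of interacting (peptide position, HLA residue index) pairs comes from a grid like Figure 2. All function names are hypothetical, and the resulting vector is what would be passed to the neural network classifier.

```python
from typing import Dict, List, Sequence, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot_peptide(peptide: str) -> List[float]:
    """One 20-dimensional one-hot block per peptide position.

    Raises ValueError for residues outside the 20 standard amino acids.
    """
    features: List[float] = []
    for residue in peptide:
        block = [0.0] * len(AMINO_ACIDS)
        block[AMINO_ACIDS.index(residue)] = 1.0
        features.extend(block)
    return features


def contact_energy_features(
    peptide: str,
    hla_seq: str,
    contacts: Sequence[Tuple[int, int]],
    energy: Dict[Tuple[str, str], float],
) -> List[float]:
    """One contact-energy value per interacting (peptide pos, HLA index) pair.

    Both positions are 1-based. Pairwise potentials are treated as
    symmetric, and pairs missing from the table default to 0.0 (assumption).
    """
    feats: List[float] = []
    for pep_pos, hla_idx in contacts:
        a, b = peptide[pep_pos - 1], hla_seq[hla_idx - 1]
        feats.append(energy.get((a, b), energy.get((b, a), 0.0)))
    return feats


def encode(
    peptide: str,
    hla_seq: str,
    contacts: Sequence[Tuple[int, int]],
    energy: Dict[Tuple[str, str], float],
) -> List[float]:
    """Concatenate sequence features and contact-energy features."""
    return one_hot_peptide(peptide) + contact_energy_features(
        peptide, hla_seq, contacts, energy
    )


if __name__ == "__main__":
    toy_energy = {("L", "Y"): -2.0, ("W", "L"): -1.5}  # made-up values
    x = encode("SLYNTVATL", "MYVWAAAAAA", [(2, 2), (9, 4)], toy_energy)
    print(len(x), x[-2:])  # 182 [-2.0, -1.5]
```

Because the contact-energy block depends on the HLA residues rather than on an allele name, an allele absent from training can still be encoded, provided its sequence is known.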

**Figure 2**
**Interacting residues**. (a) Binding sites of HLA-A; (b) binding sites of HLA-B. Columns give the HLA residue index from the IMGT/HLA database, and rows give the residue position in the 9-residue peptide. Grey cells mark pairs of HLA and peptide residues that interact.
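A grid like the one in Figure 2 can be turned into the list of contact pairs used for feature extraction. This sketch assumes each grey cell is stored as `True` in a boolean matrix with one row per peptide position, and that the HLA sequence passed in is the mature protein numbered from 1 so that string positions line up with IMGT/HLA indices. The function names and the toy grid are hypothetical; they are not the actual contacts shown in Figure 2.

```python
from typing import List, Sequence, Tuple


def grid_to_contacts(
    grid: Sequence[Sequence[bool]], hla_indices: Sequence[int]
) -> List[Tuple[int, int]]:
    """Convert a Figure 2 style grid into (peptide position, HLA index) pairs.

    grid[r][c] is True for a grey cell: peptide position r + 1 interacts
    with the HLA residue whose IMGT/HLA index is hla_indices[c].
    """
    if any(len(row) != len(hla_indices) for row in grid):
        raise ValueError("every grid row needs one cell per HLA column")
    return [
        (r + 1, hla_indices[c])
        for r, row in enumerate(grid)
        for c, grey in enumerate(row)
        if grey
    ]


def contact_residues(hla_seq: str, hla_indices: Sequence[int]) -> str:
    """Residues at 1-based indices; assumes hla_seq starts at IMGT position 1."""
    return "".join(hla_seq[i - 1] for i in hla_indices)


if __name__ == "__main__":
    toy_grid = [[True, False, False], [False, False, True], [False, True, False]]
    print(grid_to_contacts(toy_grid, [5, 7, 9]))  # [(1, 5), (2, 9), (3, 7)]
```

Keeping the grid separate from any particular allele means one HLA-A grid and one HLA-B grid can be reused for every allele of that locus, with only the residues read from each sequence changing.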

**Figure 3**
**Methods comparison**. Using the results in Table 1, we divide them into an HLA-A group (a) and an HLA-B group (b) and sort each in ascending order of peptide number to measure the correlation between dataset size and classification accuracy. From left to right and top to bottom, the panels show the linear fit between peptide number (x axis) and accuracy (y axis) for five methods: ANNBM, ARB, SMM, NetMHC, and the other methods. The bottom-right panel shows the standard deviation of the classification accuracy. ANNBM has the smallest slope and standard deviation, which indicates that it is the least dependent on dataset size and the most stable.
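The two summary statistics in Figure 3, the slope of a linear fit of accuracy against peptide number and the standard deviation of accuracy, can be computed as below. This is an illustrative sketch: it uses ordinary least squares and the population standard deviation (`statistics.pstdev`), and the caption does not say which variant of either was used. The function names are hypothetical, and any input numbers are placeholders rather than values from Table 1.

```python
from statistics import mean, pstdev
from typing import Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit y = slope * x + intercept."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with >= 2 points")
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def sensitivity_to_size(
    peptide_counts: Sequence[int], accuracies: Sequence[float]
) -> Tuple[float, float]:
    """Return (slope of accuracy vs. peptide count, std dev of accuracy)."""
    # Sort by dataset size as in the figure; the fit itself is order-independent.
    pairs = sorted(zip(peptide_counts, accuracies))
    xs = [float(p) for p, _ in pairs]
    ys = [float(a) for _, a in pairs]
    slope, _ = fit_line(xs, ys)
    # Population standard deviation (assumption; the caption does not specify).
    return slope, pstdev(ys)


if __name__ == "__main__":
    print(fit_line([1, 2, 3], [2, 4, 6]))  # (2.0, 0.0)
```

A slope near zero means accuracy barely changes as the number of peptides grows, and a small standard deviation means accuracy varies little across alleles; these are the two properties the caption attributes to ANNBM.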

**Figure 4**
**ROC curve of ANNBM, ARB, SMM, NetMHC on HLA-A*0201**.

**Figure 5**
**ROC curve of ANNBM, ARB, SMM, NetMHC on HLA-B*4402**.

**Figure 6**
**ROC curve of ANNBM, NetMHC and NetMHCpan on A*0202**.

**Figure 7**
**ROC curve of ANNBM, NetMHC and NetMHCpan on B*3501**.
