Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2013;14 Suppl 8(Suppl 8):S1.
doi: 10.1186/1471-2105-14-S8-S1. Epub 2013 May 9.

Integrating peptides' sequence and energy of contact residues information improves prediction of peptide and HLA-I binding with unknown alleles

Affiliations

Integrating peptides' sequence and energy of contact residues information improves prediction of peptide and HLA-I binding with unknown alleles

Fei Luo et al. BMC Bioinformatics. 2013.

Abstract

Background: The HLA (human leukocyte antigen) class I is a kind of molecule encoded by a large family of genes and is characteristic of high polymorphism. Now the number of the registered HLA-I molecules has exceeded 3000. Slight differences in the amino acid sequences of HLAs would make them bind to different sets of peptides. In the past decades, although many methods have been proposed to predict the binding between peptides and HLA-I molecules and achieved good performance, most experimental data used by them is limited to the HLAs with a small number of alleles. Thus they are inclined to obtain high prediction accuracy only for data with similar alleles. Because the peptides and HLAs together determine the binding, it's necessary to consider their contribution meanwhile.

Results: By taking into account the features of the peptides sequence and the energy of contact residues, in this paper a method based on the artificial neural network is proposed to predict the binding of peptides and HLA-I even when the HLAs' potential alleles are unknown. Two experiments in the allele-specific and super-type cases are performed respectively to validate our method. In the first case, we collect 14 HLA-A and 14 HLA-B molecules on Bjoern Peters dataset, and compare our method with the ARB, SMM, NetMHC and other 16 online methods. Our method gets the best average AUC (Area under the ROC) value as 0.909. In the second one, we use leave one out cross validation on MHC-peptide binding data that has different alleles but shares the common super-type. Compared to gold standard methods like NetMHC and NetMHCpan, our method again achieves the best average AUC value as 0.847.

Conclusions: Our method achieves satisfactory results. Whenever it's tested on the HLA-I with single definite gene or with super-type gene locus, it gets better classification accuracy. Especially, when the training set is small, our method still works better than the other methods in the comparison. Therefore, we could make a conclusion that by combining the peptides' information, HLAs amino acid residues' interaction information and contact energy, our method really could improve prediction of the peptide HLA-I binding even when there aren't the prior experimental dataset for HLAs with various alleles.

PubMed Disclaimer

Figures

Figure 1
Figure 1
The framework of our method. The input data contains two parts, one is the peptide and the other is HLA molecule. HLA molecules will be processed by the steps of extracting interacting amino acid residues and computing the contact energy. Then they will be encoded as the classification features and input into the established classifier to do the training and predict.
Figure 2
Figure 2
Interacting residues. (a) is the binding sites of HLA-A and (b) is the binding sites of HLA-B. The column number represents the HLA molecular residue index given by the IMGT/HLA database and the row number indicates the amino acid residue index of peptide with the length 9. The grey cells in the grid indicate residues that have interaction between HLA and peptide.
Figure 3
Figure 3
Methods comparison. According to the results in the table 1, we divide results into HLA-A class group (a) and HLA-B class group (b) and order them in ascendance based on the peptide number to measure the correlation between scale of dataset and classification accuracy. The panels from left to right and up to down are the linear fitting between the peptide number (x axis) and accuracy (y axis) on five methods: ANNBM, ARB, SMM, NetMHC, and Other methods. The right down picture is the standard deviation of the classification accuracy. We could see ANNBM gets the smallest slope rate and standard deviation, which proves that ANNBM is most independent with dataset scale and stable.
Figure 4
Figure 4
ROC curve of ANNBM、 ARB、 SMM、 NetMHC on HLA-A*0201.
Figure 5
Figure 5
ROC curve of ANNBM、 ARB、 SMM、 NetMHC on HLA-B*4402.
Figure 6
Figure 6
ROC curve of ANNBM, NetMHC and NetMHCpan on A*0202.
Figure 7
Figure 7
ROC curve of ANNBM, NetMHC and NetMHCpan on B*3501.

Similar articles

References

    1. Rudensky A, Preston-Hurlburt P, al-Ramadi BK, Rothbard J, Janeway CA Jr. Truncation variants of peptides isolated from MHC class II molecules suggest sequence motifs. Nature. 1992;359(6394):429–431. doi: 10.1038/359429a0. - DOI - PubMed
    1. Cole GA, Tao T, Hogg TL, Ryan KW, Woodland DL. Binding motifs predict major histocompatibility complex class II-restricted epitopes in the Sendai virus M protein. J Virol. 1995;69(12):8057–8060. - PMC - PubMed
    1. Rammensee H, Bachmann J, Emmerich NP, Bachor OA, Stevanovic S. SYFPEITHI: database for MHC ligands and peptide motifs. Immunogenetics. 1999;50(3):213–219. doi: 10.1007/s002510050595. - DOI - PubMed
    1. Doytchinova IA, Blythe MJ, Flower DR. Additive method for the prediction of protein-peptide binding affinity. Application to the MHC class I molecule HLA-A*0201. J Proteome Res. 2002;1(3):263–272. doi: 10.1021/pr015513z. - DOI - PubMed
    1. Brusic V, Rudy G, Harrison LC. MHCPEP, a database of MHC-binding peptides: update 1997. Nucleic Acids Res. 1998;26(1):368–371. doi: 10.1093/nar/26.1.368. - DOI - PMC - PubMed

MeSH terms