Protein Sci. 2022 Dec;31(12):e4497.
doi: 10.1002/pro.4497.

BepiPred-3.0: Improved B-cell epitope prediction using protein language models

Joakim Nøddeskov Clifford et al. Protein Sci. 2022 Dec.

Abstract

B-cell epitope prediction tools are of great medical and commercial interest due to their practical applications in vaccine development and disease diagnostics. Protein language models (LMs), trained on unprecedentedly large datasets of protein sequences and structures, provide a powerful numeric representation that can be exploited to accurately predict local and global protein structural features from amino acid sequences alone. In this paper, we present BepiPred-3.0, a sequence-based epitope prediction tool that, by exploiting LM embeddings, greatly improves prediction accuracy for both linear and conformational epitopes on several independent test sets. Furthermore, careful selection of additional input variables and of the epitope residue annotation strategy improved performance further, thus achieving unprecedented predictive power. Our tool can predict epitopes across hundreds of sequences in minutes. It is freely available as a web server and a standalone package at https://services.healthtech.dtu.dk/service.php?BepiPred-3.0 with a user-friendly interface to navigate the results.

Keywords: B-cell epitope prediction; B-cell epitopes; BepiPred; BepiPred-3.0; bioinformatics; deep learning; immunoinformatics; immunology; machine learning; protein language model.


Figures

FIGURE 1
Overview of sequence encoding pipelines, where N is the length of the sequence. Amino acid sequences were encoded using either sparse, BLOSUM62, or ESM‐2 derived sequence representation schemes. For the two former approaches, encodings from adjacent residues were concatenated to generate a new set of encodings describing the sequence context of each residue. The encoded sequences were subsequently used for training various models for position‐wise antigen prediction.
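The windowed sparse encoding described in this caption can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the window size (`flank=2`), the zero-padding at the sequence ends, and the example sequence are all assumptions made for the sketch.

```python
# Hypothetical sketch of a sparse (one-hot) encoding with sequence context.
# Window size and zero-padding are assumptions, not taken from the paper.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(residue):
    """Return a 20-dimensional sparse vector for a single residue."""
    return [1.0 if residue == aa else 0.0 for aa in AMINO_ACIDS]


def windowed_encoding(sequence, flank=2):
    """Concatenate the one-hot vectors of each residue and its neighbours.

    Window positions that fall outside the sequence are filled with zeros.
    The result has one row per residue (N rows in total).
    """
    pad = [0.0] * len(AMINO_ACIDS)
    per_residue = [one_hot(r) for r in sequence]
    encoded = []
    for i in range(len(sequence)):
        row = []
        for j in range(i - flank, i + flank + 1):
            row.extend(per_residue[j] if 0 <= j < len(sequence) else pad)
        encoded.append(row)
    return encoded


# Made-up example sequence of length N = 8.
features = windowed_encoding("MKTAYIAK", flank=2)
print(len(features), len(features[0]))  # 8 rows, 5 * 20 = 100 columns
```

A BLOSUM62 variant would follow the same windowing logic, replacing `one_hot` with a lookup of the residue's substitution-matrix row.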
FIGURE 2
ROC‐AUC curves for the BP3C50ID FFNN illustrate the difference between using sparse (a), BLOSUM62 (b), and ESM‐2 (c) encodings. The x and y axes are the false and true positive rates, respectively. Dashed lines along the diagonal indicate random performance at 50% AUC, and the remaining lines show the performance of the different fold models. A confusion matrix illustrates the threshold‐dependent performance of the FFNN (ESM‐2) ensemble (d). True labels (negative or positive) are on the vertical axis and predicted labels (negative or positive) are on the horizontal axis.
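To make the two evaluation views in this caption concrete, the sketch below computes a threshold-independent AUC and a threshold-dependent confusion matrix for a handful of residues. The labels, scores, threshold, and function names are invented for illustration and do not come from the paper; the AUC is computed with the standard pairwise-ranking definition rather than any particular library.

```python
# Hypothetical per-residue labels (1 = epitope) and model scores.
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


def confusion_matrix(labels, scores, threshold):
    """Count TP, FP, TN, FN when residues scoring >= threshold are called positive."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for label, score in zip(labels, scores):
        predicted = score >= threshold
        if predicted:
            counts["TP" if label == 1 else "FP"] += 1
        else:
            counts["FN" if label == 1 else "TN"] += 1
    return counts


labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.2, 0.6, 0.7, 0.1, 0.4]

print(round(roc_auc(labels, scores), 3))       # 0.778 (7 of 9 pairs ranked correctly)
print(confusion_matrix(labels, scores, 0.5))   # {'TP': 2, 'FP': 1, 'TN': 2, 'FN': 1}
```

The AUC stays fixed for a given set of scores, while the confusion matrix changes whenever the threshold moves, which is why the figure reports both.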
FIGURE 3
The graphical user interface for BepiPred‐3.0 on the external test set protein 7lj4_B. In this interface, the x and y axes are protein sequence positions and BepiPred‐3.0 epitope scores, respectively. Residues with a higher score are more likely to be part of a B‐cell epitope. The threshold can be set by using the slider bar, which moves a dashed line along the y‐axis. Epitope predictions are updated accordingly, and B‐cell epitope predictions at the set threshold can be downloaded by clicking the button “Download epitope prediction”.
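The effect of moving the threshold slider can be sketched in a few lines. This is an illustration only: the scores, the threshold values, the function names, and the choice of `>=` for a positive call are assumptions, and the code does not reproduce the web server's own logic or output format.

```python
# Hypothetical per-residue epitope scores for a short sequence.
def call_epitopes(scores, threshold):
    """Mark each residue as predicted epitope (True) if its score reaches the threshold."""
    return [score >= threshold for score in scores]


def epitope_segments(calls):
    """Group consecutive predicted residues into 1-based inclusive (start, end) ranges."""
    segments = []
    start = None
    for position, called in enumerate(calls, start=1):
        if called and start is None:
            start = position
        elif not called and start is not None:
            segments.append((start, position - 1))
            start = None
    if start is not None:
        segments.append((start, len(calls)))
    return segments


scores = [0.10, 0.22, 0.18, 0.05, 0.30, 0.12]

print(epitope_segments(call_epitopes(scores, 0.15)))  # [(2, 3), (5, 5)]
print(epitope_segments(call_epitopes(scores, 0.25)))  # [(5, 5)]
```

Raising the threshold yields fewer, higher-confidence residues; lowering it yields more predicted residues at the cost of more false positives.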

