Adv Sci (Weinh). 2025 Dec;12(47):e08896.
doi: 10.1002/advs.202508896. Epub 2025 Oct 7.

B-EPIC: A Transformer-Based Language Model for Decoding B Cell Immunodominance Patterns

Jun-Ze Liang et al. Adv Sci (Weinh). 2025 Dec.

Abstract

Vaccine development for pathogens has faced significant challenges, contributing to a public health burden. B-cell epitope (BCE) prediction is a crucial process in vaccine development, but is hindered by limited efficiency and accuracy. To address this, B-Epic, the first pipeline applying a Transformer to predict BCEs, is independently developed. B-Epic's robustness is validated through multiple testing datasets, including distinguishing clinically approved vaccine targets, identifying BCEs (the Immune Epitope Database testing dataset; n = 23,888) and immunoreactive peptides (Trypanosoma cruzi peptidome; n = 239,575) with high AUCs of 0.882 and 0.945, respectively, outperforming widely used tools. Based on its superior performance, B-Epic is applied to the prevention of carcinogenic pathogens. In the application to Helicobacter pylori, peptides screened by B-Epic can activate B cells in experiments, suggesting their potential as vaccine targets. In another application to Epstein-Barr virus, B-Epic identifies pan-immunoreactive peptides in a clinical cohort (n = 899). These peptides exhibit higher reactogenicity in nasopharyngeal carcinoma patients than in healthy controls (n = 140), indicating their viability as immunodiagnostic targets. Overall, B-Epic utilizes self-attention, high-dimensional feature projection, and convolutional neural networks to autonomously extract complicated BCE features, enabling accurate BCE prediction and thereby facilitating efforts to prevent infectious diseases and cancers.

Keywords: B cell epitope prediction; Immunodiagnostics design; pathogens prevention; transformer; vaccines development.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1
Development of B‐Epic: a Transformer‐based Approach for BCE Prediction. A) Overview of the development, validation, and application of B‐Epic. B‐Epic leveraged the Transformer to extract semantic features of AA sequences and classified BCEs using MSCC. B‐Epic was tested on the IEDB testing dataset, the peptidome of Trypanosoma cruzi (T. cruzi), and licensed vaccine targets. Its applications included the de novo construction of a vaccine candidate library for H. pylori and the identification of pan‐immunoreactive peptides for EBV. B) Comparative performance of two language models: ESM‐2 (with varying complexity defined by Units*Numbers) and ProtTrans. C) Comparative performance of MSCC and 4 other machine learning approaches. The five classifiers were evaluated by AUC, ACC, FPR, and FNR. SVM, XGBoost, RF, and MLP required AA embeddings to be converted to sequence embeddings (mean pooling), whereas MSCC classified directly from AA embeddings. D) Heatmap of MSCC hyperparameter optimization, comparing AUC (0.772–0.875) across learning rates (10⁻⁴, 10⁻⁵), output channels (512, 1024, 2048), and training epochs (5, 10, 15). The color gradient from blue to pink indicates increasing AUC. E) Comparison of B‐Epic with BepiPred‐1.0 and BepiPred‐3.0 on the IEDB testing dataset. F) Computational efficiency compared across three increasingly large datasets containing 10 000, 119 440, and 239 575 samples, respectively. Computation times were natural‐log (ln) transformed before statistical analysis. Statistical significance was denoted as follows: * p < 0.05; ** p < 0.01; *** p < 0.001; **** p < 0.0001; ns (not significant). The significance level (α) was set at 0.05. Statistical analyses were performed using the two‐tailed DeLong test (Figure 1E) and the paired two‐tailed t‐test (Figure 1F).
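The mean pooling mentioned in panel C can be illustrated with a minimal sketch. It assumes per-residue (AA) embeddings arrive as a list of equal-length vectors; the function name `mean_pool` and the toy values are hypothetical and are not taken from the B-Epic code.

```python
def mean_pool(residue_embeddings):
    """Average per-residue vectors into one fixed-length sequence vector.

    Hypothetical helper for illustration; not the authors' implementation.
    """
    if not residue_embeddings:
        raise ValueError("need at least one residue embedding")
    dim = len(residue_embeddings[0])
    if any(len(vec) != dim for vec in residue_embeddings):
        raise ValueError("all residue embeddings must share one dimension")
    n = len(residue_embeddings)
    # Average each embedding dimension over all residues in the sequence.
    return [sum(vec[d] for vec in residue_embeddings) / n for d in range(dim)]


# Toy example: a 2-residue peptide with 3-dimensional embeddings (made-up values).
toy_embeddings = [[1.0, 0.0, 2.0], [3.0, 4.0, 2.0]]
pooled = mean_pool(toy_embeddings)
```

Pooling discards the position of each residue, which is one plausible reason a classifier that works directly on AA embeddings could behave differently from the pooled baselines.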
Figure 2
B‐Epic Screened out the Targets of Licensed Vaccines from Random Sequences. A) A comparative analysis of B‐Epic Score (the median ± IQR) between licensed vaccine protein targets (n = 11) and random protein controls (n = 10 000) was presented. B) Comparative analysis of B‐Epic Score encompassing 20 vaccine targets (proteins and peptides, with peptides ranging in length from 6 to 57 AAs) versus random sequence controls (n = 40 000) was presented, featuring key vaccine targets: HBsAg (HBV), CSP (Plasmodium), prM (dengue virus), and HA (influenza virus). C,D) Comparisons of B‐Epic Score for HBsAg (C; pink) and VP1 (D; pink) against their respective viral structural proteins (from their host species) were presented. E,F) Distributions of Sliding B‐Epic Score for HBsAg (E) and VP1 (F) were shown. Color segments were indicated as follows: known vaccine sequences (blue), high B cell activation regions (orange; B‐Epic Score > 0.35, Sliding B‐Epic Score > 0.25), and moderately high B cell activation regions (green; B‐Epic Score > 0.25, Sliding B‐Epic Score > 0.15). The red curve represented the LOESS trend based on Sliding B‐Epic Score. Statistical significance was denoted as follows: * p < 0.05; ** p < 0.01; *** p < 0.001; **** p < 0.0001; ns (not significant). The significance level (α) was set at 0.05. Statistical analyses were performed using the two‐tailed Mann‐Whitney U test (Figure 2A,B).
Figure 3
B‐Epic Identifies Immunoreactive Peptides from the Peptidome of T. cruzi. A) A schematic depicted the detection of immunoreactivity of the T. cruzi peptidome (239 575 peptides) from 7 sera of patients with Chagas disease via ELISA chips, as described in Santiago J. Carmona's article. B) The experimental results were binarized using reactogenicity thresholds of 3 (np‐neg; multiple samples) and 7 (np‐neg; per sample), as described in Santiago J. Carmona's article. AUCs of B‐Epic were calculated using the B‐Epic Score and binary results from the 7 ELISA chips. C) A comparative analysis was conducted to present the AUCs of B‐Epic, BepiPred‐1.0, and BepiPred‐3.0 in the peptidome of T. cruzi across 7 ELISA chips. Statistical significance was denoted as follows: * p < 0.05; ** p < 0.01; *** p < 0.001; **** p < 0.0001; ns (not significant). The significance level (α) was set at 0.05. Statistical analyses were performed using the two‐tailed Mann‐Whitney U test (Figure 3C).
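The binarize-then-score procedure in panel B can be sketched as follows. This is an illustration under stated assumptions: signals strictly above the threshold are treated as reactive (the caption does not say whether the comparison is strict), the AUC is computed as the fraction of positive/negative pairs ranked correctly, and the function names and all numbers are made up.

```python
def binarize(signals, threshold):
    """Label a peptide reactive (1) if its signal exceeds the threshold.

    The strict '>' comparison is an assumption for this sketch.
    """
    return [1 if s > threshold else 0 for s in signals]


def roc_auc(scores, labels):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up chip signals and prediction scores for four peptides.
chip_signals = [0.5, 8.0, 2.0, 9.5]
labels = binarize(chip_signals, threshold=7)
predicted_scores = [0.1, 0.9, 0.4, 0.3]
auc = roc_auc(predicted_scores, labels)
```

The pairwise loop is quadratic in the number of peptides; for a peptidome of this size a rank-based implementation would be the practical choice.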
Figure 4
De Novo Development of H. pylori Potential Vaccine Candidate Library with Experimental Validation. A) The construction of a vaccine candidate library of H. pylori was shown. From 336 350 H. pylori sequences in UniProt, 25 accessible proteins (as defined in the Methods section) were split into 15‐mer peptides using a sliding window with a step size of 1 AA. Overall, 11 972 15‐mer peptides were input into B‐Epic, and 50 of these peptides were ultimately included in the vaccine candidate library. B) A comparison of B‐Epic Score between accessible and non‐accessible H. pylori proteins was presented (the median ± IQR). C) B‐Epic Score rankings of 25 accessible proteins and 24 transmembrane proteins were shown. Gray, turquoise, pink, and black represented bacterial flagellum, cell surface, secreted, and transmembrane proteins, respectively. NAP, CGA1, VACA1, and VACA2 were potential vaccine targets with experimental evidence of B‐cell activation. D) The protein‐level vaccine library was constructed from the accessible proteins. Thresholds were set as: B‐Epic Score > 0.02 (median B‐Epic Score of random proteins) and Foreignness Score > 0 (negative Bit Score of DIAMOND). Triangles represented accessible proteins, and circles represented non‐accessible proteins. E) Overall, 11 972 peptides were generated from 25 accessible proteins. For these peptides, the x‐ and y‐axes represented B‐Epic Score and ln(Max EL Score), with thresholds of 0.35 and ln(0.25), respectively. The color gradient from blue to pink indicated the Sliding B‐Epic Score with a threshold of 0.25. In addition, a Foreignness Score > 0 was considered during construction of the peptide‐level vaccine library, but was not displayed in this chart. The 9 VCPs for subsequent experiments were highlighted in the table (right). F) The distribution of Sliding B‐Epic Score for VACA1 was exhibited. Orange, green, and red lines represented high B cell activation regions, moderately high B cell activation regions, and the LOESS trend based on Sliding B‐Epic Score, respectively. Turquoise, pink, and gray lines represented penetrating, outer membrane, and inner membrane, predicted using TMHMM v2.0c. G) The surface and secondary structure of VACA1 were shown. This structure contained functional regions and transporters. VACA1‐1223 and VACA1‐616 (for subsequent experiments) were highlighted by reduced transparency on the VACA1 surface (rendered with high overall transparency). Surface rendering with high transparency distinguished between high B cell activation regions (red), moderately high B cell activation regions (orange), and remaining regions (blue). Statistical significance was denoted as follows: * p < 0.05; ** p < 0.01; *** p < 0.001; **** p < 0.0001; ns (not significant). The significance level (α) was set at 0.05. Statistical analyses were performed using the two‐tailed Mann‐Whitney U test (Figure 4B,C).
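The 15-mer generation described in panel A (sliding window, step size of 1 AA) can be sketched in a few lines. The function name `sliding_peptides` and the toy sequence are hypothetical; a protein of length L yields L − 15 + 1 peptides under this scheme, and sequences shorter than the window yield none.

```python
def sliding_peptides(sequence, k=15, step=1):
    """Split a protein sequence into overlapping k-mers (illustrative helper)."""
    if k <= 0 or step <= 0:
        raise ValueError("k and step must be positive")
    # Stop at the last start position that still leaves a full-length window.
    return [sequence[i:i + k] for i in range(0, len(sequence) - k + 1, step)]


# Toy 20-residue sequence (the 20 standard amino acids, not a real protein).
toy_protein = "ACDEFGHIKLMNPQRSTVWY"
peptides = sliding_peptides(toy_protein)
```

With a step of 1, neighbouring peptides share 14 of their 15 residues, so per-peptide scores along a protein are strongly correlated by construction.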
Figure 5
Immunization with Predicted H. pylori BCEs Elicits Robust Antibody Responses in Mice. A) A schematic of the immunization protocol was presented. C57BL/6 mice (n = 21) received subcutaneous injections of VCPs formulated with CpG adjuvant. B) Dose‐dependent ELISA detected the expression of specific antibodies against VCPs (n = 9; 3 replicates per VCP) and NC (n = 1; 3 replicates per NC) in mouse sera, demonstrating the specificity of the immune response across multiple serum dilutions. C) Quantification of specific IgG responses at a 1:80 dilution was shown. Mice immunized with VCPs or NC showed significantly elevated antibody titers compared to PBS controls. D) Correlation analysis between predictive results and ELISA results was presented. E) Flow cytometric quantification of GC B cells following peptide immunization. Blue and red squares represented distinct GC B cell populations. F) Representative immunofluorescence images of lymph nodes showed a lymph node marker (green), along with staining of B cell (yellow) and T cell (red) populations. Points in statistical charts (C,E) represented individual mice. Data were presented as the mean ± SD (B‐E; some small SDs not visually distinguishable). Statistical significance was denoted as follows: * p < 0.05; ** p < 0.01; *** p < 0.001; **** p < 0.0001; ns (not significant). The significance level (α) was set at 0.05. Statistical analyses were performed using the two‐tailed two‐way ANOVA (Figure 5B) and two‐tailed t‐test (Figure 5C–E).
Figure 6
B‐Epic Exhibited Exceptional Performance in Identifying Pan‐Immunoreactive Peptides of EBV in a Large Clinical Cohort. A) A schematic of B‐Epic's application against EBV was presented. The correlation between PhIP‐seq and B‐Epic in a large clinical cohort across 899 sera was analyzed, and specific antibody levels against EBV peptides with high B‐Epic scores were detected via ELISA in another large cohort. B) The y‐axis represented serological positivity rates across 899 sera based on PhIP‐seq enrichment. Blue dots indicated antigens meeting significance thresholds (> 50% prevalence, PhIP‐seq P < 0.001). Pink dots highlighted the high pan‐immunogenicity of EBNA1 across three EBV strains. C) The chart presented a comparative analysis of Spearman correlation coefficients between PhIP‐seq enrichment (34 peptides) and predictions from B‐Epic, BepiPred‐1.0, and BepiPred‐3.0 across 899 sera. Different intervals of correlation coefficient were colored as follows: grey (< 0.2), blue (0.2–0.4), green (0.4–0.6), and pink (> 0.6). D) Integrated visualization was generated to present the B‐Epic Score for EBNA1, gp350, and gB from strain AG876 (left), PhIP‐seq enrichment patterns across 899 sera (center heatmap), and corresponding B‐Epic Score of sequences in the PhIP‐seq assay (right). E) Distribution of Sliding B‐Epic Score for EBNA1 393–448 across three EBV strains was shown. The pink line represented the known highly immunoreactive epitope “PPRRP”. F) ELISA detected levels of specific antibodies against EBNA1 15‐mer peptides with high/low B‐Epic Score in NPC patients (n = 80) and healthy controls (n = 60). Statistical significance was denoted as follows: * P < 0.05; ** P < 0.01; *** P < 0.001; **** P < 0.0001; ns (not significant). The significance level (α) was set at 0.05. Statistical analyses were performed using the two‐tailed Mann‐Whitney U test (Figure 6C) and two‐tailed t‐test (Figure 6F).
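The Spearman correlation used in panel C can be computed as the Pearson correlation of ranks, with tied values sharing their average rank. The sketch below is a standard-library illustration of that definition; the function names and the toy numbers are hypothetical and do not come from the study's data or code.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(vx * vy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    return pearson(average_ranks(x), average_ranks(y))


# Made-up enrichment values and prediction scores for four peptides.
rho = spearman([1.0, 2.0, 3.0, 4.0], [0.10, 0.20, 0.30, 0.40])
```

Because only ranks enter the calculation, the coefficient is unchanged by any monotonic rescaling of either the enrichment values or the prediction scores.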
