PLoS Comput Biol. 2022 Feb 18;18(2):e1009151.
doi: 10.1371/journal.pcbi.1009151. eCollection 2022 Feb.

Towards the prediction of non-peptidic epitopes

Paul F Zierep et al. PLoS Comput Biol.

Abstract

In silico methods for the prediction of epitopes can support and improve workflows for vaccine design, antibody production, and disease therapy. So far, B cell and T cell epitope prediction has been directed exclusively towards peptidic antigens. Nevertheless, various non-peptidic molecular classes can be recognized by immune cells. These compounds have not yet been studied systematically, and prediction approaches are lacking. The ability to predict the epitope activity of non-peptidic compounds could have far-reaching implications, for example for immunogenic risk assessment of the large number of drugs and other xenobiotics. Here we present the first general attempt to predict the epitope activity of non-peptidic compounds, using the Immune Epitope Database (IEDB) as the source of positive samples. The molecules stored in the Chemical Entities of Biological Interest (ChEBI) database were chosen as background samples. The molecules were clustered into eight homogeneous molecular groups, and classifiers were built for each cluster with the aim of separating the epitopes from the background. Different molecular feature encoding schemes and machine learning models were compared against each other. For those models where high performance could be achieved with simple decision rules, the molecular features were investigated further. Additionally, the findings were used to build a web server that allows for the immunogenic investigation of non-peptidic molecules (http://tools-staging.iedb.org/np_epitope_predictor). The prediction quality was tested with samples from independent evaluation datasets, and the implemented method achieved Receiver Operating Characteristic-Area Under Curve (ROC-AUC) values ranging from 0.69 to 0.96, depending on the molecule cluster.
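
As an illustration of the evaluation metric named in the abstract, the following is a minimal sketch of how a ROC-AUC value could be computed separately for each molecule cluster. It is not the authors' implementation: the function names `roc_auc` and `roc_auc_by_cluster`, the record layout `(cluster, label, score)`, and the toy data are assumptions made for this example. The AUC is computed as the fraction of (epitope, background) pairs in which the epitope receives the higher score, with ties counted as one half.

```python
from collections import defaultdict


def roc_auc(labels, scores):
    """Rank-based ROC-AUC: probability that a positive outscores a negative."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties contribute half a win
    return wins / (len(pos) * len(neg))


def roc_auc_by_cluster(records):
    """Group hypothetical (cluster, label, score) records and score each cluster."""
    grouped = defaultdict(lambda: ([], []))
    for cluster, label, score in records:
        grouped[cluster][0].append(label)
        grouped[cluster][1].append(score)
    return {c: roc_auc(l, s) for c, (l, s) in grouped.items()}


# Toy data only: label 1 = epitope, label 0 = background molecule.
toy_records = [
    (2, 1, 0.9), (2, 1, 0.8), (2, 0, 0.2), (2, 0, 0.1),
    (4, 1, 0.9), (4, 0, 0.8), (4, 1, 0.2), (4, 0, 0.1),
]
print(roc_auc_by_cluster(toy_records))
```

Reporting one value per cluster, rather than a single pooled value, matches the way the abstract states a range of results across clusters.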

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. Cluster inertia plotted against the number of clusters (k).
The cluster inertia is computed as the sum of squared distances of the samples to their closest cluster center.
Fig 2. Principal component visualization of the ChEBI dataset.
(a) Principal components of the 8 clusters and their sizes. (b) T cell epitopes. (c) B cell epitopes.
Fig 3. Example molecules for each cluster generated for the ChEBI dataset.
ChEBI IDs used for the example molecules: (a) Steroid/terpenoid like: CHEBI:776; (b) Betaine/glycerolipid derivatives: CHEBI:17636; (c) Fatty acid derivatives: CHEBI:16196; (d) Acyl-CoA derivatives: CHEBI:11010; (e) Glucoside/oligosaccharide derivatives: CHEBI:16551; (f) Nucleobase-containing molecular entities: CHEBI:15422; (g) Diverse small molecules: CHEBI:55395; (h) Cyclic Halide / Phenols: CHEBI:59246. All examples represent molecules that have tested positive in B cell assays, except for the acyl-CoA derivatives, for which no epitope has been described.
Fig 4. Cross-validation performance of the RF models for different radii parameters used to generate Morgan fingerprints.
The prediction of epitopes that tested positive in T cell assays (a) and B cell assays (b).
Fig 5. Model comparison for different feature sets for the epitopes that tested positive in B cell assays.
Cluster 3 was not benchmarked, since there were no epitopes in this structural class.
Fig 6. Model comparison for different feature sets for the epitopes that tested positive in T cell assays.
Cluster 3 was not benchmarked, since there were no epitopes in this structural class.
Fig 7. Performance of the epitope classifiers for different feature sets.
Cluster 3 was not benchmarked, since there were no epitopes in this structural class. The RF classifiers are depicted with a continuous line and the similarity classifiers with a dotted line.
Fig 8. Substructures of most significant fingerprint features for the classification of T cell epitopes of the fatty acid derivatives (cluster 2).
A depiction of each feature is shown (smaller box) alongside an example molecule containing it (larger box). In the feature box, the central atom is labeled with a purple sphere; aliphatic ring atoms are labeled with grey spheres. In the molecule box, all matched feature atoms are labeled with blue spheres. The statistics of the features are shown in Table 4.
Fig 9. Histogram of the fingerprint feature (ID:16163127) count responsible for T cell prediction of the glucoside/oligosaccharide derivatives (cluster 4).
The vast majority of epitopes have a long fatty acid chain attached to the glycoside. (a) Example molecule with 20 fingerprint features; all matched feature atoms are labeled with blue spheres. (b) Depiction of the fingerprint feature; the central atom is labeled with a purple sphere.
Fig 10. Substructures of most significant fingerprint features for the classification of B cell epitopes of the glucoside/oligosaccharide derivatives (cluster 4).
A depiction of each feature is shown (smaller box) alongside an example molecule containing it (larger box). In the feature box, the central atom is labeled with a purple sphere; aliphatic and aromatic ring atoms are labeled with grey and yellow spheres. In the molecule box, all matched feature atoms are labeled with blue spheres.
Fig 11. Substructures of most significant fingerprint features for the classification of B cell epitopes of the nucleobase-containing molecular entities (cluster 5).
A depiction of each feature is shown (smaller box) alongside an example molecule containing it (larger box). In the feature box, the central atom is labeled with a purple sphere; aliphatic and aromatic ring atoms are labeled with grey and yellow spheres. In the molecule box, all matched feature atoms are labeled with blue spheres. The statistics of the features are shown in Table 7.
Fig 12. The feature responsible for the prediction of T cell recognition of the nucleobase-containing molecular entities (cluster 5).
A depiction of the feature is shown (smaller box) alongside an example molecule containing it (larger box). In the feature box, the central atom is labeled with a purple sphere. In the molecule box, all matched feature atoms are labeled with blue spheres.
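
The cluster inertia plotted in Fig 1 can be illustrated with a short sketch. It assumes that samples and cluster centers are plain coordinate tuples; the function name `inertia` and the toy points are hypothetical and only serve to show the calculation, not the feature space or clustering code used in the study.

```python
def inertia(points, centers):
    """Sum of squared Euclidean distances of each point to its closest center."""
    total = 0.0
    for p in points:
        # Squared distance from p to every center; keep the smallest one.
        total += min(
            sum((a - b) ** 2 for a, b in zip(p, c))
            for c in centers
        )
    return total


# Toy data only: three 2-D samples evaluated against one and two centers.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0)]
print(inertia(points, [(0.0, 0.0)]))               # one center
print(inertia(points, [(0.0, 1.0), (10.0, 0.0)]))  # two centers
```

Plotting this quantity for increasing k, as in Fig 1, shows how much each additional cluster reduces the within-cluster spread.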
