Bioinformatics. 2019 Sep 1;35(17):3020-3027.

doi: 10.1093/bioinformatics/btz029.

AllerCatPro-prediction of protein allergenicity potential from the protein sequence

Sebastian Maurer-Stroh¹,², Nora L Krutz³, Petra S Kern³, Vithiagaran Gunalan¹, Minh N Nguyen¹, Vachiranee Limviphuvadh¹, Frank Eisenhaber¹,², G Frank Gerberick⁴

Affiliations

¹ Biomolecular Function Discovery Division, Bioinformatics Institute, Agency for Science, Technology and Research, Singapore.
² Department of Biological Sciences, National University of Singapore, Singapore.
³ The Procter & Gamble Services Company, Strombeek-Bever, Belgium.
⁴ The Procter and Gamble Company, Mason, OH, USA.

PMID: 30657872
PMCID: PMC6736023
DOI: 10.1093/bioinformatics/btz029

Sebastian Maurer-Stroh et al. Bioinformatics. 2019.

Abstract

Motivation: Due to the risk of inducing an immediate Type I (IgE-mediated) allergic response, proteins intended for use in consumer products must be investigated for their allergenic potential before introduction into the marketplace. The FAO/WHO guidelines for computational assessment of the allergenic potential of proteins, which are based on short peptide hits and linear sequence window identity thresholds, misclassify many proteins as allergens.

Results: We developed AllerCatPro, which predicts the allergenic potential of proteins based on the similarity of their 3D structure and their amino acid sequence to a data set of known protein allergens. This data set comprises 4180 unique allergenic protein sequences derived from the union of the major databases Food Allergy Research and Resource Program, Comprehensive Protein Allergen Resource, WHO/International Union of Immunological Societies, UniProtKB and Allergome. We extended the hexamer hit rule by removing peptides with a high probability of random occurrence, measured by sequence entropy, and by requiring 3 or more hexamer hits, consistent with natural linear epitope patterns in known allergens. This is complemented by gluten-like repeat pattern detection. We also switched from a linear sequence window similarity to a B-cell epitope-like 3D surface similarity window, which became possible through extensive 3D structure modeling covering the majority (74%) of allergens. If no structural similarity is found, the decision workflow reverts to the old linear sequence window rule. The overall accuracy of AllerCatPro is 84%, compared with 51 to 73% for other current methods. Both the FAO/WHO rules and AllerCatPro achieve the highest sensitivity, but AllerCatPro provides a 37-fold increase in specificity.
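The extended hexamer rule described above can be illustrated with a minimal Python sketch. It assumes that "sequence entropy" is the Shannon entropy (in bits) of a hexamer's residue composition and reuses the 0.34 cutoff from Fig. 2 only as a placeholder; the paper's exact entropy measure, and how it counts hits, may differ. The function names are hypothetical.

```python
import math
from collections import Counter

K = 6                  # hexamer length, as in the abstract
MIN_HITS = 3           # "3 or more hexamer hits"
ENTROPY_CUTOFF = 0.34  # placeholder from Fig. 2 (ent034); entropy definition is assumed


def shannon_entropy(peptide: str) -> float:
    """Shannon entropy (bits) of a peptide's residue composition (assumed measure)."""
    counts = Counter(peptide)
    n = len(peptide)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def informative_kmers(seq: str, k: int = K, cutoff: float = ENTROPY_CUTOFF) -> set[str]:
    """All k-mers of seq whose entropy exceeds the cutoff; low-complexity peptides are dropped."""
    return {
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if shannon_entropy(seq[i:i + k]) > cutoff
    }


def hexamer_hits(query: str, allergen: str) -> int:
    """Number of distinct informative hexamers shared by query and allergen."""
    return len(informative_kmers(query) & informative_kmers(allergen))


def passes_hexamer_rule(query: str, allergen: str) -> bool:
    """True if the query shares at least MIN_HITS informative hexamers with the allergen."""
    return hexamer_hits(query, allergen) >= MIN_HITS


if __name__ == "__main__":
    query = "MKTAYIAKQRQISFVKSHFSRQ"
    print(hexamer_hits(query, "KTAYIAKQ"))            # three shared hexamers
    print(passes_hexamer_rule("AAAAAAAAAA", "AAAAAAAA"))  # homopolymer hit is filtered out
```

Filtering by entropy before counting means that a shared low-complexity stretch such as a poly-A run contributes no hits, which is the intent of removing peptides likely to occur by chance.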

Availability and implementation: https://allercatpro.bii.a-star.edu.sg/.

Supplementary information: Supplementary data are available at Bioinformatics online.


Figures

**Fig. 1.**
AllerCatPro workflow, search methods and databases. **(A)** Decision workflow of AllerCatPro, from the query protein to a result of strong, weak or no evidence for allergenic potential. (S1–S5) Search methods used at different stages of the workflow. (D1–D3) Databases created and used for the searches in the workflow.
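The three-way outcome of the decision workflow can be sketched in Python as below. Only the individual rules come from the text (3 or more hexamer hits, gluten-like repeats, 3D epitope similarity with a fallback to the linear window rule when no structural similarity exists, and weak evidence for a shared fold); the order in which they are checked, the `SearchResults` fields and `decide` are hypothetical and not the published implementation.

```python
from dataclasses import dataclass
from enum import Enum

MIN_HEXAMER_HITS = 3  # "3 or more hexamer hits"


class Evidence(Enum):
    STRONG = "strong evidence"
    WEAK = "weak evidence"
    NONE = "no evidence"


@dataclass
class SearchResults:
    """Hypothetical container for the outcomes of the individual searches."""
    hexamer_hits: int          # informative hexamers shared with the closest allergen
    gluten_like_repeats: bool  # gluten-like repeat pattern detected
    shares_fold: bool          # structural similarity (same fold) to a known allergen
    epitope_similar: bool      # 3D surface-epitope similarity passes its threshold
    window_similar: bool       # linear 80-residue window with >= 35% identity


def decide(r: SearchResults) -> Evidence:
    """Assumed precedence of the workflow rules; a sketch, not the published logic."""
    if r.shares_fold:
        # Structure-based path: compare B-cell epitope-like 3D surface patches.
        strong = r.epitope_similar
    else:
        # No structural similarity: revert to the linear sequence window rule.
        strong = r.window_similar
    strong = strong or r.hexamer_hits >= MIN_HEXAMER_HITS or r.gluten_like_repeats
    if strong:
        return Evidence.STRONG
    if r.shares_fold:
        # Sharing the fold with an allergen alone yields weak evidence (Fig. 4).
        return Evidence.WEAK
    return Evidence.NONE
```

In this sketch a linear window match is deliberately ignored whenever a structural match exists, mirroring the statement that the window rule is used only as a fallback.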

**Fig. 2.**
Prediction of protein sequence similarity to protein allergens by the k-mer method. Screening for similarity between a query protein sequence and a sequence in the allergen database is based on identical k-mer hits **(A)**. Evaluation of an appropriate k-mer length based on the predictive power of different k-mer lengths, using the UniProt database of allergens known in 2005 to predict all allergens known in 2015 **(B)**. Excess of percent true positives minus percent false positives depending on the k-mer length and entropy cutoff (ent034 = entropy bit score > 0.34) **(C)**.

**Fig. 3.**
Prediction of linear sequence window and 3D epitope similarity. Screening for similarity between a query protein sequence and a sequence in the allergen database based on a sequence window of 80 residues with at least 35% identity **(A)**. Matching of a query protein sequence of unknown 3D structure with the closest known allergen over all possible 3D structural epitopes within the comprehensive 3D structural database of known allergens created for this work **(B)**.
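The linear window rule in panel (A) can be sketched in Python as a sliding 80-column window over a precomputed pairwise alignment. The sketch assumes the alignment is given as two equal-length strings with `-` for gaps, counts only identical non-gap columns, and does not score alignments shorter than one window; these conventions, and the function names, are assumptions rather than the FAO/WHO or AllerCatPro specification.

```python
WINDOW = 80          # window length in residues (Fig. 3A)
MIN_IDENTITY = 0.35  # "at least 35% identity"


def max_window_identity(aln_query: str, aln_allergen: str, window: int = WINDOW) -> float:
    """Highest fraction of identical residues over any `window`-column slice of an alignment."""
    if len(aln_query) != len(aln_allergen):
        raise ValueError("aligned sequences must have equal length")
    # 1 where both rows carry the same residue; gap columns always count as 0.
    same = [int(a == b and a != "-") for a, b in zip(aln_query, aln_allergen)]
    if len(same) < window:
        return 0.0  # assumption: alignments shorter than one window are not scored
    best = current = sum(same[:window])
    for i in range(window, len(same)):
        # Slide the window one column: add the new column, drop the oldest one.
        current += same[i] - same[i - window]
        best = max(best, current)
    return best / window


def passes_window_rule(aln_query: str, aln_allergen: str) -> bool:
    """True if some 80-column window reaches the identity threshold."""
    return max_window_identity(aln_query, aln_allergen) >= MIN_IDENTITY
```

The running-sum update keeps the scan linear in alignment length instead of recounting each window from scratch.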

**Fig. 4.**
AllerCatPro performance. Performance of AllerCatPro is calculated as the accuracy of predicting allergens (n = 221) versus non-allergens (n = 221) with the same structural fold, compared with the FAO/WHO rules (window rule only, no k-mer), PREAL, AllerHunter, AllergenFP and AllerTOPv2 **(A)**. By our definition, sharing the fold with an allergen already results in a weak evidence prediction. Therefore, accuracy here counts a strong prediction on a known allergen as a true positive, a weak prediction on a known allergen as a false negative, a weak prediction on a non-allergen as a true negative and a strong prediction on a non-allergen as a false positive. The corresponding sensitivity **(B)** and specificity **(C)** for the same benchmark are also shown.
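The confusion-matrix mapping defined in this caption translates directly into code. The Python sketch below applies it to a list of (is_allergen, prediction) pairs; treating any prediction other than "strong" as a negative call is an assumption, since the benchmark proteins all share a fold and so receive either strong or weak evidence. The input format and function name are hypothetical, and the data in the usage line are a toy example, not the benchmark.

```python
from collections import Counter


def benchmark_metrics(results):
    """results: iterable of (is_allergen: bool, prediction: str), prediction e.g. 'strong'/'weak'."""
    counts = Counter()
    for is_allergen, prediction in results:
        strong = prediction == "strong"  # assumption: anything else is a negative call
        if is_allergen:
            counts["TP" if strong else "FN"] += 1
        else:
            counts["FP" if strong else "TN"] += 1
    tp, fn, tn, fp = counts["TP"], counts["FN"], counts["TN"], counts["FP"]
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


if __name__ == "__main__":
    toy = [(True, "strong"), (True, "weak"), (False, "weak"), (False, "weak")]
    print(benchmark_metrics(toy))
```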

**Fig. 5.**
Interface of AllerCatPro version 1.7. Submitting one or more protein sequences in FASTA format **(A)** leads to the AllerCatPro output table, which gives the result (strong, weak or no evidence for allergenicity) per protein based on the corresponding workflow decisions and, in case of a hit, links to view the most similar proteins **(B)** and the most similar 3D surface epitope **(C)**. The structural view shows identical epitope residues as balls (colored blue for positive charges, red for negative charges and gray for all other amino acid types).

