BMC Bioinformatics. 2013;14 Suppl 4(Suppl 4):S1.
doi: 10.1186/1471-2105-14-S4-S1. Epub 2013 Mar 8.

Evaluation and integration of existing methods for computational prediction of allergens

Jing Wang et al. BMC Bioinformatics. 2013.

Abstract

Background: Allergy involves a series of complex reactions and factors that contribute to the development of the disease and the triggering of its symptoms, which include rhinitis, asthma, atopic eczema, skin sensitivity and even acute, fatal anaphylactic shock. Predicting and evaluating potential allergenicity is important for the safety evaluation of foods and other environmental factors. Although several computational approaches for assessing the potential allergenicity of proteins have been developed, their performance and relative merits and shortcomings have not been compared systematically.

Results: To evaluate and improve the existing methods for allergen prediction, we collected an up-to-date, definitive dataset consisting of 989 known allergens and a large set of putative non-allergens. The three most widely used computational approaches to allergen prediction, namely sequence-, motif- and SVM-based (Support Vector Machine) methods, were systematically compared using defined parameters, and we found that the SVM-based method outperformed the other two, with higher accuracy and specificity. The sequence-based method using the criteria defined by FAO/WHO (FAO: Food and Agriculture Organization of the United Nations; WHO: World Health Organization) has a higher sensitivity, over 98%, but a low specificity. The advantage of the motif-based method is its ability to visualize the key motif within an allergen. Notably, the performance of both the FAO/WHO sequence-based method and the motif-eliciting strategy could be improved by optimizing their parameters. To facilitate allergen prediction, we integrated these three methods into a web-based application, proAP, which provides a global search of known allergens and a powerful tool for allergen prediction. Flexible parameter settings and batch prediction were also implemented. proAP can be accessed at http://gmobl.sjtu.edu.cn/proAP/main.html.
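Sensitivity, specificity and accuracy, the measures used in the comparison above, all follow from a confusion matrix of predictions against the known labels. The minimal Python sketch below shows the arithmetic; the `Confusion` class is a hypothetical helper and the counts are invented for illustration, not taken from this study.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # allergens predicted as allergens
    fn: int  # allergens predicted as non-allergens
    tn: int  # non-allergens predicted as non-allergens
    fp: int  # non-allergens predicted as allergens

    @property
    def sensitivity(self) -> float:
        # Fraction of true allergens that were flagged.
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        # Fraction of non-allergens that were correctly rejected.
        return self.tn / (self.tn + self.fp)

    @property
    def accuracy(self) -> float:
        # Fraction of all proteins classified correctly.
        return (self.tp + self.tn) / (self.tp + self.fn + self.tn + self.fp)


c = Confusion(tp=90, fn=10, tn=60, fp=40)  # hypothetical counts
print(c.sensitivity, c.specificity, c.accuracy)  # 0.9 0.6 0.75
```

Accuracy equals (P × sensitivity + N × specificity) / (P + N), where P and N are the numbers of allergens and non-allergens, so when non-allergens greatly outnumber allergens, accuracy is dominated by specificity.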

Conclusions: This study comprehensively evaluated sequence-, motif- and SVM-based computational approaches for allergen prediction and optimized their parameters to obtain better performance. These findings may provide helpful guidance for researchers working on allergen prediction. Furthermore, we integrated these methods into a web application, proAP, which allows users to perform customizable allergen searches and predictions.

Figures

Figure 1
The flow diagram of dataset collection. The setup process for the positive dataset is shown with blue lines and that for the negative dataset with green lines. A reversed negative dataset, built following the flow shown in dark pink, was used specifically to evaluate FAO/WHO rule 2.
Figure 2
The performance of the FAO/WHO criteria. FW* denotes FAO/WHO. The figure compares the results of each FAO/WHO criterion. Both rule 1 and rule 2 achieved high sensitivity, exceeding 90% for rule 1 alone. However, the corresponding specificity was only 23.05%.
Figure 3
Influence of word size on the performance of FAO/WHO rule 1. The figure shows how the performance of FAO/WHO rule 1 varies as the required length of the exact amino-acid match (word size) is adjusted from 6 to 14. Accuracy improved dramatically as the word size increased from 6 to 8; no significant further improvement was observed at larger word sizes.
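FAO/WHO rule 1 flags a query protein when it shares an exact run of contiguous amino acids (the word size) with a known allergen; the 2001 FAO/WHO recommendation used 6 contiguous identical residues. The following is a minimal Python sketch of that check, assuming sequences are plain one-letter strings; `build_word_index` and `rule1_hit` are hypothetical names, the toy sequences are invented, and this is not the proAP implementation.

```python
from __future__ import annotations


def build_word_index(allergens: list[str], word_size: int = 6) -> set[str]:
    """Collect every length-`word_size` word (contiguous residue run) in the allergen set."""
    index: set[str] = set()
    for seq in allergens:
        seq = seq.upper()
        for i in range(len(seq) - word_size + 1):
            index.add(seq[i:i + word_size])
    return index


def rule1_hit(query: str, index: set[str], word_size: int = 6) -> bool:
    """Hypothetical rule-1 check: True if the query shares any exact
    `word_size`-residue run with a word in the allergen index."""
    q = query.upper()
    return any(q[i:i + word_size] in index for i in range(len(q) - word_size + 1))


# Toy sequences, invented for illustration only.
allergens = ["MKTAYIAKQRQISFVKSHFSRQ"]
query = "GGGAYIAKQRGGG"  # shares the 7-residue run AYIAKQR

print(rule1_hit(query, build_word_index(allergens, 6), 6))  # True
print(rule1_hit(query, build_word_index(allergens, 8), 8))  # False
```

With 20 amino acids there are 20^6 = 64 million possible 6-residue words but 20^8 = 25.6 billion possible 8-residue words, so chance matches against a large allergen database are far more frequent at word size 6. This is consistent with the low specificity observed at small word sizes.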
Figure 4
The impact of sequence similarity on FAO/WHO rule 2. The figure shows the performance of FAO/WHO rule 2 as the sequence identity threshold is adjusted from 25% to 70%. As the threshold increased, specificity rose from 20.22% to 99.39% while sensitivity dropped slightly. The best accuracy was obtained at an identity threshold of 55%.
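FAO/WHO rule 2 flags a query protein that has more than 35% identity with a known allergen over a window of 80 amino acids; the caption above varies that identity threshold. The minimal Python sketch below is an ungapped simplification under stated assumptions: real implementations score gapped alignments (for example with FASTA), whereas here every 80-residue query segment is compared position by position with every 80-residue allergen segment. The names `window_identity` and `rule2_hit` are hypothetical, and skipping sequences shorter than the window is an assumption of this sketch.

```python
def window_identity(a: str, b: str) -> float:
    """Fraction of identical positions between two equal-length segments (no gaps)."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def rule2_hit(query: str, allergen: str, threshold: float = 0.35, window: int = 80) -> bool:
    """Ungapped simplification of FAO/WHO rule 2: True if any `window`-residue
    segment of the query is more than `threshold` identical to any same-length
    segment of the allergen."""
    q, s = query.upper(), allergen.upper()
    if len(q) < window or len(s) < window:
        return False  # assumption: sequences shorter than the window are not evaluated
    for i in range(len(q) - window + 1):
        q_seg = q[i:i + window]
        for j in range(len(s) - window + 1):
            if window_identity(q_seg, s[j:j + window]) > threshold:
                return True
    return False


allergen = "ACDEFGHIKL" * 8       # toy 80-residue "allergen"
query = allergen[:40] + "W" * 40  # 50% identical over the single 80-residue window

print(rule2_hit(query, allergen, threshold=0.35))  # True  (0.50 > 0.35)
print(rule2_hit(query, allergen, threshold=0.55))  # False (0.50 is not > 0.55)
```

Raising `threshold` demands closer similarity before a protein is flagged, which is why specificity rises and sensitivity falls as the threshold increases.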
Figure 5
The ROC curves of various approaches for allergen prediction.
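A ROC curve plots the true-positive rate (sensitivity) against the false-positive rate (1 - specificity) as a decision threshold is swept, and the area under the curve (AUC) summarizes it in one number. The minimal Python sketch below assumes each method assigns a numeric score to every protein (for example, an SVM decision value) with higher meaning "more allergen-like"; `roc_points` and `roc_auc` are hypothetical helpers, and the scores are invented. AUC is computed as the Mann-Whitney probability that a randomly chosen allergen outscores a randomly chosen non-allergen, with ties counted as one half.

```python
from __future__ import annotations


def roc_points(pos_scores: list[float], neg_scores: list[float]) -> list[tuple[float, float]]:
    """(FPR, TPR) pairs obtained by lowering the threshold through every observed score."""
    points = [(0.0, 0.0)]
    for t in sorted(set(pos_scores) | set(neg_scores), reverse=True):
        tpr = sum(s >= t for s in pos_scores) / len(pos_scores)
        fpr = sum(s >= t for s in neg_scores) / len(neg_scores)
        points.append((fpr, tpr))
    return points


def roc_auc(pos_scores: list[float], neg_scores: list[float]) -> float:
    """Probability that a random positive scores above a random negative (ties count 1/2)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


allergen_scores = [0.9, 0.4]      # hypothetical scores for known allergens
non_allergen_scores = [0.6, 0.1]  # hypothetical scores for non-allergens
print(roc_points(allergen_scores, non_allergen_scores))
# [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
print(roc_auc(allergen_scores, non_allergen_scores))  # 0.75
```

The pairwise AUC equals the trapezoidal area under the points returned by `roc_points`; the quadratic loop is adequate for a sketch, while large datasets would normally use a rank-based formula instead.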
Figure 6
Snapshots of the web server pages: (A) the sequence submission page; (B) the prediction result page.
