Bioinformatics. 2019 Jun 1;35(12):2017-2028.

doi: 10.1093/bioinformatics/bty914.

Bastion3: a two-layer ensemble predictor of type III secreted effectors

Jiawei Wang¹, Jiahui Li¹,², Bingjiao Yang³, Ruopeng Xie³, Tatiana T Marquez-Lago⁴,⁵, André Leier⁴,⁵, Morihiro Hayashida⁶, Tatsuya Akutsu⁷, Yanju Zhang³, Kuo-Chen Chou⁸,⁹,¹⁰, Joel Selkrig¹¹, Tieli Zhou², Jiangning Song¹²,¹³,¹⁴, Trevor Lithgow¹

Affiliations

¹ Infection and Immunity Program, Biomedicine Discovery Institute and Department of Microbiology, Monash University, Melbourne, VIC, Australia.
² Department of Clinical Laboratory, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, China.
³ School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin, China.
⁴ Department of Genetics, School of Medicine, University of Alabama at Birmingham, AL, USA.
⁵ Department of Cell, Developmental and Integrative Biology, School of Medicine, University of Alabama at Birmingham, AL, USA.
⁶ National Institute of Technology, Matsue College, Matsue, Shimane, Japan.
⁷ Bioinformatics Center, Institute for Chemical Research, Kyoto University, Kyoto, Japan.
⁸ Gordon Life Science Institute, Boston, MA, USA.
⁹ Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China.
¹⁰ Center of Excellence in Genomic Medicine Research (CEGMR), King Abdulaziz University, Jeddah, Saudi Arabia.
¹¹ European Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany.
¹² Infection and Immunity Program, Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC, Australia.
¹³ Monash Centre for Data Science, Monash University, Melbourne, VIC, Australia.
¹⁴ ARC Centre of Excellence in Advanced Molecular Imaging, Monash University, Melbourne, VIC, Australia.

PMID: 30388198
PMCID: PMC7963071
DOI: 10.1093/bioinformatics/bty914

Jiawei Wang et al. Bioinformatics. 2019.

Abstract

Motivation: Type III secreted effectors (T3SEs) can be injected into the host cell cytoplasm via type III secretion systems (T3SSs) to modulate interactions between Gram-negative bacterial pathogens and their hosts. Because of their relevance to pathogen-host interactions, significant computational effort has gone into identifying T3SEs, and this in turn has stimulated new T3SE discoveries. However, as T3SEs with new characteristics are discovered, the existing computational tools reveal important limitations: (i) most of the trained machine learning models are based on the N-terminus (in some cases also incorporating the C-terminus) rather than the proteins' complete sequences, and (ii) the underlying models (trained with classic algorithms) employ only a few features, most of which are extracted from sequence information alone. To achieve better T3SE prediction, we must identify more powerful, informative features and investigate how to integrate them effectively into a comprehensive model.

Results: In this work, we present Bastion3, a two-layer ensemble predictor developed to accurately identify type III secreted effectors from protein sequence data. In contrast with existing methods that employ single models with few features, Bastion3 explores a wide range of features of various types, trains single models based on these features and finally integrates these models through ensemble learning. We trained the models using a new gradient boosting machine, LightGBM, and further boosted the models' performance through a novel genetic algorithm (GA) based two-step parameter optimization strategy. Our benchmark test demonstrates that Bastion3 performs much better than commonly used methods, with an ACC value of 0.959, F-value of 0.958, MCC value of 0.917 and AUC value of 0.956, outperforming all other toolkits by more than 5.6% in ACC value, 5.7% in F-value, 12.4% in MCC value and 5.8% in AUC value. Based on our proposed two-layer ensemble model, we further developed a user-friendly online toolkit to make T3SE prediction convenient for experimental scientists. Designed to ease future discoveries of novel T3SEs and offering improved performance, Bastion3 is poised to become a widely used, state-of-the-art toolkit for T3SE prediction.
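As a reading aid for the metrics quoted above, the following minimal sketch shows how ACC, F-value and MCC are conventionally computed from a binary confusion matrix. It assumes that "F-value" denotes the usual F1 score; the function name `binary_metrics` and the example counts are hypothetical and are not taken from the paper.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Return ACC, F-value (assumed to be F1) and MCC from confusion-matrix counts."""
    total = tp + fp + tn + fn
    acc = (tp + tn) / total

    # Precision and recall, guarding against empty denominators.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    pr_sum = precision + recall
    f_value = 2 * precision * recall / pr_sum if pr_sum else 0.0

    # Matthews correlation coefficient; defined as 0 when any margin is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {"ACC": acc, "F": f_value, "MCC": mcc}


# Hypothetical counts for illustration only (not results from Bastion3).
m = binary_metrics(tp=90, fp=10, tn=85, fn=15)
```

MCC uses all four cells of the confusion matrix, so it can separate two predictors more widely than ACC does on the same test set.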

Availability and implementation: http://bastion3.erc.monash.edu/.

Contact: selkrig@embl.de or wyztli@163.com or trevor.lithgow@monash.edu.

Supplementary information: Supplementary data are available at Bioinformatics online.

Figures

**Fig. 1.**
Overall framework of Bastion3. (A) The flowchart of Bastion3 development; (B) Detailed procedures for constructing the prediction models within Bastion3’s two-layer architecture and (C) Tackling the data imbalance problem by assigning a weight to each sample
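Panel (C) mentions per-sample weights for handling class imbalance, but the caption does not say how the weights are chosen. The sketch below illustrates one common scheme, inverse class-frequency weighting, purely as an assumption; the helper `balanced_sample_weights` is hypothetical and may differ from what Bastion3 actually does.

```python
from collections import Counter


def balanced_sample_weights(labels):
    """Weight each sample by n / (k * class_count), so every class sums to n / k.

    This is an assumed, generic scheme, not the weighting documented for Bastion3.
    """
    counts = Counter(labels)
    n = len(labels)
    k = len(counts)
    return [n / (k * counts[y]) for y in labels]


# Toy data: 2 positives (T3SEs) and 8 negatives (non-T3SEs).
labels = [1] * 2 + [0] * 8
weights = balanced_sample_weights(labels)
```

With weights of this form, the minority class contributes as much total weight to the training loss as the majority class. Many gradient boosting libraries accept per-sample weights at training time.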

**Fig. 2.**
The effect and performance comparison of two-step parameter optimization of different feature encoding methods, compared with one-step parameter optimization and initial parameter settings. The red star indicates the best performance amongst the three different parameter settings for each feature encoding method

**Fig. 3.**
Performance comparison of different types of feature encoding methods based on 100-time 5-fold cross-validation test. (A) Embedding of different types of features using t-SNE (van der Maaten and Hinton, 2008). The red and grey dots represent T3SEs and non-T3SEs, respectively. A black-edge dot indicates that this sample was incorrectly predicted during 100-time 5-fold cross-validation. (B) ROC curves and metrics for evaluating the performance of different types of feature encoding methods. The legends of the two panels were merged together with the same feature encoding method denoted by the same color in both panels. The red star on top of the bar chart marks the best performance across different feature encoding methods for each metric
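The "100-time 5-fold cross-validation" in this caption can be read as 5-fold cross-validation repeated 100 times with different random partitions. The sketch below generates such splits under that reading; stratification by class label, the fixed seed and the generator name `repeated_stratified_kfold` are assumptions made for illustration.

```python
import random


def repeated_stratified_kfold(labels, k=5, repeats=100, seed=0):
    """Yield (train_indices, test_indices) for `repeats` rounds of stratified k-fold."""
    rng = random.Random(seed)

    # Group sample indices by class so each fold keeps the class ratio.
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)

    for _ in range(repeats):
        folds = [[] for _ in range(k)]
        for indices in by_class.values():
            shuffled = indices[:]
            rng.shuffle(shuffled)
            # Deal the shuffled indices of this class round-robin into the folds.
            for pos, idx in enumerate(shuffled):
                folds[pos % k].append(idx)
        for i in range(k):
            test = sorted(folds[i])
            train = sorted(j for f in folds[:i] + folds[i + 1:] for j in f)
            yield train, test


# Toy data: 10 positives and 20 negatives, 3 repeats instead of 100 for brevity.
labels = [1] * 10 + [0] * 20
splits = list(repeated_stratified_kfold(labels, k=5, repeats=3))
```

Averaging a metric over all `repeats * k` test folds reduces the dependence of the estimate on any single random partition.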

**Fig. 4.**
Performance comparison between Bastion3 (using the final two-layer ensemble model) and six other existing methods for T3SE prediction on the independent test

Grants and funding

R01 AI111965/AI/NIAID NIH HHS/United States
