Using an optimal set of features with a machine learning-based approach to predict effector proteins for Legionella pneumophila
- PMID: 30682021
- PMCID: PMC6347213
- DOI: 10.1371/journal.pone.0202312
Abstract
Type IV secretion systems exist in a number of bacterial pathogens and are used to secrete effector proteins directly into host cells, modifying the host environment to make it hospitable for the bacteria. In recent years, several machine learning algorithms have been developed to predict effector proteins, potentially facilitating experimental verification. However, their results are inconsistent with one another. Previously, we analysed the disparate sets of predictive features used in these algorithms to determine an optimal set of 370 features for effector prediction. This study focuses on the best way to use these optimal features by designing three machine learning classifiers, comparing our results with those of others, and obtaining de novo results. We chose the pathogen Legionella pneumophila strain Philadelphia-1, a cause of Legionnaires' disease, because it has many validated effector proteins and because others have developed machine learning prediction tools for it. All of our models give good results, indicating that our optimal features are quite robust; Model 1, which uses all 370 features with a support vector machine, has slightly better accuracy. Moreover, Model 1 predicted 472 proteins that are deemed highly probable to be effectors, including 94% of known effectors. Although the results of our three models agree well with those of other researchers, their models predicted only 126 and 311 candidate effectors.
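The comparisons described above (what share of known effectors a model recovers, and how closely two sets of predictions agree) can be illustrated with simple set arithmetic. The following is a minimal sketch, not the authors' evaluation code: the function names `recall` and `jaccard` and the protein identifiers are hypothetical placeholders chosen for illustration, and the toy sets do not correspond to any real Legionella data.

```python
def recall(predicted, known):
    """Fraction of known effectors that appear in the predicted set."""
    if not known:
        raise ValueError("known set must not be empty")
    return len(predicted & known) / len(known)


def jaccard(a, b):
    """Overlap between two prediction sets (1.0 means identical sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Hypothetical toy data: placeholder IDs, not real proteins or real results.
known_effectors = {"p1", "p2", "p3", "p4"}
model_a = {"p1", "p2", "p3", "p5", "p6"}  # a model that predicts many candidates
model_b = {"p1", "p2", "p7"}              # a more conservative model

print(recall(model_a, known_effectors))  # 0.75
print(recall(model_b, known_effectors))  # 0.5
print(jaccard(model_a, model_b))         # 2 shared IDs out of 6 distinct IDs
```

In this toy case the larger prediction set recovers more of the known effectors, while the two sets still share their highest-confidence members, which is the kind of pattern the abstract reports when comparing Model 1 with the smaller candidate lists from other tools.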
Conflict of interest statement
The authors have declared that no competing interests exist.
Similar articles
- An optimal set of features for predicting type IV secretion system effector proteins for a subset of species based on a multi-level feature selection approach. PLoS One. 2018 May 9;13(5):e0197041. doi: 10.1371/journal.pone.0197041. eCollection 2018. PMID: 29742157.
- Dot/Icm Effector Translocation by Legionella longbeachae Creates a Replicative Vacuole Similar to That of Legionella pneumophila despite Translocation of Distinct Effector Repertoires. Infect Immun. 2015 Oct;83(10):4081-92. doi: 10.1128/IAI.00461-15. Epub 2015 Jul 27. PMID: 26216429.
- Genome-scale identification of Legionella pneumophila effectors using a machine learning approach. PLoS Pathog. 2009 Jul;5(7):e1000508. doi: 10.1371/journal.ppat.1000508. Epub 2009 Jul 10. PMID: 19593377.
- Legionella pneumophila type IV effectors hijack the transcription and translation machinery of the host cell. Trends Cell Biol. 2014 Dec;24(12):771-8. doi: 10.1016/j.tcb.2014.06.002. Epub 2014 Jul 8. PMID: 25012125. Review.
- Mechanisms of Effector-Mediated Immunity Revealed by the Accidental Human Pathogen Legionella pneumophila. Front Cell Infect Microbiol. 2021 Feb 3;10:593823. doi: 10.3389/fcimb.2020.593823. eCollection 2020. PMID: 33614523. Review.
Cited by
- T4SE-XGB: Interpretable Sequence-Based Prediction of Type IV Secreted Effectors Using eXtreme Gradient Boosting Algorithm. Front Microbiol. 2020 Sep 24;11:580382. doi: 10.3389/fmicb.2020.580382. eCollection 2020. PMID: 33072049.
- DeepT3_4: A Hybrid Deep Neural Network Model for the Distinction Between Bacterial Type III and IV Secreted Effectors. Front Microbiol. 2021 Jan 21;12:605782. doi: 10.3389/fmicb.2021.605782. eCollection 2021. PMID: 33552038.
- T4SEpp: A pipeline integrating protein language models to predict bacterial type IV secreted effectors. Comput Struct Biotechnol J. 2024 Jan 23;23:801-812. doi: 10.1016/j.csbj.2024.01.015. eCollection 2024 Dec. PMID: 38328004.
- Assessment of vector-host-pathogen relationships using data mining and machine learning. Comput Struct Biotechnol J. 2020 Jun 25;18:1704-1721. doi: 10.1016/j.csbj.2020.06.031. eCollection 2020. PMID: 32670510. Review.
- Computational prediction of secreted proteins in gram-negative bacteria. Comput Struct Biotechnol J. 2021 Mar 22;19:1806-1828. doi: 10.1016/j.csbj.2021.03.019. eCollection 2021. PMID: 33897982. Review.