PLoS One. 2019 Jan 25;14(1):e0202312.
doi: 10.1371/journal.pone.0202312. eCollection 2019.

Using an optimal set of features with a machine learning-based approach to predict effector proteins for Legionella pneumophila


Zhila Esna Ashari et al. PLoS One. 2019.

Abstract

Type IV secretion systems exist in a number of bacterial pathogens, which use them to secrete effector proteins directly into host cells and thereby alter the host-cell environment to make it hospitable to the bacteria. In recent years, several machine learning algorithms have been developed to predict effector proteins, potentially facilitating their experimental verification. However, the results of these algorithms are inconsistent with one another. Previously, we analysed the disparate sets of predictive features used in these algorithms to determine an optimal set of 370 features for effector prediction. This study focuses on how best to use these optimal features: we design three machine learning classifiers, compare our results with those of others, and obtain de novo results. We chose the pathogen Legionella pneumophila strain Philadelphia-1, a cause of Legionnaires' disease, because it has many validated effector proteins and other researchers have developed machine learning prediction tools for it. While all of our models give good results, indicating that our optimal features are quite robust, Model 1, which uses all 370 features with a support vector machine, has slightly better accuracy. Moreover, Model 1 predicted 472 proteins that are deemed highly probable to be effectors, a set that includes 94% of known effectors. Although the results of our three models agree well with those of other researchers, the two competing models predicted only 126 and 311 candidate effectors.
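
The abstract does not state which SVM kernel, solver, or evaluation code the authors used, so the following is only a minimal sketch under stated assumptions. It trains a linear SVM with the Pegasos stochastic sub-gradient method (a swapped-in solver, not necessarily the authors' choice), scores held-out proteins in 10-fold cross-validation as reported in Fig 2, and pools the out-of-fold scores into a single ROC AUC, computed as the probability that a random effector outscores a random non-effector. The function names, the `lam` and `epochs` defaults, and the toy inputs standing in for the 370-feature vectors are all hypothetical.

```python
# Hypothetical sketch: Pegasos linear SVM plus 10-fold CV ROC AUC.
# Not the authors' implementation; names and defaults are illustrative.
import random
from typing import Iterator, List, Sequence, Tuple

Vector = Sequence[float]


def train_linear_svm(X: Sequence[Vector], y: Sequence[int],
                     lam: float = 0.01, epochs: int = 50,
                     seed: int = 0) -> List[float]:
    """Linear SVM trained with the Pegasos stochastic sub-gradient method.

    Labels must be +1 (effector) or -1 (non-effector). A constant 1.0 is
    appended to every vector so the last weight acts as a bias; it is
    regularized along with the other weights, a common simplification.
    """
    rng = random.Random(seed)
    data = [(list(x) + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = label * sum(wi * xi for wi, xi in zip(w, x))
            # L2-regularization step: shrink the current weights.
            w = [(1.0 - eta * lam) * wi for wi in w]
            # Hinge loss is active (margin < 1), so step toward the correct side.
            if margin < 1.0:
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w


def decision(w: Sequence[float], x: Vector) -> float:
    """Signed score: positive means 'predicted effector'."""
    return sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC as the probability that a random positive outscores a
    random negative (ties count one half). O(n^2), fine for a sketch."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l != 1]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def k_fold_indices(n: int, k: int = 10,
                   seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists for plain (unstratified) k-fold CV."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


def cross_validated_auc(X: Sequence[Vector], y: Sequence[int],
                        k: int = 10) -> float:
    """Pool out-of-fold SVM scores from k folds and compute one ROC AUC."""
    scores = [0.0] * len(y)
    for train, test in k_fold_indices(len(y), k):
        w = train_linear_svm([X[i] for i in train], [y[i] for i in train])
        for i in test:
            scores[i] = decision(w, X[i])
    return roc_auc(scores, y)
```

With `X` a list of per-protein feature vectors and `y` the matching +1/-1 labels, `cross_validated_auc(X, y)` returns a value between 0 and 1. Pooling out-of-fold scores sidesteps test folds that contain only one class (where a per-fold AUC is undefined), at the cost of comparing scores from ten slightly different models; averaging per-fold AUCs is the usual alternative.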


Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. Workflow.
Fig 2. ROC curves of the three designed classifiers from 10-fold cross-validation.
(a) Model 1, (b) Model 2, and (c) Model 3.
Fig 3. Venn diagram comparing predicted effector proteins for three methods.
The pink circle shows the results for Model 1, the yellow circle for the S4TE method, and the blue circle for the method by Burstein et al.
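
Fig 3 compares the effector sets predicted by Model 1, S4TE, and Burstein et al. As a minimal sketch, the seven region counts behind such a three-set Venn diagram follow directly from Python set operations. The function name `venn3_regions`, its region labels, and any protein identifiers passed to it are hypothetical; this is not the authors' analysis code.

```python
# Hypothetical sketch: region counts for a three-set Venn diagram.
from typing import Dict, Set


def venn3_regions(a: Set[str], b: Set[str], c: Set[str]) -> Dict[str, int]:
    """Count members of each of the seven regions of a 3-set Venn diagram.

    Every element of a | b | c falls into exactly one region, so the
    counts sum to len(a | b | c).
    """
    return {
        "A only": len(a - b - c),
        "B only": len(b - a - c),
        "C only": len(c - a - b),
        "A and B only": len((a & b) - c),
        "A and C only": len((a & c) - b),
        "B and C only": len((b & c) - a),
        "A, B and C": len(a & b & c),
    }
```

Passing the three sets of predicted protein identifiers as `a`, `b`, and `c` gives, for instance, the number of candidates predicted only by the first method under the key `"A only"`; because every protein lands in exactly one region, the counts sum to the size of the union.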


References

    1. Han N, Yu W, Qiang Y, Zhang W. T4SP Database 2.0: An Improved Database for Type IV Secretion Systems in Bacterial Genomes with New Online Analysis Tools. Computational and Mathematical Methods in Medicine. 2016;2016:9415459. doi:10.1155/2016/9415459
    2. Voth DE, Broederdorf LJ, Graham JG. Bacterial Type IV Secretion Systems: Versatile Virulence Machines. Future Microbiology. 2012;7(2):241–257. doi:10.2217/fmb.11.150
    3. Voth DE, Beare PA, Howe D, Sharma UM, Samoilis G, Cockrell DC, et al. The Coxiella burnetii Cryptic Plasmid Is Enriched in Genes Encoding Type IV Secretion System Substrate. Journal of Bacteriology. 2010;193(7):1493–1503. doi:10.1128/JB.01359-10
    4. Abby SS, Cury J, Guglielmini J, Néron B, Touchon M, Rocha EPC. Identification of protein secretion systems in bacterial genomes. Scientific Reports. 2016;6:23080. doi:10.1038/srep23080
    5. Burstein D, Zusman T, Degtyar E, Viner R, Segal G, Pupko T. Genome-Scale Identification of Legionella pneumophila Effectors Using a Machine Learning Approach. PLoS Pathogens. 2009;5(7):e1000508. doi:10.1371/journal.ppat.1000508
