Convolutional neural network-based annotation of bacterial type IV secretion system effectors with enhanced accuracy and reduced false discovery
- PMID: 31860715
- DOI: 10.1093/bib/bbz120
Abstract
The type IV bacterial secretion system (SS) is reported to be one of the most ubiquitous SSs in nature and can induce serious conditions by secreting type IV SS effectors (T4SEs) into host cells. Recent studies have mainly focused on annotating new T4SEs from the huge amount of available sequencing data, and various computational tools have therefore been developed to accelerate T4SE annotation. However, these tools are reported to depend heavily on the selected methods, and their annotation performance needs to be further enhanced. Herein, a convolutional neural network (CNN) technique was used to annotate T4SEs by integrating multiple protein encoding strategies. First, the annotation accuracies of nine encoding strategies integrated with CNN were assessed and compared with those of the popular T4SE annotation tools on an independent benchmark. Second, the false discovery rates of the various models were systematically evaluated by (1) scanning the genome of Legionella pneumophila subsp. ATCC 33152 and (2) predicting real-world non-T4SEs validated in published experiments. Based on these analyses, three encoding strategies, (a) position-specific scoring matrix (PSSM), (b) protein secondary structure & solvent accessibility (PSSSA) and (c) one-hot encoding scheme (Onehot), were identified as well-performing when integrated with CNN. Finally, a novel strategy that collectively considers the three well-performing models (CNN-PSSM, CNN-PSSSA and CNN-Onehot) was proposed, and a new tool (CNN-T4SE, https://idrblab.org/cnnt4se/) was constructed to facilitate T4SE annotation. All in all, this study conducted a comprehensive analysis of the performance of a collection of encoding strategies when integrated with CNN, which could facilitate the suppression of T4SS in infection and limit the spread of antimicrobial resistance.
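To make two of the ideas in the abstract concrete, the following minimal Python sketch shows how a one-hot encoding of a protein sequence might look and how the outputs of three models could be considered together. It is an illustration under stated assumptions, not the CNN-T4SE implementation: the abstract does not specify the sequence length, the handling of non-standard residues, or the rule used to combine the three models. The function names `one_hot_encode` and `majority_vote`, the `max_len` and `threshold` parameters, and the majority-vote rule itself are all hypothetical.

```python
# Illustrative sketch only; names, padding scheme, and voting rule are assumptions.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_encode(sequence, max_len=10):
    """Encode a protein sequence as a max_len x 20 matrix of 0/1 values.

    Sequences longer than max_len are truncated; shorter ones are padded
    with all-zero rows. Non-standard residues also map to all-zero rows.
    (These choices are assumptions, not taken from the paper.)
    """
    matrix = [[0] * len(AMINO_ACIDS) for _ in range(max_len)]
    for pos, residue in enumerate(sequence.upper()[:max_len]):
        idx = AA_INDEX.get(residue)
        if idx is not None:
            matrix[pos][idx] = 1
    return matrix


def majority_vote(scores, threshold=0.5):
    """Return True if more than half of the models score at or above threshold.

    `scores` maps a model name to a predicted probability. Majority voting
    is one plausible way to combine models; the paper's rule may differ.
    """
    votes = sum(1 for s in scores.values() if s >= threshold)
    return votes * 2 > len(scores)


if __name__ == "__main__":
    encoded = one_hot_encode("MKT", max_len=5)
    print(len(encoded), len(encoded[0]))  # matrix dimensions
    # Hypothetical per-model probabilities for one protein
    example_scores = {"CNN-PSSM": 0.9, "CNN-PSSSA": 0.6, "CNN-Onehot": 0.2}
    print(majority_vote(example_scores))
```

In this sketch a protein is called a T4SE only when at least two of the three hypothetical model scores reach the threshold. Requiring agreement between models is one simple way to trade some sensitivity for fewer false positives, which matches the abstract's stated interest in reducing false discovery.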
Keywords: T4SE; bacterial secretion system; convolution neural network; effector protein; function annotation.
© The Author(s) 2019. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.