J Comput Aided Mol Des. 2017 Nov;31(11):1029-1038.
doi: 10.1007/s10822-017-0080-z. Epub 2017 Nov 10.

Effective prediction of bacterial type IV secreted effectors by combined features of both C-termini and N-termini

Yu Wang et al. J Comput Aided Mol Des. 2017 Nov.

Abstract

Various bacterial pathogens deliver secreted substrates, also called effectors, into host cells through type IV secretion systems (T4SSs) and thereby cause disease. Because T4SS-secreted effectors (T4SEs) play important roles in pathogen-host interactions, identifying them is crucial to understanding the pathogenic mechanisms of T4SSs. A few machine learning methods for T4SE prediction have been developed using features of C-terminal residues. However, recent studies have shown that targeting information can also be encoded in the N-terminal region of at least some T4SEs. In this study, we present an effective method for T4SE prediction that integrates both N-terminal and C-terminal sequence information. First, we collected from the literature a comprehensive dataset of known T4SEs and non-T4SEs across multiple bacterial species. Then, three types of features, namely amino acid composition; composition, transition and distribution; and position-specific scoring matrices, were calculated for the 50 N-terminal and 100 C-terminal residues. Next, we used information gain to rank the importance of the 150 residue positions for T4SE secretion signaling. Finally, 125 residue positions were selected for the prediction model that classifies T4SEs and non-T4SEs. The support vector machine model yields an area under the receiver operating characteristic curve of 0.916 in fivefold cross-validation and an accuracy of 85.29% on the independent test set.
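As a rough illustration of two steps named in the abstract, the sketch below extracts the 50 N-terminal and 100 C-terminal residues of a sequence, computes amino acid composition for a segment, and scores a single residue position by information gain against the class labels. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the treatment of each position as a categorical feature, and the handling of short sequences are all hypothetical choices, and the abstract does not specify how the features were encoded.

```python
import math
from collections import Counter

# The 20 standard amino acids (assumed alphabet; not specified in the abstract).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def terminal_windows(seq, n_len=50, c_len=100):
    """Return the first n_len and last c_len residues of a protein sequence.

    Sequences shorter than a window simply yield the whole sequence.
    """
    return seq[:n_len], seq[-c_len:]


def aac(segment):
    """Amino acid composition: fraction of each standard residue in segment."""
    counts = Counter(segment)
    total = len(segment) or 1  # avoid division by zero on an empty segment
    return [counts[a] / total for a in AMINO_ACIDS]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(residues, labels):
    """Information gain of one position, treated as a categorical feature.

    residues[i] is the residue that protein i has at this position;
    labels[i] is its class (e.g. 1 for T4SE, 0 for non-T4SE).
    """
    n = len(labels)
    groups = {}
    for residue, label in zip(residues, labels):
        groups.setdefault(residue, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional
```

To rank positions with this sketch, one would call `information_gain` once per position across all proteins in the training set and sort the positions by the resulting scores, keeping the highest-scoring ones as input to the classifier.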

Keywords: Effector; Machine learning; Sequence analysis; Type IV secretion system.
