PLoS One. 2013;8(3):e58173.
doi: 10.1371/journal.pone.0058173. Epub 2013 Mar 5.

T3_MM: a Markov model effectively classifies bacterial type III secretion signals

Yejun Wang et al. PLoS One. 2013.

Abstract

Motivation: Type III Secretion Systems (T3SSs) play important roles in the interaction between gram-negative bacteria and their hosts. T3SSs function by translocating a group of bacterial effector proteins into the host cytoplasm. The details of the type III secretion process are yet to be clarified. This research compared the amino acid composition within the N-terminal 100 amino acids of type III secretion (T3S) signal sequences and non-T3S proteins, specifically testing whether each residue constrains the residues found at adjacent positions. We used these comparisons to build a statistical model that quantitatively describes and effectively distinguishes T3S effectors.

Results: In this study, amino acid composition (Aac) probability profiles, conditional on the sequentially preceding position and its amino acid, were compared between N-terminal sequences of T3S and non-T3S proteins. The profiles are generally different. A Markov model, T3_MM, was consequently designed to calculate the total Aac conditional probability difference, i.e., the likelihood ratio of a sequence being a T3S or a non-T3S protein. With T3_MM, the values calculated for known T3S and non-T3S proteins closely approximated two distinct normal distributions. The model distinguished validated T3S and non-T3S proteins with a 5-fold cross-validation sensitivity of 83.9% at a specificity of 90.3%. Compared with other T3S protein prediction models, T3_MM was also shown to be more robust, more accurate, simpler, and statistically quantitative. The high effectiveness of T3_MM also indicates an overall Aac difference between the N-termini of T3S and non-T3S proteins, and a constraint on Aac exerted by the preceding position and its amino acid.

Availability: An R package for T3_MM is freely downloadable from: http://biocomputer.bio.cuhk.edu.hk/softwares/T3_MM. T3_MM web server: http://biocomputer.bio.cuhk.edu.hk/T3DB/T3_MM.php.
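
The likelihood-ratio idea described in the Results can be illustrated with a small sketch. This is not the T3_MM package or the authors' exact formulation: it assumes a single first-order transition table shared across all positions, add-one pseudocounts, and tiny made-up training sequences. The function names `train_transitions` and `log_likelihood_ratio` are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def train_transitions(sequences, pseudocount=1.0):
    """Estimate P(next residue | previous residue) from training sequences.

    Assumption: one transition table for all positions, smoothed with a
    pseudocount so that unseen residue pairs never get probability zero.
    """
    counts = {a: {b: pseudocount for b in AMINO_ACIDS} for a in AMINO_ACIDS}
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            if prev in counts and nxt in counts[prev]:
                counts[prev][nxt] += 1
    probs = {}
    for a, row in counts.items():
        total = sum(row.values())
        probs[a] = {b: c / total for b, c in row.items()}
    return probs

def log_likelihood_ratio(seq, pos_probs, neg_probs):
    """Sum of log[P_pos(next | prev) / P_neg(next | prev)] over adjacent pairs.

    A value above 0 favours the positive (T3S-like) model, a value below 0
    favours the negative (non-T3S-like) model.
    """
    score = 0.0
    for prev, nxt in zip(seq, seq[1:]):
        if prev in pos_probs and nxt in pos_probs[prev]:
            score += math.log(pos_probs[prev][nxt] / neg_probs[prev][nxt])
    return score

# Toy training data (invented for illustration, not real effectors).
t3s_like = ["MSSISSISSL", "MSISSLSSIS"]
non_t3s_like = ["MKKLLAKKLA", "MAKKLLKKAL"]

pos_model = train_transitions(t3s_like)
neg_model = train_transitions(non_t3s_like)

r_pos = log_likelihood_ratio("MSSISSLSIS", pos_model, neg_model)
r_neg = log_likelihood_ratio("MKKLAKKLLA", pos_model, neg_model)
print(f"serine-rich query: {r_pos:.2f}; lysine-rich query: {r_neg:.2f}")
```

In this toy setting the serine-rich query scores above zero and the lysine-rich query below zero, which mirrors the use of a zero cutoff on the ratio; a real model would be trained on the N-terminal 100 residues of validated proteins.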

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1. Distribution of bi-amino acids (bi-aa) with significantly different Aac conditional probabilities in T3S signal sequences.
The vertical and horizontal axes represent the 1st and 2nd amino acid, respectively. A binomial distribution-based statistical test was performed on each amino acid at the second position given the first amino acid. Second amino acids with significantly biased composition compared with the theoretical random distribution are highlighted with a red (enriched) or grey (depleted) background. Second amino acids with significantly biased composition compared with non-T3S sequences are indicated with an upward (enriched in T3S sequences) or downward (depleted in T3S sequences) arrow. The Benjamini & Hochberg correction for multiple tests was adopted to control type I errors. The False Discovery Rate (FDR) was set at ≤0.05.
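
As an aside on the multiple-testing step named in this caption, the standard Benjamini & Hochberg adjustment can be sketched as follows. The p-values are invented for illustration and the function name `benjamini_hochberg` is hypothetical; the authors' own implementation is not shown in this record.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (q-values), in input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Invented p-values standing in for five bi-amino-acid tests.
p_values = [0.001, 0.008, 0.039, 0.041, 0.60]
q_values = benjamini_hochberg(p_values)
significant = [q <= 0.05 for q in q_values]
print(q_values)
print(significant)
```

Here the two smallest p-values remain significant at an FDR of 0.05, while 0.039 and 0.041, which would pass an uncorrected 0.05 threshold, do not.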
Figure 2. Probabilistic modelling of the overall difference between the conditional probability profiles of T3S and non-T3S sequences.
The distributions (black curves) and normal approximations (grey curves) of T3S and non-T3S R values (A) and weighted R values (B) are shown. The means of the approximated normal distributions are also indicated. For each normal approximation, the normal Q–Q plot and Shapiro-Wilk normality test results are shown next to the corresponding distribution curve.
Figure 3. Receiver Operating Characteristic curves of different T3S protein classification models.
The point corresponding to the cutoff value (R = 0) is indicated with a black rectangle and an arrow.
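
To make the relationship between a cutoff and a point on a ROC curve concrete, the following sketch computes sensitivity and specificity at R = 0 and then sweeps the cutoff. The scores are invented, the helper names are hypothetical, and treating "score greater than cutoff" as a positive call is an assumption.

```python
def sensitivity_specificity(pos_scores, neg_scores, cutoff=0.0):
    """Sensitivity and specificity when scores above the cutoff are called positive."""
    true_pos = sum(1 for s in pos_scores if s > cutoff)
    true_neg = sum(1 for s in neg_scores if s <= cutoff)
    return true_pos / len(pos_scores), true_neg / len(neg_scores)

def roc_points(pos_scores, neg_scores):
    """List of (false positive rate, true positive rate) pairs over all cutoffs."""
    cutoffs = sorted(set(pos_scores) | set(neg_scores), reverse=True)
    cutoffs.append(float("-inf"))  # final cutoff calls everything positive
    points = []
    for c in cutoffs:
        sn, sp = sensitivity_specificity(pos_scores, neg_scores, c)
        points.append((1.0 - sp, sn))
    return points

# Invented R values for known positives and known negatives.
pos_scores = [2.1, 0.7, -0.3, 1.5]
neg_scores = [-1.2, -0.4, 0.2, -2.0, -0.9]

sn, sp = sensitivity_specificity(pos_scores, neg_scores, cutoff=0.0)
roc = roc_points(pos_scores, neg_scores)
print(f"Sn = {sn:.2f}, Sp = {sp:.2f} at R = 0")
```

Each cutoff yields one (1 - Sp, Sn) point; the marked point in a ROC plot is simply the pair obtained at the chosen cutoff.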
Figure 4. Inter-species cross-validation of the T3S effector predictions.
The sensitivity (Sn) and specificity (Sp) of classification are shown in blue and purple, respectively. The T3S effector recall for each representative genus or subgroup is also indicated. Genus names are listed below each series of dots.
Figure 5. Summary of the total genome-encoded proteins, T3_MM predicted T3S effectors and BPBAac predicted T3S effectors in Salmonella.
The total protein number for each Salmonella strain is plotted and connected with a red line, while the numbers of T3S effectors predicted by T3_MM and BPBAac are shown in blue and purple, respectively. The three lines follow generally similar patterns, with moderate differences.
