BMC Bioinformatics. 2008 Mar 27;9:173.
doi: 10.1186/1471-2105-9-173.

LocateP: genome-scale subcellular-location predictor for bacterial proteins

Miaomiao Zhou et al. BMC Bioinformatics.

Abstract

Background: In the past decades, various protein subcellular-location (SCL) predictors have been developed. Most of these predictors, like TMHMM 2.0, SignalP 3.0, PrediSi and Phobius, aim at the identification of one or a few SCLs, whereas others such as CELLO and Psortb.v.2.0 aim at a broader classification. Although these tools and pipelines can achieve high precision in predicting signal peptides and transmembrane helices, they have a much lower accuracy when other sequence characteristics are concerned. For instance, it has proved notoriously difficult to identify the fate of proteins carrying a putative type I signal peptidase (SPIase) cleavage site, as many of those proteins are retained in the cell membrane as N-terminally anchored membrane proteins. Moreover, most of the SCL classifiers are based on the classification of the Swiss-Prot database and have consequently inherited the inconsistency of that SCL classification. As accurate and detailed SCL prediction on a genome scale is highly desired by experimental researchers, we decided to construct a new SCL prediction pipeline: LocateP.

Results: LocateP combines many of the existing high-precision SCL identifiers with our own newly developed identifiers for specific SCLs. The LocateP pipeline was designed such that it mimics protein targeting and secretion processes. It distinguishes 7 different SCLs within Gram-positive bacteria: intracellular, multi-transmembrane, N-terminally membrane anchored, C-terminally membrane anchored, lipid-anchored, LPxTG-type cell-wall anchored, and secreted/released proteins. Moreover, it distinguishes pathways for Sec- or Tat-dependent secretion and alternative secretion of bacteriocin-like proteins. The pipeline was tested on data sets extracted from literature, including experimental proteomics studies. The tests showed that LocateP performs as well as, or even slightly better than, other SCL predictors for some locations and outperforms current tools especially where the N-terminally anchored and the SPIase-cleaved secreted proteins are concerned. Overall, the accuracy of LocateP was always higher than 90%. LocateP was then used to predict the SCLs of all proteins encoded by completed Gram-positive bacterial genomes. The results are stored in the database LocateP-DB, http://www.cmbi.ru.nl/locatep-db [1].

Conclusion: LocateP is by far the most accurate and detailed protein SCL predictor for Gram-positive bacteria currently available.

Figures

Figure 1
(A): Classification of protein SCLs in Gram-positive bacteria. The secreted proteins can be divided into the following subgroups: (i) N-terminal hydrophobic tail anchored (N-anchored), (ii) C-terminal hydrophobic tail anchored (C-anchored), (iii) covalent lipid-anchored, (iv) covalently/non-covalently cell-wall anchored, (v) secreted/released (defined as proteins that are Sec-/Tat-secreted and cleaved by the signal peptidase I), and (vi) non-classically secreted/released proteins via minor pathways [120, 163]. Based on the Swiss-Prot classification system, the SCLs could be categorized into: Cytoplasmic, Membrane (multi-transmembrane, N-/C-anchored), Cell wall (LPxTG-anchored) and Extracellular (lipid-anchored, secreted, bacteriocin-like) proteins. (B): The structure of known signal peptides. The overall structure of Tat- and Sec-dependent signal peptides is commonly conserved as distinct consecutive N, H and C regions. The N region is the start of the protein and contains positively charged residues. The H region follows the N region and is a string of consecutive hydrophobic residues which can form an α-helix in the membrane. The C region contains the signal peptidase cleavage signals. Known cleavage/retention signals include the AxAA type I SPase cleavage site [163, 172], the L-x-x-C (so-called lipobox) type II SPase cleavage site [157] and the AxA Tat-substrate cleavage site [88, 90, 173]. The LPxTG-type motif is a C-terminal sorting signal which is involved in the covalent attachment of proteins to the peptidoglycan of the cell wall. The signal peptides of proteins targeted for minor secretion pathways do not follow the N-H-C structure [2, 125, 163].
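The sequence motifs named in panel (B) can be illustrated with a short scan. This is only a sketch: the three patterns are read literally from the caption (with "x" meaning any residue), the function name `find_motifs` is hypothetical, and literal pattern matching of this kind is far cruder than the HMM-based identifiers that LocateP combines.

```python
import re

# Literal readings of motifs named in the caption; "x" is any residue.
# Assumption: these plain patterns are illustrative only, not LocateP's models.
MOTIFS = {
    "lipobox": re.compile(r"(?=L..C)"),   # L-x-x-C, type II SPase site
    "LPxTG": re.compile(r"(?=LP.TG)"),    # C-terminal cell-wall sorting signal
    "AxA": re.compile(r"(?=A.A)"),        # Tat-substrate cleavage site
}


def find_motifs(seq):
    """Return {motif name: [0-based start positions]} for motifs found in seq."""
    hits = {}
    for name, pattern in MOTIFS.items():
        # The lookahead lets overlapping occurrences be reported as well.
        starts = [m.start() for m in pattern.finditer(seq)]
        if starts:
            hits[name] = starts
    return hits
```

Short patterns such as A.A match very frequently by chance, which is one reason position-specific models and the surrounding N-H-C context matter in practice.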
Figure 2
Flowchart of the LocateP pipeline. First, the likelihood of secretion via the Tat pathway was calculated by combining Tat-find v1.2 [91] and our Tat-specific HMMs (RR-HMM, CS-HMM). Bacteriocin-like proteins were identified using Bagel [149]. Second, Phobius [14], PrediSi [98], SignalP 3.0 [18] and TMHMM 2.0 [12] were combined to identify transmembrane regions. Those proteins without any predicted TM segments were considered intracellular, whereas those with TM segments were divided into multi-TM membrane proteins, N-anchored membrane proteins or secreted/released proteins (single N-terminal TM segment, possibly signal peptide), and C-anchored membrane proteins (signal peptide and single C-terminal TM segment). Third, a sortase-substrate HMM [165] was used to distinguish LPxTG-type peptidoglycan-anchored proteins from C-anchored membrane proteins. Subsequently, signal peptidase type II (SPII) substrates were predicted by combining existing lipoprotein motif models [41, 157] and new lipoprotein HMMs. The remaining proteins were classified into the categories secreted/released or N-anchored membrane proteins. See Methods and additional file 1 for more details. Abbreviations: A-S = Anchored-Secreted; TMS = TransMembrane Segment; SP = Signal Peptide; C/N-TM = C/N-terminally transmembrane anchored; LPxTG = LPxTG cell-wall anchored.
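The branching described in this caption can be approximated as a small decision function. The sketch below is a simplification under stated assumptions, not the LocateP implementation: the `Features` fields and the `classify` function are hypothetical names, each field stands in for the output of one or more upstream predictors, and the Tat/Sec pathway annotation is left out entirely.

```python
from dataclasses import dataclass


@dataclass
class Features:
    """Hypothetical precomputed predictions for one protein (illustrative names)."""
    bacteriocin_like: bool = False  # e.g. a hit from a bacteriocin finder
    n_segments: int = 0             # hydrophobic segments: TM helices plus any signal peptide
    n_terminal: bool = False        # a segment sits at the N-terminus
    c_terminal: bool = False        # a segment sits at the C-terminus
    sortase_motif: bool = False     # sortase-substrate (LPxTG-type) model hit
    lipobox: bool = False           # SPII (lipoprotein) model hit
    spi_cleaved: bool = False       # SPI cleavage predicted to take place


def classify(f: Features) -> str:
    """Assign one location label, loosely following the order in the caption."""
    if f.bacteriocin_like:
        return "bacteriocin-like"
    if f.n_segments == 0:
        return "intracellular"
    if f.n_segments == 1 and f.n_terminal:
        # Anchored-Secreted (A-S) candidates: lipoproteins are split off first,
        # the remainder is either cleaved and released or stays N-anchored.
        if f.lipobox:
            return "lipid-anchored"
        return "secreted/released" if f.spi_cleaved else "N-anchored"
    if f.n_segments == 2 and f.n_terminal and f.c_terminal:
        # Signal peptide plus a single C-terminal segment.
        return "LPxTG cell-wall anchored" if f.sortase_motif else "C-anchored"
    if f.n_segments >= 2:
        return "multi-transmembrane"
    # A single segment that is not N-terminal is not covered by the caption.
    return "unclassified in this sketch"
```

For example, `classify(Features(n_segments=1, n_terminal=True))` yields `"N-anchored"`, reflecting the caption's point that a protein with a single N-terminal segment is only called secreted/released when cleavage is predicted.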
Figure 3
Distinguishing between secreted and N-anchored proteins. Tjalsma et al. [41] have identified 33 N-anchored and 36 secreted proteins from Bacillus subtilis (by 2D gel electrophoresis) which have a putative SPI-cleavage site motif in the C-region that follows the transmembrane helix H-region (see Figure 1B). (A): A sequence composition chart, made using WebLogo [47], based on multiple-sequence alignment of the H- and C-regions of the N-anchored and secreted protein sets. The red arrow indicates the cleavage position of true SPI-site motifs, and the green dashed arrow represents the corresponding position in N-anchored proteins that is not cleaved. (B): The specificity of HMMs of different lengths containing the putative cleavage site. A* = the alanine after which cleavage takes place. Mod1: residues -9 to A*; Mod2: residues -11 to A*; Mod3: residues -14 to A*; Mod4: residues -8 to +3 of A*; Mod5: residues -13 to +10 of A*; Mod6: residues -8 to +17 of A*; Mod7: residues -3 to +10 of A*; Mod8: residues -3 to +17 of A*; Mod9: residues +1 to +25.
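The nine model windows in panel (B) differ only in how far they extend on either side of A*. A minimal sketch of cutting such windows out of a sequence follows. It assumes that A* itself is offset 0 and that offsets count residues with no gap at zero, which the caption does not state explicitly; `MODEL_WINDOWS` and `extract_window` are hypothetical names.

```python
# (start, end) offsets relative to A*, inclusive, as listed in the caption.
# Assumption: A* itself is offset 0.
MODEL_WINDOWS = {
    "Mod1": (-9, 0),
    "Mod2": (-11, 0),
    "Mod3": (-14, 0),
    "Mod4": (-8, 3),
    "Mod5": (-13, 10),
    "Mod6": (-8, 17),
    "Mod7": (-3, 10),
    "Mod8": (-3, 17),
    "Mod9": (1, 25),
}


def extract_window(seq, a_star, model):
    """Return the subsequence for `model` around the 0-based index `a_star`.

    Returns None when the window would run past either end of the sequence.
    """
    lo, hi = MODEL_WINDOWS[model]
    start, end = a_star + lo, a_star + hi
    if start < 0 or end >= len(seq):
        return None
    return seq[start:end + 1]
```

Windows extracted this way from a set of secreted proteins could then be aligned and used to train one model per window, so that the specificity of models of different lengths can be compared.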

References

    1. LocateP-DB http://www.cmbi.ru.nl/locatep-db
    1. Tjalsma H, Bolhuis A, Jongbloed JD, Bron S, van Dijl JM. Signal peptide-dependent protein transport in Bacillus subtilis: a genome-based survey of the secretome. Microbiol Mol Biol Rev. 2000;64:515–547.
    1. Huang F, Parmryd I, Nilsson F, Persson AL, Pakrasi HB, Andersson B, Norling B. Proteomics of Synechocystis sp. strain PCC 6803: identification of plasma membrane proteins. Mol Cell Proteomics. 2002;1:956–966.
    1. Molloy MP, Phadke ND, Maddock JR, Andrews PC. Two-dimensional electrophoresis and peptide mass fingerprinting of bacterial outer membrane proteins. Electrophoresis. 2001;22:1686–1696.
    1. Molloy MP, Herbert BR, Slade MB, Rabilloud T, Nouwens AS, Williams KL, Gooley AA. Proteomic analysis of the Escherichia coli outer membrane. Eur J Biochem. 2000;267:2871–2881.
