Review
Brief Bioinform. 2019 Mar 25;20(2):638-658.
doi: 10.1093/bib/bby028.

iProt-Sub: a comprehensive package for accurately mapping and predicting protease-specific substrates and cleavage sites

Jiangning Song et al. Brief Bioinform.

Abstract

Regulation of proteolysis plays a critical role in a myriad of important cellular processes. The key to better understanding the mechanisms that control this process is to identify the specific substrates that each protease targets. To address this, we have developed iProt-Sub, a powerful bioinformatics tool for the accurate prediction of protease-specific substrates and their cleavage sites. Importantly, iProt-Sub represents a significantly advanced version of its successful predecessor, PROSPER. It provides optimized cleavage site prediction models with better prediction performance and coverage for more species-specific proteases (4 major protease families and 38 different proteases). iProt-Sub integrates heterogeneous sequence and structural features and uses a two-step feature selection procedure to remove redundant and irrelevant features and thereby improve cleavage site prediction accuracy. Features used by iProt-Sub are encoded by 11 different sequence encoding schemes, including local amino acid sequence profile, secondary structure, solvent accessibility and native disorder, which allow a more accurate representation of the specificity of the 38 proteases and better training of the prediction models. Benchmarking experiments using cross-validation and independent tests showed that iProt-Sub achieves better performance than several existing generic tools. We anticipate that iProt-Sub will be a powerful tool for proteome-wide prediction of protease-specific substrates and their cleavage sites, and will facilitate hypothesis-driven functional interrogation of protease-specific substrate cleavage and proteolytic events.

Keywords: cleavage site; five-step rule; machine learning; protease; sequence analysis; substrate.

Figures

Figure 1
The workflow of the iProt-Sub methodology. The development of iProt-Sub comprised four major stages: data set curation, feature extraction and encoding, model construction and performance evaluation. Refer to the main text for a detailed description of each stage. ‘All features’ includes all 11 types of extracted features (a detailed list is shown in Table 2).
Figure 2
ROC curves of iProt-Sub models trained using different encoding schemes and their combinations for cleavage site prediction of eight proteases on the 5-fold cross-validation test.
Figure 3
Sequence logo representations of experimentally verified cleavage sites (P8–P8′) of eight proteases: caspase-3, -6, -7 and -8, MMP-2 and -3, granzyme-B (human) and granzyme-B (mouse). Sequence logos were generated using pLogo and scaled to the height of the largest column within the sequence visualization. The red horizontal lines on the pLogo graph denote the threshold of P = 0.05.
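To make the P8–P8′ notation concrete, the following is a minimal sketch of how such a window could be cut out of a substrate sequence. It assumes the standard convention that the scissile bond lies between P1 and P1′. The function name `cleavage_window`, the padding character `X` and the example sequence are illustrative assumptions, not part of iProt-Sub.

```python
def cleavage_window(sequence, p1_position, flank=8, pad="X"):
    """Return the P{flank}..P{flank}' window around a cleavage site.

    p1_position is the 1-based index of the P1 residue; the scissile bond
    is assumed to lie between P1 and P1'. Positions that fall outside the
    sequence are filled with `pad` (an illustrative choice).
    """
    if not 1 <= p1_position <= len(sequence):
        raise ValueError("p1_position must lie within the sequence")
    start = p1_position - flank   # 0-based index of the outermost P residue
    end = p1_position + flank     # exclusive 0-based end after the outermost P' residue
    left_pad = pad * max(0, -start)
    right_pad = pad * max(0, end - len(sequence))
    core = sequence[max(0, start):min(len(sequence), end)]
    return left_pad + core + right_pad


# Made-up sequence: P1 is the second D of "DEVD" (position 7).
print(cleavage_window("AAADEVDGAAA", 7, flank=4))
```

With `flank=8` the same call returns the 16-residue P8–P8′ window of the kind that would be aligned to build a sequence logo.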
Figure 4
The feature selection curve from stepwise feature selection, showing how performance (in terms of AUC) changes as the number of OFCs is gradually increased.
Figure 5
Importance of different feature types to the improvement of cleavage site prediction performance for eight proteases. For each protease, the bar height corresponds to the AUC value of the feature selection model, and the bar is divided proportionally into the ‘contribution percentage’ of each feature type, shown in different colors, according to its uniformed AUC drop rate. The AUC drop rate for a feature type was obtained by comparing the feature selection model with the model obtained after removing that feature type from the input.
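One way to read the AUC drop rate described in this caption is as a leave-one-feature-type-out comparison. The sketch below is an illustration under assumptions, not the authors' procedure: it takes ‘uniformed’ to mean that the drops are rescaled to sum to 1, and it clips negative drops to zero. The function name and the AUC values in the example are hypothetical.

```python
def auc_drop_contributions(auc_full, auc_without):
    """Convert leave-one-out AUC values into proportional contributions.

    auc_full: AUC of the model that uses all selected feature types.
    auc_without: dict mapping a feature type to the AUC obtained after
        removing that feature type from the input.
    Assumption: drops are clipped at zero and rescaled to sum to 1.
    """
    drops = {name: max(0.0, auc_full - auc) for name, auc in auc_without.items()}
    total = sum(drops.values())
    if total == 0:
        # No feature type lowers the AUC when removed.
        return {name: 0.0 for name in drops}
    return {name: drop / total for name, drop in drops.items()}


# Hypothetical AUC values for illustration only.
shares = auc_drop_contributions(
    0.90,
    {"sequence profile": 0.80, "secondary structure": 0.85, "disorder": 0.92},
)
for feature_type, share in shares.items():
    print(f"{feature_type}: {share:.2f}")
```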
Figure 6
Performance comparison between iProt-Sub and other existing methods for cleavage site prediction for different proteases based on the independent test data sets, evaluated using ROC curves.
Figure 7
Example output of the iProt-Sub Web server.
Figure 8
Functional enrichment analysis of the predicted substrates for caspase-3, -6, MMP-2 and -3 at the proteome scale, according to the biological process (BP), cellular component (CC) and molecular function (MF) categories of Gene Ontology (GO) terms. The statistical enrichment analyses of GO terms for predicted substrates were performed with the hypergeometric distribution.
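An enrichment test based on the hypergeometric distribution can be sketched as follows. This is a generic one-sided (over-representation) test written with the Python standard library (`math.comb`, Python 3.8 or later). The caption does not specify the exact test variant or any multiple-testing correction, so those details, along with the function name and the example counts, are assumptions.

```python
from math import comb


def hypergeom_enrichment_pvalue(population, annotated, sample, overlap):
    """Return P(X >= overlap) for a hypergeometric random variable X.

    population: number of proteins in the background (e.g. the proteome)
    annotated:  number of background proteins carrying the term
    sample:     number of predicted substrates
    overlap:    number of predicted substrates carrying the term
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    # Sum the probability mass over all overlaps at least as large as observed.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Toy counts for illustration: 10 proteins, 4 annotated, 3 sampled, 2 overlapping.
print(hypergeom_enrichment_pvalue(10, 4, 3, 2))
```

In practice a p-value of this kind would be computed for every term and then corrected for multiple testing.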
Figure 9
The KEGG pathway enrichment analysis of the predicted substrates for caspase-3, -6, MMP-2 and -3 at the human proteome scale. The statistical enrichment analyses of KEGG pathways for predicted substrates were performed with the hypergeometric distribution.
Figure 10
Full-length sequence scanning of calpastatin by iProt-Sub for caspase-3 (above) and MMP-2 (below) cleavage sites. The horizontal axis denotes the amino acid residue position, while the vertical axis denotes the cleavage probability score generated by iProt-Sub. A higher threshold value of 0.95, denoted by the dashed line, is applied to identify high-confidence cleavage site predictions. The P4–P4′ sites surrounding each predicted P1 cleavage position are given.
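The scanning procedure in this caption can be illustrated with a short sketch: every residue is treated as a candidate P1, scored, and reported with its P4–P4′ context when the score reaches the 0.95 threshold. The scoring function here is a placeholder that only recognises a DEVD motif; it stands in for a trained model and is not the iProt-Sub predictor. The function names and the example sequence are likewise made up.

```python
def scan_cleavage_sites(sequence, score_fn, threshold=0.95, flank=4):
    """Score every candidate P1 position and keep high-confidence hits.

    score_fn(sequence, i) must return a cleavage probability for the
    residue at 0-based index i acting as P1. Returns a list of
    (1-based P1 position, score, P4-P4' window) tuples. Windows are
    truncated, not padded, near the sequence termini.
    """
    hits = []
    # The last residue cannot be P1 because it has no P1' partner.
    for i in range(len(sequence) - 1):
        score = score_fn(sequence, i)
        if score >= threshold:
            window = sequence[max(0, i - flank + 1): i + flank + 1]
            hits.append((i + 1, score, window))
    return hits


def toy_score(sequence, i):
    """Placeholder scorer (NOT the iProt-Sub model): high if P4..P1 is DEVD."""
    return 0.99 if sequence[max(0, i - 3): i + 1] == "DEVD" else 0.10


# Made-up sequence containing two DEVD motifs.
for position, score, window in scan_cleavage_sites("MKTADEVDGSLLDEVDAQ", toy_score):
    print(position, score, window)
```

Swapping `toy_score` for a real model's per-position probability would leave the scanning and thresholding logic unchanged.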
