Bioinformatics. 2008 Apr 15;24(8):1049-55.
doi: 10.1093/bioinformatics/btn084. Epub 2008 Mar 4.

Predicting proteolytic sites in extracellular proteins: only halfway there

Yossef Kliger et al. Bioinformatics. 2008.

Abstract

Motivation: Many secretory proteins are synthesized as inactive precursors that must undergo post-translational proteolysis in order to mature and become active. In the current study, we address the challenge of sequence-based discovery of proteolytic sites in secreted proteins using machine learning.

Results: The results revealed that only half of the extracellular proteolytic sites are currently annotated, leaving over 3600 unannotated ones. Furthermore, we found that only 6% of the unannotated sites are similar to known proteolytic sites, whereas the remaining 94% do not share significant similarity with any annotated proteolytic site. The computational challenges in these two cases are very different. While the precision in detecting the former group is close to perfect, only 22% of the latter group was detected at a precision of 80%. The applicability of the classifier is demonstrated through members of the FGF family, in which we verified the conservation of physiologically relevant proteolytic sites in homologous proteins.

Figures

Fig. 1.
The effect of creating two specialized classifiers. It is clear that the performance of classifiers for ‘seen before’ and ‘new’ sites should be evaluated separately. Furthermore, the figure shows that it is worth training specialized classifiers: (A) Identification of ‘seen before’ sites. The classifier trained to identify ‘seen before’ sites is somewhat better at identifying such sites than the classifier trained to identify ‘new’ sites. (B) Identification of ‘new’ sites. The classifier trained to identify ‘new’ sites performs better than the classifier trained to identify ‘seen before’ sites at identifying ‘new’ sites.
Fig. 2.
Comparison between RF and SVM classifiers specialized in ‘new’ sites, and the effect of the furin correction factor. VALIDATED and POTENTIAL data are treated as positive for testing, the rest as negative. The furin correction is a way to compensate for the fact that some of the data we treated as negative for cleavage is actually mislabeled (unknown proteolytic sites). (A) Raw score output of the RF and SVM classifiers; (B) Precision is multiplied by 3.04, which is the calculated furin correction factor. It should be remarked that because of the imperfection of the correction procedure, corrected precision values may exceed 1. Precision values that exceed 1 are set to 1.
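The correction described in this caption amounts to a simple scale-and-cap step. The following is a minimal sketch of that arithmetic only, assuming precision is expressed as a fraction between 0 and 1. The function and constant names are hypothetical, and the derivation of the 3.04 factor itself is not given here, so it is taken as a fixed input.

```python
# Sketch of the scale-and-cap step from the Fig. 2 caption.
# The names below are illustrative, not taken from the paper.

FURIN_CORRECTION_FACTOR = 3.04  # value stated in the Fig. 2 caption


def corrected_precision(raw_precision: float,
                        factor: float = FURIN_CORRECTION_FACTOR) -> float:
    """Multiply raw precision by the correction factor and cap the result at 1."""
    if not 0.0 <= raw_precision <= 1.0:
        raise ValueError("raw_precision must be a fraction between 0 and 1")
    # Corrected values may exceed 1 because the correction is imperfect;
    # the caption states that such values are set to 1.
    return min(1.0, raw_precision * factor)


if __name__ == "__main__":
    for p in (0.10, 0.25, 0.50):
        print(f"raw={p:.2f} corrected={corrected_precision(p):.3f}")
```

With this factor, any raw precision above roughly 0.33 reaches the cap, which is why the corrected curves in panel B can flatten at 1.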
Fig. 3.
Proteolytic site predictions for FGF23 of human, three mutant forms from ADHR patients, and three vertebrate orthologs. Sequences of FGF23 of human, mouse, rat and pufferfish were aligned together with R179W, R179Q and R176Q human FGF23 mutants (mutations are highlighted in dark grey). High score cleavage predictions were assigned to the true cleavage sites (highlighted in light grey). In normal FGF23, cleavage is known to take place between the two amino acids in light grey.
Fig. 4.
FGF3 and other FGF family members that undergo proteolysis in their N-terminal region. Proteolysis of the N-terminal region of FGF3 is important for regulating its activity. FGF11 to 14 were also assigned high score N-terminal cleavage site predictions, although they do not have a leading signal peptide. Removing the signal peptides of FGF3 members allows alignment of the N-terminal proteolytic sites. The high conservation of the proteolytic site signatures, in contrast to the variability of the flanking sequences, confirms the importance of the proteolytic processing, which, as in FGF3, may be involved in the regulation of protein activity.
