PLoS One. 2015 Nov 11;10(11):e0142658.
doi: 10.1371/journal.pone.0142658. eCollection 2015.

Characterizing Protease Specificity: How Many Substrates Do We Need?

Michael Schauperl et al. PLoS One.

Abstract

Calculation of cleavage entropies allows one to quantify, map and compare protease substrate specificity using an information-entropy-based approach. The metric intrinsically depends on the number of experimentally determined substrates (data points). Thus, a statistical analysis of its numerical stability is crucial for estimating the systematic error introduced by estimating specificity from a limited number of substrates. In this contribution, we show the mathematical basis for estimating the uncertainty in cleavage entropies. Sets of cleavage entropies are calculated using experimental cleavage data and modeled extreme cases. By analyzing the underlying mathematics and applying statistical tools, we found a linear dependence of the metric on 1/n. This allows us to extrapolate the values to an infinite number of samples and to estimate the errors. From this error analysis, a minimum of 30 substrates was found to be necessary to characterize the substrate specificity of a protease (S4-S4'), in terms of amino acid variability, with an uncertainty of 5 percent. Therefore, we encourage experimental researchers in the protease field to record specificity profiles of novel proteases, aiming to identify at least 30 peptide substrates of maximum sequence diversity. We expect a full characterization of protease specificity to help rationalize the biological functions of proteases and to assist rational drug design.
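
As background for the 1/n dependence stated in the abstract, the standard first-order bias of the naive (plug-in) entropy estimate from n independent observations can be written as follows. This is the textbook Miller correction, given here as context; it is not necessarily the derivation used in the paper. The symbols are assumptions introduced for illustration: H is the true entropy, \hat{H}_n the naive estimate from n substrates, K the number of residue types with non-zero probability at the subsite, and b the base of the logarithm (b = 20 if the entropy is normalized to the 20 amino acids).

```latex
\mathbb{E}\left[\hat{H}_n\right] = H - \frac{K-1}{2\,n\,\ln b} + O\!\left(\frac{1}{n^{2}}\right)
```

To first order the naive estimate therefore underestimates H by an amount proportional to 1/n, which is why a linear fit in 1/n can be extrapolated to 1/n = 0.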

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Fig 1. Dependence of the pseudo-constant C on the number of samples n.
The solid black line indicates the value for an unspecific pocket; the dotted black lines bound the region where the pseudo-constant deviates by less than 15% from its value for an infinite number of samples. The green dash-dotted line shows the behavior of the constant for a specific pocket, and the blue dashed line that for a specific pocket with rare events (p<1%).
Fig 2. Trend of the measured entropy and estimated entropy with the number of known substrates.
The cases of a totally unspecific pocket (upper left), an unspecific pocket (upper right) and an unspecific pocket with rare events (lower left) are shown. The filled areas correspond to the possible measured values including the standard deviation.
Fig 3. Trend of the measured and estimated entropy with the reciprocal number of known substrates.
The cases of a totally unspecific pocket (upper left), an unspecific pocket (upper right) and an unspecific pocket with rare events (lower left) are shown. The filled areas correspond to the possible measured values including the standard deviation.
Fig 4. Trend of the naïve and estimated entropy for trypsin pockets S4 and S1 and the sum of the S4-S4’ pockets.
The corrected entropy (black line) is shown as a function of the number of known substrates. The red line is the real (infinite-sample) entropy and the blue line is the naïve entropy estimate. Trends are plotted for S4 (upper left), S1 (upper right), and the sum of S4 to S4’ (lower left).
Fig 5. Comparison of the different entropy estimators.
The estimation process presented in this work outperforms the published estimators it is compared with.
Fig 6. Systematic sketch of the estimation process for the corrected cleavage entropy.
Based on experimental substrate data, the specificity A is calculated. Through bootstrapping, a subset is created and its specificity is calculated to generate B. By performing a linear fit and extrapolating to 0 in 1/n space, we estimate the specificity for an infinite number of substrates.
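
The estimation process outlined for Fig 6 might be sketched in Python roughly as follows. This is an illustrative sketch under stated assumptions, not the authors' code: the function names, the subset fractions, the number of resamples, the choice to draw subsets without replacement, and the base-20 logarithm (so that a fully unspecific subsite scores 1) are all assumptions made for the example.

```python
import math
import random
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def plugin_entropy(residues, base=20):
    """Naive (plug-in) entropy of the residues observed at one subsite.

    Assumption: logarithm to base 20, so 0 = fully specific, 1 = fully unspecific.
    """
    n = len(residues)
    counts = Counter(residues)
    return -sum((c / n) * math.log(c / n, base) for c in counts.values())


def extrapolated_entropy(residues,
                         fractions=(0.5, 0.6, 0.7, 0.8, 0.9, 1.0),
                         n_resamples=200,
                         seed=0):
    """Extrapolate the subsite entropy to 1/n -> 0 (hypothetical helper).

    For each fraction, random subsets of that size are drawn (assumption:
    without replacement) and their mean naive entropy is recorded. A straight
    line is then fitted to entropy versus 1/(subset size); the intercept is
    the estimate for an infinite number of substrates.
    Returns (intercept, slope).
    """
    residues = list(residues)
    n = len(residues)
    rng = random.Random(seed)

    xs, ys = [], []
    for f in fractions:
        m = max(2, round(f * n))  # subset size
        mean_h = sum(plugin_entropy(rng.sample(residues, m))
                     for _ in range(n_resamples)) / n_resamples
        xs.append(1.0 / m)
        ys.append(mean_h)

    # Ordinary least squares for y = intercept + slope * x.
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need subsets of at least two different sizes")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


if __name__ == "__main__":
    # Synthetic, fully unspecific subsite: 60 residues drawn uniformly at random.
    rng = random.Random(1)
    observed = [rng.choice(AMINO_ACIDS) for _ in range(60)]
    corrected, slope = extrapolated_entropy(observed)
    print("naive:", plugin_entropy(observed))
    print("extrapolated:", corrected, "slope:", slope)
```

In this sketch the slope of the fit is expected to be negative, because smaller subsets yield lower naive entropies on average, so the intercept at 1/n = 0 lies above the naive value computed from all substrates.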
