BMC Bioinformatics. 2017 Mar 14;18(Suppl 3):66.
doi: 10.1186/s12859-017-1472-8.

Investigation and identification of protein carbonylation sites based on position-specific amino acid composition and physicochemical features

Shun-Long Weng et al. BMC Bioinformatics. 2017.

Abstract

Background: Protein carbonylation, an irreversible and non-enzymatic post-translational modification (PTM), is often used as a marker of oxidative stress. When reactive oxygen species (ROS) oxidize amino acid side chains, carbonyl (CO) groups are produced, especially on lysine (K), arginine (R), threonine (T), and proline (P). Because little is known about the substrate specificity of carbonylation, we set out to develop a systematic method for a comprehensive investigation of protein carbonylation sites.

Results: After the removal of redundant data from multiple carbonylation-related articles, a total of 226 carbonylated human proteins were used as the training dataset, which consisted of 307, 126, 128, and 129 carbonylation sites for K, R, T and P residues, respectively. To identify features useful for predicting carbonylation sites, the linear amino acid sequence was used not only to build the predictive model from the training dataset, but also to compare its predictive effectiveness with that of other types of features, including amino acid composition (AAC), amino acid pair composition (AAPC), position-specific scoring matrix (PSSM), positional weighted matrix (PWM), solvent-accessible surface area (ASA), and physicochemical properties. The investigation of position-specific amino acid composition revealed that the positively charged amino acids (K and R) are remarkably enriched around the carbonylated sites, which may play a functional role in discriminating between carbonylation and non-carbonylation sites. A variety of predictive models were built using various features and three different machine learning methods. Based on evaluation by five-fold cross-validation, the models trained with the PWM feature provided better sensitivity on the positive training dataset, while the models trained with the AAindex feature achieved higher specificity on the negative training dataset. Additionally, the model trained using hybrid features, including PWM, AAC and AAindex, obtained the best MCC values of 0.432, 0.472, 0.443 and 0.467 on K, R, T and P residues, respectively.
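As an illustration of the sequence-based features named above, the following minimal Python sketch extracts fixed-length windows centred on a candidate residue and encodes them by amino acid composition (AAC) and by a position-specific frequency matrix used as a simple PWM. It is a sketch under stated assumptions, not the authors' implementation: the function names, the gap-padding character, and the use of raw per-position frequencies as the PWM are all illustrative choices, and the 10-residue flank is assumed from the 21-mer window mentioned in the caption of Fig. 7.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def extract_windows(sequence, residue, flank=10, pad="-"):
    """Return (1-based position, window) pairs centred on each `residue`.

    Windows have length 2 * flank + 1; sequence ends are padded with `pad`
    (an assumed convention, not taken from the paper).
    """
    padded = pad * flank + sequence + pad * flank
    windows = []
    for i, aa in enumerate(sequence):
        if aa == residue:
            windows.append((i + 1, padded[i:i + 2 * flank + 1]))
    return windows


def aac(window):
    """Amino acid composition: fraction of each of the 20 residues."""
    counts = Counter(aa for aa in window if aa in AMINO_ACIDS)
    total = sum(counts.values())
    return [counts[aa] / total if total else 0.0 for aa in AMINO_ACIDS]


def position_frequency_matrix(windows):
    """Per-position residue frequencies over equal-length windows."""
    length = len(windows[0])
    matrix = []
    for pos in range(length):
        counts = Counter(w[pos] for w in windows if w[pos] in AMINO_ACIDS)
        total = sum(counts.values())
        matrix.append(
            {aa: (counts[aa] / total if total else 0.0) for aa in AMINO_ACIDS}
        )
    return matrix


def pwm_encode(window, matrix):
    """Encode a window as the frequency of its residue at each position."""
    return [matrix[pos].get(aa, 0.0) for pos, aa in enumerate(window)]
```

In this sketch the matrix would be estimated from the positive (carbonylated) windows of one residue type, and each candidate window is then turned into a numeric vector that a classifier can consume; hybrid features would simply concatenate the AAC and PWM vectors.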

Conclusion: Compared with an existing prediction tool, the selected models trained with hybrid features provided promising accuracy on an independent testing dataset. In short, this work not only characterized the carbonylated substrate preference, but also demonstrated that the proposed method could provide a feasible means of accelerating the preliminary discovery of protein carbonylation.
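The sensitivity, specificity, and Matthews correlation coefficient (MCC) reported in the abstract follow standard definitions over a binary confusion matrix. The short Python sketch below shows how they could be computed from true and predicted labels; the function names and the 1 = carbonylated, 0 = non-carbonylated label coding are assumptions made for illustration, not details taken from the paper.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels coded 1 (positive) / 0 (negative)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def sensitivity(tp, fn):
    """Fraction of true positive sites that are predicted positive."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def specificity(tn, fp):
    """Fraction of true negative sites that are predicted negative."""
    return tn / (tn + fp) if (tn + fp) else 0.0


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; defined as 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

Because MCC uses all four cells of the confusion matrix, it stays informative when positive and negative sites are unbalanced, which is typically the case for modification-site datasets.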

Keywords: Amino acid composition; Physicochemical properties; Protein carbonylation; Reactive Oxygen Species (ROS).

Figures

Fig. 1
Analytical flowchart of the identification of protein carbonylation sites
Fig. 2
Comparison of amino acid composition between carbonylated and non-carbonylated sites on K, R, T and P residues
Fig. 3
Entropy and frequency plots of position-specific amino acid composition of four carbonylated residues
Fig. 4
TwoSampleLogo of four carbonylated residues. a Two-Sample Logo of Lysine (K). b Two-Sample Logo of Arginine (R). c Two-Sample Logo of Threonine (T). d Two-Sample Logo of Proline (P)
Fig. 5
The frequency differences of 20 × 20 amino acid pairs between carbonylated sites and non-carbonylated sites of lysine, arginine, threonine and proline. An amino acid pair in a red box is over-represented in carbonylated sites (positive data) compared with non-carbonylated sites (negative data); a green box indicates under-representation
Fig. 6
Comparison of the solvent-accessible surface area between carbonylated and non-carbonylated sites on K, R, T and P residues
Fig. 7
Top 10 physicochemical properties of carbonylated sites on lysine ranked by the average value of F-score measurement in 21-mer window
