EMBO J. 2020 Dec 1;39(23):e104523.

doi: 10.15252/embj.2020104523. Epub 2020 Oct 19.

Protein structure, amino acid composition and sequence determine proteome vulnerability to oxidation-induced damage

Affiliations

¹ Department of Systems Biology, Blavatnik Institute at Harvard Medical School, Boston, MA, USA.
² Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, MA, USA.
³ Division of Biomedical Sciences, University of California Riverside School of Medicine, Riverside, CA, USA.
⁴ Department of Proteomics and Microbiology, Research Institute for Biosciences, University of Mons, Mons, Belgium.
⁵ Division of Biological and Environmental Sciences, Faculty of Natural Sciences, University of Stirling, Stirling, UK.

PMID: 33073387
PMCID: PMC7705453
DOI: 10.15252/embj.2020104523

Roger L Chang et al. EMBO J. 2020.

Abstract

Oxidative stress alters cell viability, from microorganism irradiation sensitivity to human aging and neurodegeneration. Deleterious effects of protein carbonylation by reactive oxygen species (ROS) make understanding molecular properties determining ROS susceptibility essential. The radiation-resistant bacterium Deinococcus radiodurans accumulates less carbonylation than sensitive organisms, making it a key model for deciphering properties governing oxidative stress resistance. We integrated shotgun redox proteomics, structural systems biology, and machine learning to resolve properties determining protein damage by γ-irradiation in Escherichia coli and D. radiodurans at multiple scales. Local accessibility, charge, and lysine enrichment accurately predict ROS susceptibility. Lysine, methionine, and cysteine usage also contribute to ROS resistance of the D. radiodurans proteome. Our model predicts proteome maintenance machinery, and proteins protecting against ROS are more resistant in D. radiodurans. Our findings substantiate that protein-intrinsic protection impacts oxidative stress resistance, identifying causal molecular properties.

Keywords: Deinococcus radiodurans; oxidative stress; protein carbonyl; radioresistance; structural systems biology.

Conflict of interest statement

The authors declare that they have no conflict of interest.

Figures

**Figure 1. Study concept and workflow**
Relationship between carbonylation site distribution, protein vulnerability to reactive oxygen species, and stress phenotypes.
Structural systems biology workflow for proteome‐wide carbonyl site prediction. Red circles = carbonyl sites (CS); black circles = non‐oxidized RKPT residues; gray protein regions = non‐RKPT residues.

**Figure 2. Summary of shotgun redox proteomic data**
Total carbonyl‐bearing proteins detected by shotgun redox proteomic measurement in three biological replicates each of *E. coli* and *D. radiodurans* with and without irradiation. The left axis is the number of sequence‐unique proteins detected as carbonylated. The right axis is the number of sites in total detected as carbonylated (red) or not oxidized (black) in peptides bearing at least one carbonyl. Stripes indicate carbonylated proteins and carbonylatable sites detected only in irradiated samples. See also Appendix Fig S1.
Volcano plots for relative protein abundance changes measured by mass spectrometry in *E. coli* (left) and *D. radiodurans* (right) after irradiation using the same biological replicates as in Fig 2A. Black‐circled points are those proteins with significant changes (paired, 2‐sided t‐test P‐value < 0.05) of > 2‐fold or < 0.5‐fold. Red points are proteins with at least one carbonylated peptide detected. Fold change and P‐value cutoffs considered for significance are indicated by dashed lines. See also Fig EV1.
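
The significance criteria described for the volcano plots (P‐value < 0.05 together with a fold change > 2 or < 0.5) can be sketched as below. This is a minimal illustration that takes a precomputed fold change and paired t‐test P‐value as input. The function name `volcano_point`, the log2 and -log10 axis transforms, and the example values are assumptions made for illustration and are not taken from the study.

```python
import math


def volcano_point(fold_change, p_value, fc_cutoff=2.0, p_cutoff=0.05):
    """Return (log2 fold change, -log10 P, is_significant) for one protein."""
    log2_fc = math.log2(fold_change)
    neg_log10_p = -math.log10(p_value)
    # Significant only if the P-value passes AND the change exceeds 2-fold
    # in either direction (> 2 or < 0.5 with the default cutoff).
    large_change = fold_change > fc_cutoff or fold_change < 1.0 / fc_cutoff
    significant = p_value < p_cutoff and large_change
    return log2_fc, neg_log10_p, significant


# Hypothetical (fold change, P-value) pairs, not data from the paper.
examples = {
    "protein_a": (4.0, 0.01),
    "protein_b": (1.5, 0.001),
    "protein_c": (0.25, 0.2),
}
for name, (fc, p) in examples.items():
    print(name, volcano_point(fc, p))
```

Both conditions must hold at once, which is why a protein with a very small P‐value but a modest fold change is not flagged.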

**Figure EV1. Survival and carbonyl site sampling limits for proteomic experiments, related to Figs 2 and 3**
Survival rates (based on CFU counts) of irradiated *E. coli* and *D. radiodurans* corresponding to biological triplicate samples from which proteomic data were acquired. Absolutely no colonies were recovered from *E. coli* cultures that had been irradiated, even without diluting the samples before plating.
Carbonyl site measurement saturation curves for biological triplicate shotgun redox proteomic measurements in *E. coli* and *D. radiodurans*. Exponential saturation functions were fit by minimizing the sum of squared errors with the triplicate data points; the bolded term in each function is the estimated number of total non‐redundant carbonyl sites in our samples.
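
A least‐squares fit of an exponential saturation function, as described for the saturation curves, might look like the sketch below. The exact functional form is not stated in this caption, so the code assumes the common form y = n_max * (1 - exp(-k * x)), where x is the number of replicates pooled and y is the cumulative count of non‐redundant sites. The function name `fit_saturation`, the search bounds, and the example counts are hypothetical.

```python
import math


def fit_saturation(xs, ys, k_lo=1e-3, k_hi=10.0, rounds=6, steps=200):
    """Fit y = n_max * (1 - exp(-k * x)) by minimizing the sum of squared errors.

    For a fixed k the best n_max has a closed form (linear least squares),
    so only k is searched, using a grid that is refined in each round.
    """
    def solve(k):
        f = [1.0 - math.exp(-k * x) for x in xs]
        n_max = sum(y * fi for y, fi in zip(ys, f)) / sum(fi * fi for fi in f)
        sse = sum((y - n_max * fi) ** 2 for y, fi in zip(ys, f))
        return sse, n_max

    best_k = k_lo
    for _ in range(rounds):
        width = (k_hi - k_lo) / steps
        grid = [k_lo + width * i for i in range(steps + 1)]
        best_k = min(grid, key=lambda k: solve(k)[0])
        # Zoom in on the interval around the current best k.
        k_lo, k_hi = max(best_k - width, 1e-9), best_k + width
    sse, n_max = solve(best_k)
    return n_max, best_k, sse


# Hypothetical cumulative site counts after pooling 1, 2, and 3 replicates.
n_max, k, sse = fit_saturation([1, 2, 3], [400.0, 640.0, 780.0])
print(f"estimated total sites: {n_max:.0f}, rate: {k:.3f}, SSE: {sse:.2f}")
```

Under this assumed form, `n_max` plays the role of the estimated total number of non‐redundant carbonyl sites.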

**Figure 3. Amino acid prevalence in proteomic data before and after irradiation**
Prevalence of individual RKPT residues and prevalence of their carbonylated forms in experimentally measured peptides, combining all three biological replicates of both conditions for each organism. Ratios are given above each pair of bars. All proportions are significantly different between each RKPT residue and its respective carbonylation state by two‐tailed z‐test of two proportions (P‐values < 0.01; see Materials and Methods), meaning that carbonylated proportions are not determined simply by the relative prevalence of RKPT residues. See also Appendix Fig S1.
Prevalence of all canonical amino acids before irradiation of *E. coli* and *D. radiodurans*, combining all three biological replicates for each condition. Ratios are given above each pair of bars. All proportions are significantly different between species by two‐tailed z‐test of two proportions (P‐values < 0.01). See also Figs EV1 and EV2.
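
The two‐tailed z‐test of two proportions used for these comparisons can be sketched as follows. This version uses the standard pooled‐variance form, which is an assumption here; the Materials and Methods of the paper may specify a different variant. The function name and the counts in the example are hypothetical.

```python
import math


def two_proportion_z_test(successes_1, total_1, successes_2, total_2):
    """Two-tailed z-test for the difference between two proportions.

    Returns (z statistic, two-tailed P-value) using a pooled standard error.
    """
    p1 = successes_1 / total_1
    p2 = successes_2 / total_2
    pooled = (successes_1 + successes_2) / (total_1 + total_2)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / total_1 + 1.0 / total_2))
    z = (p1 - p2) / se
    # Two-tailed P-value under the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# Hypothetical counts: occurrences of one residue type among all residues
# in two sets of measured peptides.
z, p = two_proportion_z_test(600, 10000, 450, 10000)
print(f"z = {z:.2f}, P = {p:.2e}, significant at 0.01: {p < 0.01}")
```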

**Figure EV2. Canonical amino acid prevalence change following irradiation of *E. coli* (left) and *D. radiodurans* (right), related to Fig 3**
All values are based on the fold change of amino acid prevalence summing data across all three biological replicates under each treatment condition, unirradiated and irradiated. Shaded region indicates ±1 standard deviation from the mean across all amino acids. * = z‐test of two proportions P‐value < 0.01 (see Materials and Methods).

**Figure 4. Feature engineering**
Three‐dimensional feature engineering from molecular properties. Initial properties that can be determined only with an atomic resolution structure, in the context of an amino acid sequence, or that depend only on amino acid identity are denoted at left. This property list is a non‐redundant abbreviated set of all properties considered (see Appendix Table S4 and Materials and Methods for full detail). Columns of the feature matrix at right are alternating property sums and means at spatial scales denoted below matrix. *p* = a molecular property; *i* = RKPT residue; *k* = neighbor residues of *i*; *r* = radius length. See also Fig EV3.
Sequence homology‐based features for machine learning were derived by performing sequence alignments of all RKPT sites (± 10 residues) anchored at the central residue to compute alignment scores that were then reduced to a computationally manageable number of features by principal component analysis (PCA).
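
The alternating property sums and means in the three‐dimensional feature matrix can be illustrated with a short sketch: for an RKPT residue *i*, a property *p* is summed and averaged over the neighbor residues *k* that fall within each radius *r*. How distances are measured, whether residue *i* itself is counted, and which radii are used are not stated in this caption, so the choices below (point coordinates, neighbors only, arbitrary radii) are assumptions, as is the function name.

```python
import math


def spatial_features(center, neighbors, radii):
    """Build [sum_r1, mean_r1, sum_r2, mean_r2, ...] for one residue.

    center    -- (x, y, z) coordinate of the RKPT residue i
    neighbors -- list of ((x, y, z), property_value) for neighbor residues k
    radii     -- radius lengths r, one pair of features per radius
    """
    features = []
    for r in radii:
        values = [p for coord, p in neighbors if math.dist(center, coord) <= r]
        total = sum(values)
        mean = total / len(values) if values else 0.0
        features.extend([total, mean])
    return features


# Hypothetical neighbors carrying a charge-like property (+1 or -1).
neighbors = [((1.0, 0.0, 0.0), 1.0), ((0.0, 4.0, 0.0), -1.0), ((0.0, 0.0, 9.0), 1.0)]
print(spatial_features((0.0, 0.0, 0.0), neighbors, radii=[2.0, 5.0, 10.0]))
```

Keeping both the sum and the mean at each radius lets a downstream model distinguish a dense neighborhood from a sparse one with the same average property.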

**Figure EV3. *D. radiodurans* proteome structure modeling, related to Figs 4, 5, 6**
Distribution of *D. radiodurans* proteins by difficulty of template‐based homology modeling and size regimes relevant for determining structure modeling algorithm applicability. Easy signifies ≥ 10 high‐confidence homologous templates available. Medium signifies ≥ 1 high‐confidence homologous template available. Hard signifies no high‐confidence homologous templates available. Proteins ≤ 200 residues long are amenable to *ab initio* folding. Proteins ≤ 800 residues long are amenable to homology modeling.
Structure quality evaluation criteria and percentage of *D. radiodurans* protein structures that satisfy published criteria thresholds. Blue plot represents best representative models for *D. radiodurans* proteins. Gray plot represents best available crystal structures from the PDB for *D. radiodurans* proteins.
Distribution of methods used to derive best representative protein structures for *D. radiodurans*. “None” indicates the proteins for which no PDB structure exists, and no modeling method is applicable.
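
The difficulty and size regimes stated in this caption can be encoded as simple threshold checks. The sketch below only restates those thresholds; the function names are hypothetical, and it makes no claim about how the authors actually selected a modeling method for each protein.

```python
def modeling_difficulty(n_high_confidence_templates):
    """Classify a protein by the number of high-confidence homologous templates."""
    if n_high_confidence_templates >= 10:
        return "easy"
    if n_high_confidence_templates >= 1:
        return "medium"
    return "hard"


def size_compatible_methods(length):
    """List the modeling approaches whose size limit the protein satisfies."""
    methods = []
    if length <= 200:
        methods.append("ab initio folding")
    if length <= 800:
        methods.append("homology modeling")
    return methods


# Hypothetical proteins: (number of templates, length in residues).
for templates, length in [(12, 150), (3, 450), (0, 950)]:
    print(modeling_difficulty(templates), size_compatible_methods(length))
```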

**Figure 5. Multi‐scale validation of protein carbonylation predictor**
Residue‐scale validation: Receiver operating characteristic (ROC) curves for CS predictors derived by leave‐1‐out validation. The dashed black line at y = x corresponds to performance expected by chance. Top left = final predictor trained by stacking structure‐ and sequence‐based models. Top middle = predictor trained only on structure‐based features. Top right = predictor trained only on sequence‐based features. Bottom left = theoretical maximum predictive power for a probability estimator (AUC = 0.98). Bottom middle = same algorithm as used for final predictor but with all features shuffled beforehand. Bottom right = CSPD model developed using metal‐catalyzed oxidation (MCO) site data from *E. coli*. See also Figs EV3 and EV4.
Protein‐scale validation: Comparison between predicted CS enrichment from leave‐1‐out validation to CS enrichment computed from all carbonylated peptides measured for *E. coli* (left) and *D. radiodurans* (right). Each point represents a different protein species. Predicted probability‐weighted CS enrichment = (sum of carbonylation probabilities across training set sites)/(number of residues in corresponding peptides from experiments). Experimentally measured probability‐weighted CS enrichment = (sum of empirical oxidation probabilities across training set sites)/(number of residues in corresponding peptides from experiments). The solid line is the fitted regression line, and dashed lines indicate the boundaries of the 95% confidence interval.
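
The probability‐weighted CS enrichment defined in the protein‐scale validation is a ratio that can be computed directly. The sketch below applies the same formula to predicted and to empirical site probabilities for one protein; the function name and all numbers are hypothetical.

```python
def probability_weighted_enrichment(site_probabilities, n_peptide_residues):
    """Sum of per-site probabilities divided by the number of residues
    in the corresponding experimentally observed peptides."""
    if n_peptide_residues <= 0:
        raise ValueError("n_peptide_residues must be positive")
    return sum(site_probabilities) / n_peptide_residues


# Hypothetical protein with three RKPT sites in peptides covering 30 residues.
predicted = probability_weighted_enrichment([0.9, 0.1, 0.5], 30)
measured = probability_weighted_enrichment([1.0, 0.0, 0.5], 30)
print(f"predicted = {predicted:.3f}, measured = {measured:.3f}")
```

Each protein then contributes one (predicted, measured) pair to the scatter plot against which the regression line is fitted.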

**Figure 6. Molecular properties predicting protein vulnerability to carbonylation**
A–D
Example sites prone to carbonylation. (A) DRA0302_P252, (B) DR0099_P51, and (C) b0911_K411; and example robust site (D) b3313_P69.
Data information: All atoms of central RKPT side chains are shown, with carbonylatable atomic site in red (predicted and measured carbonylated) or black (predicted and measured not oxidized) and labeled with the 1‐letter code of the containing amino acid. Positive (blue) and negative (pink) charges within 8 Å are labeled. Carbonylatable lysine sites (purple) within 8 Å are labeled. Molecular surfaces within 5 Å of the central CS are dark gray. See also Fig EV3.

**Figure EV4. Residue‐scale validation of metal‐catalyzed oxidation (MCO) predictor, related to Fig 5A**
We developed an MCO predictor using the algorithm presented in this study but trained on previously published carbonylation by MCO data from *E. coli*, the same dataset used to develop CSPD. Receiver operating characteristic (ROC) curves for MCO site predictors derived by leave‐1‐out validation. The dashed black line at y=x corresponds to performance expected by chance. Top left = final predictor trained by stacking structure‐ and sequence‐based models. Top middle = predictor trained only on structure‐based features. Top right = predictor trained only on sequence‐based features. Bottom left = theoretical maximum predictive power for a probability estimator (AUC = 0.97). Bottom middle = same algorithm as used for final predictor but with all features shuffled beforehand. Bottom right = CSPD model developed using MCO site data from *E. coli*.
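
ROC curves such as these are usually summarized by the area under the curve (AUC). As a minimal sketch, the AUC can be computed from held‐out scores (for example, those produced by leave‐1‐out validation) as the fraction of positive and negative site pairs in which the positive site receives the higher score, with ties counted as half. The function name and the labels and scores below are hypothetical, and this is not the authors' evaluation code.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels -- 1 for a carbonylated site, 0 for a non-oxidized site
    scores -- predicted probabilities for the same sites, each obtained
              while that site was held out of training
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative site")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical held-out predictions for four sites.
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))
```

A value of 0.5 corresponds to the chance line at y = x, which is the behavior expected from a predictor trained on shuffled features.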

**Figure 7. Interspecies comparison of predicted protein vulnerability to carbonylation**
Each circle represents a distinct protein pair (ortholog or isozyme) between species. Each plot shows identical point values but highlights a different functional class of proteins with relevance to oxidative stress. The y = x diagonal line is a reference to compare intrinsic vulnerability to carbonylation between orthologs. The y = x/3.78 diagonal line is a reference to compare combined intrinsic and extrinsic carbonylation properties between orthologs. Elliptical dotted line encircles the points falling within 3 standard deviations of the mean coordinates and 3 standard deviations of the distance from reference line y=x, encompassing ~91% of all data points. This reference region distinguishes outlier points that are distant from the main population. Outliers with associated experimental evidence related to hypersensitivity to oxidative stress are labeled with their protein names. See also Appendix Fig S3 and Fig EV5.
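
Comparing a protein pair against the reference lines y = x and y = x/3.78 comes down to the orthogonal distance from a point to a line through the origin. The sketch below computes a signed version of that distance; the function name, the sign convention (positive above the line), and the example coordinates are assumptions for illustration.

```python
import math


def signed_orthogonal_distance(x, y, slope):
    """Signed perpendicular distance from (x, y) to the line y = slope * x.

    Positive values lie above the line, negative values below it.
    """
    return (y - slope * x) / math.sqrt(1.0 + slope * slope)


# Hypothetical predicted vulnerability scores for one ortholog pair.
x, y = 0.08, 0.05
print("distance from y = x:      ", signed_orthogonal_distance(x, y, 1.0))
print("distance from y = x/3.78: ", signed_orthogonal_distance(x, y, 1.0 / 3.78))
```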

**Figure EV5. Predicted outliers grouped by comparative intrinsic and extrinsic vulnerability to carbonylation in *D. radiodurans* and *E. coli*, related to Fig 7**
Group 1: Proteins predicted significantly more intrinsically prone to carbonylation than the rest of the proteome in both species (>3σ distance from mean and above a line perpendicular to y = x that is tangent to the ellipse upper vertex) but more protected both intrinsically (left of y = x) and extrinsically in *D. radiodurans* (orthogonal distance from y = x/3.78 is significantly greater than the mean; unpaired t‐test P‐value < 0.01).
Group 2: Proteins predicted significantly more intrinsically prone to carbonylation than the rest of the proteome in both species, with similar intrinsic vulnerability between species (<3σ distance from y = x but not in Group 1) but more extrinsically protected in *D. radiodurans*.
Group 3: Proteins predicted significantly more intrinsically (>3σ distance above y = x) or extrinsically protected against carbonylation in *D. radiodurans* but not in Groups 1 or 2.
Group 4: Proteins predicted significantly more intrinsically robust to carbonylation than the rest of the proteome in both species (>3σ distance from mean and below a line perpendicular to y = x that is tangent to the ellipse lower vertex).
Group 5: Proteins predicted significantly more intrinsically susceptible to carbonylation in *D. radiodurans* than in *E. coli* (>3σ distance below y = x) and not significantly more extrinsically protected than the rest of the proteome (orthogonal distance from y = x/3.78 is not significantly greater than the mean).
Marginal: Proteins not predicted to have any significant intrinsic or extrinsic protection in *D. radiodurans* or *E. coli*.
