EMBO J. 2020 Dec 1;39(23):e104523.
doi: 10.15252/embj.2020104523. Epub 2020 Oct 19.

Protein structure, amino acid composition and sequence determine proteome vulnerability to oxidation-induced damage

Roger L Chang et al. EMBO J.

Abstract

Oxidative stress alters cell viability, from microorganism irradiation sensitivity to human aging and neurodegeneration. Deleterious effects of protein carbonylation by reactive oxygen species (ROS) make understanding molecular properties determining ROS susceptibility essential. The radiation-resistant bacterium Deinococcus radiodurans accumulates less carbonylation than sensitive organisms, making it a key model for deciphering properties governing oxidative stress resistance. We integrated shotgun redox proteomics, structural systems biology, and machine learning to resolve properties determining protein damage by γ-irradiation in Escherichia coli and D. radiodurans at multiple scales. Local accessibility, charge, and lysine enrichment accurately predict ROS susceptibility. Lysine, methionine, and cysteine usage also contribute to ROS resistance of the D. radiodurans proteome. Our model predicts proteome maintenance machinery, and proteins protecting against ROS are more resistant in D. radiodurans. Our findings substantiate that protein-intrinsic protection impacts oxidative stress resistance, identifying causal molecular properties.

Keywords: Deinococcus radiodurans; oxidative stress; protein carbonyl; radioresistance; structural systems biology.

Conflict of interest statement

The authors declare that they have no conflict of interest.

Figures

Figure 1. Study concept and workflow
  1. Relationship between carbonylation site distribution, protein vulnerability to reactive oxygen species, and stress phenotypes.

  2. Structural systems biology workflow for proteome‐wide carbonyl site prediction. Red circles = carbonyl sites (CS); black circles = non‐oxidized RKPT residues; gray protein regions = non‐RKPT residues.

Figure 2. Summary of shotgun redox proteomic data
  1. Total carbonyl‐bearing proteins detected by shotgun redox proteomic measurement in three biological replicates each of E. coli and D. radiodurans with and without irradiation. The left axis is the number of sequence‐unique proteins detected as carbonylated. The right axis is the number of sites in total detected as carbonylated (red) or not oxidized (black) in peptides bearing at least one carbonyl. Stripes indicate carbonylated proteins and carbonylatable sites detected only in irradiated samples. See also Appendix Fig S1.

  2. Volcano plots for relative protein abundance changes measured by mass spectrometry in E. coli (left) and D. radiodurans (right) after irradiation using the same biological replicates as in Fig 2A. Black‐circled points are those proteins with significant changes (paired, 2‐sided t‐test P‐value < 0.05) of > 2‐fold or < 0.5‐fold. Red points are proteins with at least one carbonylated peptide detected. Fold change and P‐value cutoffs considered for significance are indicated by dashed lines. See also Fig EV1.
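
The significance rule described for the volcano plots can be sketched as a simple filter. This is a minimal illustration, not the authors' code: the function name, the input layout, and the example values are hypothetical, and the P‐values are assumed to have been computed beforehand (for example by a paired, 2‐sided t‐test).

```python
import math

def flag_significant(proteins, p_cutoff=0.05, fold_cutoff=2.0):
    """Return names of proteins passing both the P-value and fold-change cutoffs.

    `proteins` maps a (hypothetical) protein name to (fold_change, p_value).
    Fold changes must be positive ratios (irradiated / unirradiated).
    """
    hits = []
    for name, (fold, p) in proteins.items():
        # "> 2-fold or < 0.5-fold" is symmetric on the log2 scale: |log2 FC| > 1
        if p < p_cutoff and abs(math.log2(fold)) > math.log2(fold_cutoff):
            hits.append(name)
    return hits

# Made-up values for illustration only.
example = {
    "protA": (3.1, 0.01),   # large increase, significant
    "protB": (0.4, 0.03),   # large decrease, significant
    "protC": (2.5, 0.20),   # large change, P-value too high
    "protD": (1.2, 0.001),  # low P-value, change too small
}
print(flag_significant(example))
```

Working on the log2 scale treats increases and decreases symmetrically, which is why the two fold‐change cutoffs appear as mirrored dashed lines on a volcano plot.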

Figure EV1. Survival and carbonyl site sampling limits for proteomic experiments, related to Figs 2 and 3
  1. Survival rates (based on CFU counts) of irradiated E. coli and D. radiodurans corresponding to biological triplicate samples from which proteomic data were acquired. No colonies were recovered from irradiated E. coli cultures, even when samples were plated undiluted.

  2. Carbonyl site measurement saturation curves for biological triplicate shotgun redox proteomic measurements in E. coli and D. radiodurans. Exponential saturation functions were fit by minimizing the sum of squared errors with the triplicate data points; the bolded term in each function is the estimated number of total non‐redundant carbonyl sites in our samples.
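
A least‐squares fit of an exponential saturation curve can be sketched as follows. The exact functional form used by the authors is not given in the caption, so this assumes y = N_total · (1 − exp(−k·x)), where x is the number of replicates combined and y is the cumulative number of non‐redundant carbonyl sites. The function name and the synthetic data are hypothetical.

```python
import math

def fit_saturation(xs, ys, k_lo=1e-3, k_hi=10.0, steps=20000):
    """Fit y = n_total * (1 - exp(-k * x)) by minimizing the sum of squared errors.

    Scans k on a grid; for each k the SSE-optimal plateau n_total has a
    closed form because the model is linear in n_total.
    Returns (n_total, k, sse) for the best grid point.
    """
    best = None
    for s in range(steps + 1):
        k = k_lo + (k_hi - k_lo) * s / steps
        f = [1.0 - math.exp(-k * x) for x in xs]
        n_total = sum(y * fi for y, fi in zip(ys, f)) / sum(fi * fi for fi in f)
        sse = sum((y - n_total * fi) ** 2 for y, fi in zip(ys, f))
        if best is None or sse < best[2]:
            best = (n_total, k, sse)
    return best

# Synthetic points generated from the assumed model (plateau 1000, rate 0.5).
xs = [1, 2, 3]
ys = [1000.0 * (1.0 - math.exp(-0.5 * x)) for x in xs]
n_total, k, sse = fit_saturation(xs, ys)
print(round(n_total), round(k, 2))
```

The fitted plateau n_total plays the role of the estimated total number of non‐redundant sites; with only three replicate points such an estimate is sensitive to the assumed curve shape.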

Figure 3. Amino acid prevalence in proteomic data before and after irradiation
  1. Prevalence of individual RKPT residues and prevalence of carbonylated form in experimentally measured peptides combining all three biological replicates of both conditions for each organism. Ratios are given above each pair of bars. All proportions are significantly different between each RKPT and their respective carbonylation state by two‐tailed z‐test of two proportions (P‐values < 0.01; see Materials and Methods), meaning that carbonylated proportions are not determined simply by the relative prevalence of RKPT. See also Appendix Fig S1.

  2. Prevalence of all canonical amino acids before irradiation of E. coli and D. radiodurans, combining all three biological replicates for each condition. Ratios are given above each pair of bars. All proportions are significantly different between species by two‐tailed z‐test of two proportions (P‐values < 0.01). See also Figs EV1 and EV2.
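
For readers unfamiliar with the two‐tailed z‐test of two proportions used in these comparisons, a minimal sketch follows. It assumes the standard pooled‐variance form of the test; the authors' exact procedure is described in their Materials and Methods, and the counts below are made up.

```python
import math

def two_proportion_z_test(x1, n1, x2, n2):
    """Two-tailed z-test comparing proportions x1/n1 and x2/n2 (pooled variance)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    z = (p1 - p2) / se
    # Two-tailed P-value from the standard normal: 2 * (1 - Phi(|z|))
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value

# Hypothetical counts: 60 of 100 residues in one set versus 40 of 100 in another.
z, p = two_proportion_z_test(60, 100, 40, 100)
print(round(z, 3), p < 0.01)
```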

Figure EV2. Canonical amino acid prevalence change following irradiation of E. coli (left) and D. radiodurans (right), related to Fig 3
All values are based on the fold change of amino acid prevalence summing data across all three biological replicates under each treatment condition, unirradiated and irradiated. Shaded region indicates ±1 standard deviation from the mean across all amino acids. * = z‐test of two proportions P‐value < 0.01 (see Materials and Methods).
Figure 4. Feature engineering
  1. Three‐dimensional feature engineering from molecular properties. Initial properties that can be determined only with an atomic resolution structure, in the context of an amino acid sequence, or that depend only on amino acid identity are denoted at left. This property list is a non‐redundant abbreviated set of all properties considered (see Appendix Table S4 and Materials and Methods for full detail). Columns of the feature matrix at right are alternating property sums and means at spatial scales denoted below matrix. p = a molecular property; i = RKPT residue; k = neighbor residues of i; r = radius length. See also Fig EV3.

  2. Sequence homology‐based features for machine learning were derived by performing sequence alignments of all RKPT sites (± 10 residues) anchored at the central residue to compute alignment scores that were then reduced to a computationally manageable number of features by principal component analysis (PCA).
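
The alternating sum and mean columns of the feature matrix can be illustrated with a short sketch. It assumes one coordinate and one property value per residue, and it counts the central residue i as part of its own neighborhood; both choices, along with the function name, the radii, and the toy data, are assumptions for illustration only.

```python
import math

def neighborhood_features(coords, values, center, radii):
    """Return [sum_r1, mean_r1, sum_r2, mean_r2, ...] of a property p around residue i.

    coords: list of (x, y, z) positions, one per residue (in angstroms)
    values: property value p for each residue
    center: index i of the central RKPT residue
    radii:  radius lengths r defining each spatial scale
    """
    feats = []
    for r in radii:
        # Neighbors k of i: residues within distance r (includes i itself here).
        neigh = [values[k] for k, c in enumerate(coords)
                 if math.dist(c, coords[center]) <= r]
        total = sum(neigh)
        feats.extend([total, total / len(neigh)])
    return feats

# Toy example: per-residue charge around residue 0 at two illustrative radii.
coords = [(0, 0, 0), (3, 0, 0), (0, 6, 0), (0, 0, 12)]
charge = [1.0, 1.0, -1.0, 0.0]
feats = neighborhood_features(coords, charge, center=0, radii=[5.0, 8.0])
print(feats)
```

Pairing a sum with a mean at each radius lets a model distinguish a dense neighborhood with weak properties from a sparse one with strong properties.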

Figure EV3. D. radiodurans proteome structure modeling, related to Figs 4, 5, 6
  1. Distribution of D. radiodurans proteins by difficulty of template‐based homology modeling and size regimes relevant for determining structure modeling algorithm applicability. Easy signifies ≥ 10 high‐confidence homologous templates available. Medium signifies ≥ 1 high‐confidence homologous template available. Hard signifies no high‐confidence homologous templates available. Proteins ≤ 200 residues long are amenable to ab initio folding. Proteins ≤ 800 residues long are amenable to homology modeling.

  2. Structure quality evaluation criteria and percentage of D. radiodurans protein structures that satisfy published criteria thresholds. Blue plot represents best representative models for D. radiodurans proteins. Gray plot represents best available crystal structures from the PDB for D. radiodurans proteins.

  3. Distribution of methods used to derive best representative protein structures for D. radiodurans. “None” indicates the proteins for which no PDB structure exists, and no modeling method is applicable.
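
The difficulty and size thresholds stated in the caption translate directly into two small rules. The sketch below only encodes those thresholds; the function names are hypothetical, and how the authors combined these rules to select a final modeling method is not specified here.

```python
def modeling_difficulty(n_high_confidence_templates):
    """Classify homology-modeling difficulty from the number of high-confidence templates."""
    if n_high_confidence_templates >= 10:
        return "easy"
    if n_high_confidence_templates >= 1:
        return "medium"
    return "hard"

def applicable_methods(length):
    """List the modeling approaches a protein of the given residue length is amenable to."""
    methods = []
    if length <= 200:
        methods.append("ab initio folding")
    if length <= 800:
        methods.append("homology modeling")
    return methods

print(modeling_difficulty(3), applicable_methods(150))
```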

Figure 5. Multi‐scale validation of protein carbonylation predictor
  1. Residue‐scale validation: Receiver operating characteristic (ROC) curves for CS predictors derived by leave‐1-out validation. The dashed black line at y=x corresponds to performance expected by chance. Top left = final predictor trained by stacking structure‐ and sequence‐based models. Top middle = predictor trained only on structure‐based features. Top right = predictor trained only on sequence‐based features. Bottom left = theoretical maximum predictive power for a probability estimator (AUC = 0.98). Bottom middle = same algorithm as used for final predictor but with all features shuffled beforehand. Bottom right = CSPD model developed using metal‐catalyzed oxidation (MCO) site data from E. coli. See also Figs EV3 and EV4.

  2. Protein‐scale validation: Comparison between predicted CS enrichment from leave‐1-out validation to CS enrichment computed from all carbonylated peptides measured for E. coli (left) and D. radiodurans (right). Each point represents a different protein species. Predicted probability‐weighted CS enrichment = (sum of carbonylation probabilities across training set sites)/(number of residues in corresponding peptides from experiments). Experimentally measured probability‐weighted CS enrichment = (sum of empirical oxidation probabilities across training set sites)/(number of residues in corresponding peptides from experiments). The solid line is the fitted regression line, and dashed lines indicate the boundaries of the 95% confidence interval.
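
The probability‐weighted CS enrichment defined above is a ratio that can be computed per protein in the same way for predicted and measured values. The sketch below uses made‐up probabilities and a hypothetical function name; only the formula itself comes from the caption.

```python
def cs_enrichment(site_probabilities, n_peptide_residues):
    """Probability-weighted carbonyl site (CS) enrichment for one protein.

    site_probabilities: carbonylation probability for each training-set site
                        (predicted, or empirical oxidation probabilities)
    n_peptide_residues: number of residues in the corresponding measured peptides
    """
    return sum(site_probabilities) / n_peptide_residues

# Hypothetical protein with three RKPT sites in 30 measured peptide residues.
predicted = cs_enrichment([0.9, 0.2, 0.4], 30)
measured = cs_enrichment([1.0, 0.0, 0.5], 30)
print(round(predicted, 3), round(measured, 3))
```

Dividing by the same residue count in both cases keeps the predicted and measured values on a common scale, so they can be compared point by point across proteins.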

Figure 6. Molecular properties predicting protein vulnerability to carbonylation
  1. A–D. Example sites prone to carbonylation: (A) DRA0302_P252, (B) DR0099_P51, and (C) b0911_K411; example robust site: (D) b3313_P69.

Data information: All atoms of central RKPT side chains are shown, with carbonylatable atomic site in red (predicted and measured carbonylated) or black (predicted and measured not oxidized) and labeled with the 1‐letter code of the containing amino acid. Positive (blue) and negative (pink) charges within 8 Å are labeled. Carbonylatable lysine sites (purple) within 8 Å are labeled. Molecular surfaces within 5 Å of the central CS are dark gray. See also Fig EV3.
Figure EV4. Residue‐scale validation of metal‐catalyzed oxidation (MCO) predictor, related to Fig 5A
We developed an MCO predictor using the algorithm presented in this study but trained on previously published carbonylation by MCO data from E. coli, the same dataset used to develop CSPD. Receiver operating characteristic (ROC) curves for MCO site predictors derived by leave‐1‐out validation. The dashed black line at y=x corresponds to performance expected by chance. Top left = final predictor trained by stacking structure‐ and sequence‐based models. Top middle = predictor trained only on structure‐based features. Top right = predictor trained only on sequence‐based features. Bottom left = theoretical maximum predictive power for a probability estimator (AUC = 0.97). Bottom middle = same algorithm as used for final predictor but with all features shuffled beforehand. Bottom right = CSPD model developed using MCO site data from E. coli.
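
The area under a ROC curve (AUC) summarizes each panel as a single number: the probability that a randomly chosen true site receives a higher score than a randomly chosen non‐site. A minimal sketch of that calculation follows; it assumes the scores are held‐out predictions (for example from leave‐1‐out validation), and the labels and scores shown are made up.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Made-up example: 1 = oxidized site, 0 = non-oxidized site.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
print(roc_auc(labels, scores))
```

An AUC near 0.5 corresponds to the dashed y=x line expected by chance, which is what a predictor trained on shuffled features would be expected to approach.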
Figure 7. Interspecies comparison of predicted protein vulnerability to carbonylation
Each circle represents a distinct protein pair (ortholog or isozyme) between species. Each plot shows identical point values but highlights a different functional class of proteins with relevance to oxidative stress. The y = x diagonal line is a reference to compare intrinsic vulnerability to carbonylation between orthologs. The y = x/3.78 diagonal line is a reference to compare combined intrinsic and extrinsic carbonylation properties between orthologs. Elliptical dotted line encircles the points falling within 3 standard deviations of the mean coordinates and 3 standard deviations of the distance from reference line y=x, encompassing ~91% of all data points. This reference region distinguishes outlier points that are distant from the main population. Outliers with associated experimental evidence related to hypersensitivity to oxidative stress are labeled with their protein names. See also Appendix Fig S3 and Fig EV5.
Figure EV5. Predicted outliers grouped by comparative intrinsic and extrinsic vulnerability to carbonylation in D. radiodurans and E. coli, related to Fig 7
Group 1: Proteins predicted significantly more intrinsically prone to carbonylation than the rest of the proteome in both species (>3σ distance from mean and above a line perpendicular to y = x that is tangent to the ellipse upper vertex) but more intrinsically (left of y = x) and extrinsically protected in D. radiodurans (orthogonal distance from y = x/3.78 is significantly greater than the mean; unpaired t‐test P‐value < 0.01).
Group 2: Proteins predicted significantly more intrinsically prone to carbonylation than the rest of the proteome in both species, with similar intrinsic vulnerability between species (<3σ distance from y = x but not in Group 1) but more extrinsically protected in D. radiodurans.
Group 3: Proteins predicted significantly more intrinsically (>3σ distance above y = x) or extrinsically protected against carbonylation in D. radiodurans but not in Groups 1 or 2.
Group 4: Proteins predicted significantly more intrinsically robust to carbonylation than the rest of the proteome in both species (>3σ distance from mean and below a line perpendicular to y = x that is tangent to the ellipse lower vertex).
Group 5: Proteins predicted significantly more intrinsically susceptible to carbonylation in D. radiodurans than in E. coli (>3σ distance below y = x) and not significantly more extrinsically protected than the rest of the proteome (orthogonal distance from y = x/3.78 is not significantly greater than the mean).
Marginal: Proteins not predicted to have any significant intrinsic or extrinsic protection in D. radiodurans or E. coli.
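
The grouping relies on orthogonal distances from the two reference lines, y = x and y = x/3.78. A minimal sketch of that geometric step is given below; the function name and the example point are hypothetical, and no assumption is made here about which species is plotted on which axis.

```python
import math

def signed_orthogonal_distance(x0, y0, slope):
    """Signed perpendicular distance from point (x0, y0) to the line y = slope * x.

    Positive values lie above the line, negative values below it.
    """
    return (y0 - slope * x0) / math.sqrt(slope * slope + 1.0)

# Hypothetical point compared against both reference lines.
point = (0.30, 0.12)
d_intrinsic = signed_orthogonal_distance(*point, slope=1.0)
d_combined = signed_orthogonal_distance(*point, slope=1.0 / 3.78)
print(round(d_intrinsic, 4), round(d_combined, 4))
```

Thresholding such distances (for example at 3σ) is what separates outlier proteins from the main population in the description above.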
