J Struct Biol. 2010 Jul;171(1):64-73.
doi: 10.1016/j.jsb.2010.03.016. Epub 2010 Mar 27.

Prediction of protein crystallization outcome using a hybrid method

Frank H Zucker et al. J Struct Biol. 2010 Jul.

Abstract

The great power of protein crystallography to reveal biological structure is often limited by the tremendous effort required to produce suitable crystals. A hybrid crystal growth predictive model is presented that combines both experimental and sequence-derived data from target proteins, including novel variables derived from physico-chemical characterization such as R(30), the ratio between a protein's DSF intensity at 30°C and at T(m). This hybrid model is shown to be more powerful than sequence-based prediction alone, and more likely to be useful for prioritizing and directing the efforts of structural genomics and individual structural biology laboratories.

Figures

Figure 1. Analysis of size exclusion chromatography profiles
Gaussian peaks fit to the SEC curve for E. histolytica aspartate-tRNA ligase batch 24058. In (a), (b) and (c) open black circles are observed absorbance at 280 nm in milli-absorbance units (mAu); vertical dashes bound the fractions pooled for further characterization and crystallization; red line is calculated mAu using a linear background plus 1, 2 or 3 Gaussian curves fit to the observed mAu using gnuplot. In (b) and (c) dotted lines in blue, green and violet show individual Gaussians. (A 4th Gaussian, not shown, can be fit as another small curve under the main peak.) (d) Residuals and calculated pool purity for fitting 1 to 4 Gaussians to observed mAu. Left axis: solid black circles, total Rabs, the absolute value of the difference between observed and calculated mAu divided by the total observed mAu; magenta squares, Rabs for the pooled fractions; green triangles, root mean square of the residuals as a fraction of the mean. Right axis: red diamonds, purity of the pooled fractions, i.e. the maximum area under a single Gaussian in the pooled fractions divided by the total pool area. SECR1 is Rabs for one Gaussian, i.e. the area between the red and black curves in (a) over the area under the black curve. For this sample SECR1 = 0.16. SECPP is the purity of the pooled fractions calculated in the optimal model. For this sample SECPP = 0.99 from (b). (Figures prepared in the R statistical environment.)
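The two SEC quantities defined in this caption can be illustrated with a minimal Python sketch. It is not the authors' gnuplot workflow: the function names, the synthetic two-peak trace, and the pool limits are hypothetical, the volume grid is assumed to be evenly spaced (so ratios of sums equal ratios of areas), and the "total pool area" is assumed to be the summed Gaussian area with the linear background left out.

```python
import math

def gaussian(x, amplitude, center, sigma):
    """Value of one Gaussian peak at elution volume x."""
    return amplitude * math.exp(-0.5 * ((x - center) / sigma) ** 2)

def r_abs(observed, calculated):
    """Rabs: summed |observed - calculated| divided by the summed observed signal."""
    diff = sum(abs(o - c) for o, c in zip(observed, calculated))
    return diff / sum(observed)

def pool_purity(volumes, components, pool_start, pool_end):
    """Largest single-Gaussian area inside the pool over the total Gaussian area there.

    components is a list of (amplitude, center, sigma) tuples.
    Assumption: background is excluded from the total pool area.
    """
    areas = []
    for amplitude, center, sigma in components:
        area = sum(gaussian(v, amplitude, center, sigma)
                   for v in volumes if pool_start <= v <= pool_end)
        areas.append(area)
    return max(areas) / sum(areas)

# Synthetic trace (illustrative values only): a main peak plus a small contaminant.
volumes = [i * 0.1 for i in range(201)]
components = [(100.0, 10.0, 1.0), (10.0, 14.0, 1.0)]
observed = [sum(gaussian(v, *c) for c in components) for v in volumes]

# SECR1 analogue: residual when only the main Gaussian is used as the model.
one_gaussian_model = [gaussian(v, *components[0]) for v in volumes]
secr1 = r_abs(observed, one_gaussian_model)

# SECPP analogue: purity of a pool spanning 8 to 12 volume units.
secpp = pool_purity(volumes, components, 8.0, 12.0)
print(secr1, secpp)
```

In this sketch a low residual and a purity close to 1 both indicate a trace dominated by a single species, which is the property the caption's SECR1 and SECPP are meant to capture.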
Figure 2. Analysis of differential scanning fluorimetry curves
Four protein samples illustrate different curve shapes. Black solid lines: fluorescence intensity of SYPRO Orange dye vs. temperature, smoothed over 15 points (3 °C) and normalized to the minimum and maximum observed intensities. Blue dashed vertical lines: Tm, the temperature with the steepest positive slope, (dI/dT)max. Blue horizontal dashes: ITm, the intensity at Tm. (a) L. guyanensis 6-phosphogluconolactonase with ideal shape: low intensity at low temperature and a single transition. Blue horizontal arrow: temperature range over which the slope is at least ½ of (dI/dT)max, i.e. full width at half maximum (FWHM) of the derivative, proportional to the melting transition width Tw. (b) E. histolytica aspartate-tRNA ligase batch 21516 with high intensity at low temperature and a single transition. Red horizontal dashes: I30, intensity at 30 °C. R30 is the ratio of I30 to ITm. Green dot-dash line: I30 threshold based on the R30 criterion in the decision tree, Figure 3b, i.e. I30/ITm = 0.105. (c) T. gondii porphobilinogen synthase amino acids 320–658, with two distinct transitions. Magenta dotted line: sigmoid curve fit to observed intensity at Tm and at 2∙Tw below Tm. At low temperatures this curve approaches Imin, the estimated starting intensity of the major transition. Since in many cases intensity decays above Tm, and in others a minor transition is seen above Tm, the amplitude of the major transition is estimated as twice the intensity change between Imin and ITm. When there is a minor transition below Tm, as in this case, Imin is also used as an estimate of the amplitude of that minor transition. RMT, the transition fraction, is calculated as the amplitude of the minor transition(s) over the total amplitude of all transitions. (d) L. major methionyl-tRNA synthetase, amino acids 206 to 747, with high R30 and high RMT. Both I30, red dashes, and Imin from the curve fit to the transition, magenta dots, are near ITm, blue dashes. (Figures prepared in Excel.)
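The core quantities in this caption (smoothing, min-max normalization, Tm as the point of steepest positive slope, and R30 = I30/ITm) can be sketched in Python. This is an illustration under stated assumptions, not the authors' Excel procedure: the function names are hypothetical, the moving average shrinks its window at the curve edges, the slope is a central difference, I30 is read at the sampled temperature nearest 30 °C, and the test curve is a synthetic sigmoid sampled every 0.2 °C so that 15 points span 3 °C.

```python
import math

def smooth(values, window=15):
    """Centered moving average; the window shrinks near the ends (an assumption)."""
    half = window // 2
    out = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out

def normalize(values):
    """Rescale so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def dsf_metrics(temps, intensities, window=15, ref_temp=30.0):
    """Return Tm, ITm, I30 and R30 for one melting curve."""
    curve = normalize(smooth(intensities, window))
    # Central-difference slope at every interior point.
    slopes = [(curve[i + 1] - curve[i - 1]) / (temps[i + 1] - temps[i - 1])
              for i in range(1, len(curve) - 1)]
    # slopes[j] belongs to curve index j + 1.
    k = max(range(len(slopes)), key=slopes.__getitem__) + 1
    tm, i_tm = temps[k], curve[k]
    # Intensity at the sampled temperature closest to the reference (30 °C).
    j = min(range(len(temps)), key=lambda i: abs(temps[i] - ref_temp))
    i30 = curve[j]
    return {"Tm": tm, "ITm": i_tm, "I30": i30, "R30": i30 / i_tm}

# Synthetic single-transition curve (illustrative only), midpoint at 55 °C.
temps = [20.0 + 0.2 * i for i in range(351)]
raw = [0.05 + 1.0 / (1.0 + math.exp(-(t - 55.0) / 2.0)) for t in temps]
metrics = dsf_metrics(temps, raw)
print(metrics)
```

A curve of the ideal shape in panel (a) gives a small R30, well under the 0.105 threshold quoted in the caption, while a sample with high fluorescence at low temperature, as in panels (b) and (d), gives a larger one.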
Figure 3. Development of diffraction predictor using experimental results and sequence
(a) Predictive model design. Top: train the model on experimental and sequence data and known crystallization outcomes quantified as diffraction scores (DS). Bottom: use the model to predict DS for new samples from new experimental and sequence data. (b) Hybrid crystal growth predictor (HyXG-1) decision tree prediction trained on 77 samples: start with experimental and sequence data for a new protein sample (top left); travel to the right across the tree branching according to criteria shown; arrive at the predicted DS for each category (center). Predicted DS is the mean DS for all training samples in that category; from top to bottom, there were 9, 7, 10, 14, 7, 12 and 18 training samples in each category. To the right are the percent of all test and training samples in each category diffracting to at least 10 Å or at least 2.8 Å, and suggestions for actions if no crystals are seen in initial trials. Possible changes include: change construct tag, tag placement or promoter; change expression host, scale-up volume, aeration method, or time and temperature regime; change purification columns (e.g. add ion exchange), tag cleavage, lysis and column buffers, or final concentration step.
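The prediction rule stated in this caption, where each category's predicted DS is the mean DS of the training samples that fall into it, can be written as a short Python sketch. The category labels and DS values below are made up for illustration, and the function names are hypothetical; only the seven category sizes are taken from the caption, and the branching criteria of the tree are not reproduced here.

```python
from collections import defaultdict
from statistics import mean

# Training samples per category, top to bottom, as listed in the caption.
LEAF_SIZES = [9, 7, 10, 14, 7, 12, 18]

def fit_leaf_means(training):
    """Map each category to the mean diffraction score of its training samples.

    training is an iterable of (category, diffraction_score) pairs.
    """
    by_leaf = defaultdict(list)
    for category, score in training:
        by_leaf[category].append(score)
    return {category: mean(scores) for category, scores in by_leaf.items()}

def predict(leaf_means, category):
    """Predicted DS for a new sample that the tree has routed to a category."""
    return leaf_means[category]

# Hypothetical training data: two categories with invented scores.
training = [("A", 5), ("A", 6), ("B", 0), ("B", 1), ("B", 2)]
leaf_means = fit_leaf_means(training)
print(predict(leaf_means, "A"), predict(leaf_means, "B"))

# The category sizes in the caption account for all 77 training samples.
print(sum(LEAF_SIZES))
```

Because each prediction is an average over a small group (7 to 18 samples per category), the spread of training DS within a category is a natural measure of uncertainty for that category's prediction.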
Figure 4. Diffraction score predictions using experimental results and sequence
(a) DS observed vs. DS predicted by the HyXG-1 model shown in Figure 3b for the test set of 30 new samples. DS is:
  0, no mountable protein crystals after extensive crystal screening;
  1, no diffraction;
  2, diffraction worse than 10 Å;
  3, 10 to 4.01 Å diffraction;
  4, 4.00 to 2.81 Å diffraction;
  5, 2.80 to 2.01 Å diffraction;
  6, 2.00 Å or better diffraction.

Bars: ±1 standard deviation based on the deviation of training DS. Dotted lines and coloring based on success threshold of 10 Å or better (DS≥3). (b) Receiver operating characteristic (ROC) curves: area under curve is a measure of predictive power. Blue lines, predictions from combined experimental and sequence data (Table 2, row A); red, predictions leaving out experimental data (row C). Dashes, ROC curve for success threshold of 10 Å or better (DS≥3); solid, success threshold of 2.8 Å or better (DS≥5). Shading added to visually clarify the association of lines.
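The diffraction score scale and the ROC area can be illustrated with a minimal Python sketch. Several things here are assumptions rather than the authors' code: the function names are hypothetical, the resolution bins are treated as contiguous (so the boundary between DS 3 and DS 4 is taken at 4.00 Å), the area under the ROC curve is computed with the rank-based (Mann-Whitney) formulation, and the example scores are invented.

```python
def diffraction_score(resolution=None, mountable=True, diffracts=True):
    """Convert a crystallization outcome to a diffraction score (DS, 0 to 6).

    resolution is the best diffraction limit in angstroms (smaller is better).
    """
    if not mountable:
        return 0  # no mountable crystals
    if not diffracts or resolution is None:
        return 1  # crystals, but no diffraction
    if resolution > 10.0:
        return 2
    if resolution > 4.0:
        return 3
    if resolution > 2.8:
        return 4
    if resolution > 2.0:
        return 5
    return 6

def roc_auc(predicted, observed, threshold):
    """Area under the ROC curve, with success defined as observed DS >= threshold.

    Equals the fraction of (success, failure) pairs in which the success has
    the higher predicted DS, counting ties as one half.
    """
    successes = [p for p, o in zip(predicted, observed) if o >= threshold]
    failures = [p for p, o in zip(predicted, observed) if o < threshold]
    if not successes or not failures:
        raise ValueError("need at least one success and one failure")
    wins = sum(1.0 if s > f else 0.5 if s == f else 0.0
               for s in successes for f in failures)
    return wins / (len(successes) * len(failures))

# Invented outcomes for four samples, scored on the DS scale.
observed = [
    diffraction_score(mountable=False),
    diffraction_score(12.0),
    diffraction_score(2.5),
    diffraction_score(1.8),
]
predicted = [3.0, 1.0, 4.0, 2.0]  # hypothetical model outputs

# Success threshold of 2.8 Å or better corresponds to DS >= 5.
auc = roc_auc(predicted, observed, threshold=5)
print(observed, auc)
```

An area of 0.5 corresponds to a predictor that ranks successes and failures no better than chance, and 1.0 to one that ranks every success above every failure, which is why the caption treats the area as a measure of predictive power.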
