Algorithms Mol Biol. 2011 Apr 19;6:12.
doi: 10.1186/1748-7188-6-12.

Sparse estimation for structural variability

Raghavendra Hosur et al. Algorithms Mol Biol. 2011.

Abstract

Background: Proteins are dynamic molecules that exhibit a wide range of motions; often these conformational changes are important for protein function. Determining biologically relevant conformational changes, or true variability, efficiently is challenging due to the noise present in structure data.

Results: In this paper we present a novel approach to elucidate conformational variability in structures solved using X-ray crystallography. We first infer an ensemble to represent the experimental data and then formulate the identification of truly variable members of the ensemble (as opposed to those that vary only due to noise) as a sparse estimation problem. Our results indicate that the algorithm is able to accurately distinguish genuine conformational changes from variability due to noise. We validate our predictions for structures in the Protein Data Bank by comparing with NMR experiments, as well as on synthetic data. In addition to improved performance over existing methods, the algorithm is robust to the levels of noise present in real data. In the case of Human Ubiquitin-conjugating enzyme Ubc9, variability identified by the algorithm corresponds to functionally important residues implicated by mutagenesis experiments. Our algorithm is also general enough to be integrated into state-of-the-art software tools for structure-inference.
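The sparse estimation step can be illustrated with a generic Lasso solver. The following is a minimal sketch, not the authors' implementation: it assumes a design matrix `X` with one column per ensemble member and a response vector `y` derived from the experimental data, neither of which is specified in the abstract. All function names and the toy numbers are hypothetical.

```python
def soft_threshold(z, t):
    """Shrink z toward zero by t (proximal operator of the L1 penalty)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimise 0.5 * ||y - X w||^2 + lam * ||w||_1 by cyclic coordinate descent.

    X: list of rows (observations), one entry per ensemble member (assumed layout).
    y: list of observed values, one per row of X.
    """
    n_obs, n_feat = len(X), len(X[0])
    w = [0.0] * n_feat
    # Squared Euclidean norm of each column of X.
    col_sq = [sum(X[i][j] ** 2 for i in range(n_obs)) for j in range(n_feat)]
    for _ in range(n_iter):
        for j in range(n_feat):
            if col_sq[j] == 0.0:
                continue  # an all-zero column can never receive weight
            # Correlation of column j with the residual that excludes w[j].
            rho = 0.0
            for i in range(n_obs):
                pred_without_j = sum(
                    X[i][k] * w[k] for k in range(n_feat) if k != j
                )
                rho += X[i][j] * (y[i] - pred_without_j)
            w[j] = soft_threshold(rho, lam) / col_sq[j]
    return w


def select_members(w, tol=1e-8):
    """Indices of ensemble members whose coefficient survives the penalty."""
    return [j for j, wj in enumerate(w) if abs(wj) > tol]


# Toy example with an orthonormal design: member 1 explains almost nothing.
X = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
y = [3.0, 0.2, -2.0]
w = lasso_coordinate_descent(X, y, lam=0.5)
print(select_members(w))  # members 0 and 2 keep a non-zero weight
```

In this toy case the weak contribution of member 1 is shrunk exactly to zero, which is the behaviour a sparse formulation relies on to separate signal from noise. Larger values of `lam` remove more members.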

Figures

Figure 1
Overview of the ensemble generation and classification algorithm. Roughly 2000 conformations are constructed using ChainTweak and side-chains are added using RAPPER. The optimization is carried out until the fit-to-data converges (Rfree). In the final step, structures that collectively represent the data as well as the PDB structure are selected for classification via Lasso. EDM: Electron density map. R is a measure of agreement between the amplitudes of the structure factors calculated from a structure and those from the original diffraction data. Rfree is the corresponding cross-validation parameter, calculated on diffraction data not used in the structure optimization process [20].
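The R and Rfree quantities described in this caption follow the standard crystallographic residual, R = Σ| |F_obs| − |F_calc| | / Σ|F_obs|. The sketch below is illustrative only: it assumes the calculated amplitudes are already on the same scale as the observed ones (the usual scale factor is omitted) and that each reflection carries a boolean flag marking membership in the held-out free set. Function names and numbers are hypothetical.

```python
def r_factor(f_obs, f_calc):
    """Crystallographic residual between observed and calculated amplitudes."""
    if len(f_obs) != len(f_calc):
        raise ValueError("amplitude lists must have the same length")
    denominator = sum(abs(o) for o in f_obs)
    if denominator == 0:
        raise ValueError("observed amplitudes sum to zero")
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    return numerator / denominator


def r_work_and_free(f_obs, f_calc, free_flags):
    """Split reflections by free_flags and return (R_work, R_free).

    free_flags[i] is True when reflection i was excluded from optimization
    (an assumed encoding of the cross-validation set).
    """
    work = [(o, c) for o, c, f in zip(f_obs, f_calc, free_flags) if not f]
    free = [(o, c) for o, c, f in zip(f_obs, f_calc, free_flags) if f]
    r_work = r_factor([o for o, _ in work], [c for _, c in work])
    r_free = r_factor([o for o, _ in free], [c for _, c in free])
    return r_work, r_free


# Toy amplitudes: the last two reflections form the free set.
f_obs = [10.0, 20.0, 30.0, 40.0]
f_calc = [9.0, 22.0, 30.0, 36.0]
flags = [False, False, True, True]
r_work, r_free = r_work_and_free(f_obs, f_calc, flags)
print(round(r_work, 3), round(r_free, 3))
```

Because the free reflections never enter the optimization, monitoring the second value rather than the first guards against overfitting the model to the diffraction data.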
Figure 2
Example of ensemble construction and classification. A) The PDB structure is shown in green; the second conformer in the synthetic crystal is in gray. The two structures classified by Lasso as truly variable are shown in blue, and the two classified as variable due to noise are shown in red. B) Summary of the algorithm output using synthetic data. RMSD is calculated with respect to the PDB structure (green). Suitability of the linear model and statistical significance of the regression coefficients were evaluated using standard techniques (R² and t-test).
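The RMSD against the PDB structure mentioned in panel B can be computed as follows. This is a minimal sketch assuming both structures list the same atoms in the same order and already share a coordinate frame, so no superposition is performed; the caption does not state whether the authors superimpose structures first. Names and coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally sized lists of (x, y, z)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared / len(coords_a))


# Two atoms; the second atom is displaced by 2 Å along y.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
conformer = [(0.0, 0.0, 0.0), (1.0, 2.0, 0.0)]
print(rmsd(reference, conformer))  # sqrt(4 / 2), about 1.414
```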
Figure 3
Performance analysis. A) Regularization path for the ensemble (|ω|₁ → 0 as λ → ∞, towards the left). B) Residue-level Lasso with varying window sizes centered on each residue (λ = 10). The color code is the same as in Figure 2.
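The residue-level analysis in panel B applies the classification to a window centered on each residue. One way to generate such windows is sketched below. The requirement of an odd window size and the truncation of windows at the chain termini are assumptions made for illustration; the caption does not say how the authors treat chain ends. The function name is hypothetical and residue indices are zero-based.

```python
def residue_windows(n_residues, window_size):
    """Return one list of residue indices per residue, centered on that residue.

    Windows are clipped at the chain ends, so terminal residues get
    shorter windows (an assumed convention).
    """
    if window_size < 1 or window_size % 2 == 0:
        raise ValueError("window_size must be a positive odd integer")
    half = window_size // 2
    windows = []
    for center in range(n_residues):
        start = max(0, center - half)
        stop = min(n_residues, center + half + 1)
        windows.append(list(range(start, stop)))
    return windows


# A six-residue chain with a window size of 5.
for center, window in enumerate(residue_windows(6, 5)):
    print(center, window)
```

Each window would then be passed to the per-fragment classification, giving one score per residue that can be plotted along the sequence.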
Figure 4
Comparative analysis. A) Pflex is sensitive to the parameter σ, producing false positives at low values and false negatives at higher values (inset). B) Average B-factors correctly identify the regions of variability, but cannot distinguish between true variability and variability due to noise. C) Choosing an RMSD cutoff for classification is difficult with noisy coordinates. The color code is the one used in Figures 2 and 3.
Figure 5
Interpretation of ensembles on real data. A) The Lasso test on 9ilb:124-132 classifies 2 structures as non-variable (pink, yellow). B) For the same loop, structures classified as truly variable (red, cyan, green) deviate more from the PDB structure (black). C) The trajectory of the solution can give qualitative knowledge of the landscape in the vicinity of the native structure. All density maps are contoured at 1.5σ for clarity. Figures were generated using PyMOL [36].
Figure 6
Flexibility analysis of the 1a3s ensemble. A) Residue-level Lasso with a window size of 5 reveals four fragments (peaks) of potential interest: 6-15, 30-40, 115-120 and 135-142. B) The N-terminal region (12-20) of 1a3s. Multiple rotamers of R13 (left, red) might affect the interaction surface consisting of R18 (red), K14 and K18 (yellow), thus influencing Ubc9's N-terminus specificity. C) Variability around the catalytic site Cys93 (yellow). Residues Gln126 (brown) and Asp127 (green) have been identified through mutagenesis experiments as critical for Ubc9's interaction with a substrate. The black structure represents PDB coordinates.

References

    1. Bourne P, Weissig H. Structural Bioinformatics. Wiley-Liss, Inc., NJ; 2003.
    2. Jensen L. Refinement and reliability of macromolecular models based on X-ray diffraction data. Methods in Enzymology. 1997;277:353–366.
    3. Ringe G, Petsko G. Study of protein dynamics by X-ray diffraction. Methods in Enzymology. 1986;131:389–433.
    4. Volkman B, Lipson D, Wemmer D, Kern D. Two state allosteric behaviour in a single domain signalling protein. Science. 2001;291:2429–2433. doi: 10.1126/science.291.5512.2429.
    5. Eisenmesser E, Millet O, Labeikovsky W, Korzhnev D, Wolf-Watz M, Bosco D, Skalicky J, Kay L, Kern D. Intrinsic dynamics of an enzyme underlies catalysis. Nature. 2005;438:117–121. doi: 10.1038/nature04105.

LinkOut - more resources