Acta Crystallogr D Biol Crystallogr. 2015 May;71(Pt 5):1147-58.
doi: 10.1107/S1399004715004241. Epub 2015 Apr 25.

Using support vector machines to improve elemental ion identification in macromolecular crystal structures

Nader Morshed et al. Acta Crystallogr D Biol Crystallogr. 2015 May.

Abstract

In the process of macromolecular model building, crystallographers must examine electron density for isolated atoms and differentiate sites containing structured solvent molecules from those containing elemental ions. This task requires specific knowledge of metal-binding chemistry and scattering properties and is prone to error. A method has previously been described to identify ions based on manually chosen criteria for a number of elements. Here, the use of support vector machines (SVMs) to automatically classify isolated atoms as either solvent or one of various ions is described. Two data sets of protein crystal structures, one containing manually curated structures deposited with anomalous diffraction data and another with automatically filtered, high-resolution structures, were constructed. On the manually curated data set, an SVM classifier was able to distinguish calcium from manganese, zinc, iron and nickel, as well as all five of these ions from water molecules, with a high degree of accuracy. Additionally, SVMs trained on the automatically curated set of high-resolution structures were able to successfully classify most common elemental ions in an independent validation test set. This method is readily extensible to other elemental ions and can also be used in conjunction with previous methods based on a priori expectations of the chemical environment and X-ray scattering.

Keywords: elemental ion identification; model building; support vector machines.

Figures

Figure 1
An illustration of the general design of support vector machines (SVMs). SVMs are initially trained by input of examples of each class and their associated feature values. A trained SVM is then able to predict the identity of future objects based on their feature values (left). The underlying mechanism of classification involves finding the set of hyperplanes that best divide the space between examples in N dimensions, where N is the number of feature values describing each example. Here, this is depicted as lines dividing two-dimensional space (right). Other types of SVMs allow nonlinear functions to divide this space.
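As a rough illustration of the linear case described in this caption, the sketch below trains a two-class linear SVM in plain Python by subgradient descent on the L2-regularized hinge loss. It is not the authors' implementation: the two feature values per site, the toy data, the hyperparameters, and the names train_linear_svm, decision_value and predict are all assumptions introduced here for illustration.

```python
# Toy linear SVM: subgradient descent on the L2-regularized hinge loss.
# Feature values and labels are invented; +1 stands for "ion", -1 for "water".

def decision_value(w, b, x):
    """Score of x relative to the hyperplane w.x + b = 0."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_linear_svm(samples, labels, epochs=1000, eta=0.01, lam=0.01):
    """Fit (w, b) with a fixed step size eta and regularization strength lam."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            violated = y * decision_value(w, b, x) < 1.0
            # The regularization term shrinks w on every step.
            w = [wj * (1.0 - eta * lam) for wj in w]
            if violated:
                # Hinge-loss subgradient: move the hyperplane away from x.
                w = [wj + eta * y * xj for wj, xj in zip(w, x)]
                b += eta * y
    return w, b


def predict(w, b, x):
    """Return +1 or -1 depending on the side of the hyperplane."""
    return 1 if decision_value(w, b, x) >= 0.0 else -1


# Hypothetical two-feature training examples (N = 2).
samples = [
    (0.2, 0.1), (0.4, 0.3), (0.3, 0.0), (0.5, 0.2),  # labeled water
    (1.8, 2.5), (2.0, 3.0), (1.6, 2.2), (2.2, 2.8),  # labeled ion
]
labels = [-1, -1, -1, -1, 1, 1, 1, 1]

w, b = train_linear_svm(samples, labels)
print([predict(w, b, x) for x in samples])
```

A classifier that separates several ion types from water, as in the abstract, would need a multi-class scheme (for example one such binary classifier per class) and, where a straight boundary is not enough, a nonlinear kernel; neither is shown here.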
Figure 2
Sites in the curated data set found to be incorrectly modeled as waters. (a) PDB entry 2oy2, chain A, residue 1290. (b) PDB entry 3bwx, chain A, residue 629. (c) PDB entry 4fca, chain A, residue 701. Green and red meshes are mFo − DFc density at ±3.0σ. The pink mesh is anomalous difference density at 3.0σ. (a) and (b) include a gray mesh for the 2mFo − DFc density at 2.0σ. Red spheres are water molecules. Distances are labeled in Å. Images were generated using PyMOL v.1.3.
Figure 3
Examples of the different categories of false positives, where a water molecule was incorrectly labeled as a heavy atom by the SVM. (a) Site had little or no electron density; PDB entry 2vca, chain A, residue 2257. (b) Site is coordinated extraordinarily closely by neighboring atoms; PDB entry 3qlq, chain A, residue 266. (c) Site is coordinating a neighboring heavy atom; PDB entry 3qlq, chain A, residue 259. (d) Site has an ambiguous environment and could not be successfully identified; PDB entry 2xrm, chain A, residue 2024. (a) includes a gray mesh for the 2mFo − DFc density at 2.0σ. Colors, shapes and lines are as in Fig. 2.
Figure 4
Ion sites have a wide variety of chemical binding environments: BVS values (horizontal axis) plotted against VECSUM values (vertical axis) for ions in each of the high-resolution training sets after re-refinement. Points are colored by structure resolution, with the resolution range for each element indicated by the color bar to the right of the corresponding plot. Outliers with BVS values greater than 4 were omitted for display purposes.
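The abstract does not define BVS or VECSUM, so the following is only a sketch under assumptions drawn from the general bond-valence literature: each ligand at distance d contributes a valence exp((R0 − d)/b) with b = 0.37 Å, BVS is the sum of these valences, and VECSUM is taken here as the length of the valence-weighted sum of unit vectors toward the ligands, divided by BVS. The normalization, the function names, the coordinates, and the R0 value used in the example are illustrative assumptions and may differ from what the authors computed.

```python
import math


def bond_valences(center, ligands, r0, b=0.37):
    """Return (valence, unit_vector) for each ligand around the center.

    r0 is an element-pair parameter in angstroms; the value passed in the
    example below is an assumption and should be checked against a
    published bond-valence table before any real use.
    """
    pairs = []
    for lig in ligands:
        delta = [l - c for l, c in zip(lig, center)]
        dist = math.sqrt(sum(d * d for d in delta))
        valence = math.exp((r0 - dist) / b)
        pairs.append((valence, [d / dist for d in delta]))
    return pairs


def bvs(center, ligands, r0, b=0.37):
    """Bond valence sum: total valence contributed by all ligands."""
    return sum(v for v, _ in bond_valences(center, ligands, r0, b))


def vecsum(center, ligands, r0, b=0.37):
    """Length of the valence-weighted vector sum, normalized by BVS (assumed)."""
    pairs = bond_valences(center, ligands, r0, b)
    total = sum(v for v, _ in pairs)
    vec = [sum(v * u[k] for v, u in pairs) for k in range(3)]
    return math.sqrt(sum(c * c for c in vec)) / total


# Hypothetical sites: a symmetric octahedron and a one-sided arrangement.
R0_ASSUMED = 1.967  # assumed Ca-O parameter, for illustration only
origin = (0.0, 0.0, 0.0)
octahedral = [(2.4, 0, 0), (-2.4, 0, 0), (0, 2.4, 0),
              (0, -2.4, 0), (0, 0, 2.4), (0, 0, -2.4)]
one_sided = [(2.4, 0, 0), (0, 2.4, 0), (0, 0, 2.4)]

print(bvs(origin, octahedral, R0_ASSUMED), vecsum(origin, octahedral, R0_ASSUMED))
print(bvs(origin, one_sided, R0_ASSUMED), vecsum(origin, one_sided, R0_ASSUMED))
```

Under these definitions a symmetric coordination shell gives a VECSUM near zero, while a lopsided one gives a clearly nonzero value, which is why the two quantities together give a compact picture of a site's binding environment.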
