Biochim Biophys Acta. 2015 Oct;1854(10 Pt A):1545-52.
doi: 10.1016/j.bbapap.2015.02.016. Epub 2015 Mar 7.

Application of data mining tools for classification of protein structural class from residue based averaged NMR chemical shifts

Arun V Kumar et al. Biochim Biophys Acta. 2015 Oct.

Abstract

The number of protein sequences deriving from genome sequencing projects is outpacing our knowledge about the function of these proteins. With the gap between experimentally characterized and uncharacterized proteins continuing to widen, it is necessary to develop new computational methods and tools for obtaining protein structural information that is directly related to function. Nuclear magnetic resonance (NMR) provides a powerful means to determine three-dimensional structures of proteins in the solution state. However, translation of the NMR spectral parameters to even low-resolution structural information such as protein class requires multiple time-consuming steps. In this paper, we present an unorthodox method, based on machine learning algorithms, to predict the protein structural class directly from the residue-based averaged chemical shifts (ACS). Experimental chemical shift information from 1491 proteins obtained from the Biological Magnetic Resonance Bank (BMRB) and their respective protein structural classes derived from the Structural Classification of Proteins (SCOP) were used to construct a data set with 119 attributes and 5 different classes. Twenty-four different classification schemes were evaluated using several performance measures. Overall, the residue-based ACS values can predict the protein structural classes with 80% accuracy as measured by the Matthews correlation coefficient. Specifically, protein classes defined by mixed αβ or small proteins are classified with >90% correlation. Our results indicate that this NMR-based method can be utilized as a low-resolution tool for protein structural class identification without any prior chemical shift assignments.

Keywords: Chemical shift; Data mining; NMR; Protein structural class.
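
To make the two quantities named in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes each protein's shifts arrive as hypothetical `(residue_type, nucleus, ppm)` tuples, averages them per residue type and nucleus to form ACS attributes, and scores predictions for one class against all others with the Matthews correlation coefficient. The function names and the tuple layout are illustrative assumptions; the abstract does not specify how the 119 attributes are laid out or how the coefficient was aggregated across classes.

```python
from collections import defaultdict
from math import sqrt


def averaged_chemical_shifts(shifts):
    """Average observed shifts per (residue type, nucleus) pair.

    `shifts` is an iterable of (residue_type, nucleus, ppm) tuples for one
    protein. This tuple layout is an assumption made for the sketch.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for residue, nucleus, ppm in shifts:
        key = (residue, nucleus)
        sums[key] += ppm
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


def one_vs_rest_mcc(true_labels, predicted_labels, positive):
    """Matthews correlation coefficient for one class against all others."""
    tp = tn = fp = fn = 0
    for truth, guess in zip(true_labels, predicted_labels):
        if truth == positive and guess == positive:
            tp += 1
        elif truth != positive and guess != positive:
            tn += 1
        elif guess == positive:
            fp += 1
        else:
            fn += 1
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: report 0.0 when any marginal count is zero.
    return 0.0 if denominator == 0 else (tp * tn - fp * fn) / denominator


# Illustrative values only; these are not data from the paper.
toy_shifts = [("ALA", "CA", 54.8), ("ALA", "CA", 52.2), ("GLY", "CA", 45.0)]
acs = averaged_chemical_shifts(toy_shifts)
```

In a full pipeline, one such ACS dictionary per protein would be flattened into a fixed-order attribute vector and passed to a classifier. Choosing the classifier is outside the scope of this sketch.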

Figures

Figure 1
Overview of the data mining approach to predict protein structural class from the residue-based averaged chemical shift values (attributes). DM stands for data mining.
Figure 2
Hierarchical clustering of 13Cα and 1Hα residue-based ACS values. Natural grouping of the amino acid residues (top) with respect to the experimentally determined residue-based ACS values (along the side). The Euclidean distance metric was used for clustering. The clustering process generated natural groupings of proteins (secondary structure content) in the form of dendrograms (left, color coded) and the respective contributions of the amino acids that constitute the proteins (top dendrograms, color coded), with the total changes presented as heat maps. Intensities are shown in arbitrary units on a relative scale ranging from green to red, as indicated at the bottom of the panel. Names of amino acid residues above each heat map represent various combinations, and those on the right (color coded) represent either predominantly α-helical or β-sheet. The amino acids are grouped into two major clusters based on the profiles. All α and β proteins group separately, as marked, for both the 13Cα and 1Hα nuclei.
Figure 3
Residue-based ACS distribution of the heteronuclear pair 13Cα and 1Hα for each residue type. Experimental (left) and calculated (right) ACS values for each residue, identified by its three-letter amino acid code in each frame. The chemical shift scale is the same for all frames except for the Gly residues along the 13Cα axis.
Figure 4
Residue-based ACS distribution of the heteronuclear pair 15N and 1HN for each residue type. Experimental (left) and calculated (right) ACS values for each residue, identified by its three-letter amino acid code in each frame. The α and β class proteins are differentiated by black and red symbols, respectively.
Figure 5
Comparison of the experimental (along the Y-axis) and calculated (along the X-axis) residue-based chemical shifts for helical (top row) and strand (bottom row) conformations. The left, middle, and right columns correspond to the 1Hα, 13Cα, and 13C (carbonyl) nuclei, respectively. Each residue is identified by a different symbol, as noted on the right side of the plot.
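
The hierarchical clustering described for Figure 2 can be illustrated with a small pure-Python sketch. It uses the Euclidean distance named in the caption, but the average-linkage merge rule, the function name `agglomerate`, and the toy profiles are assumptions for illustration only. The caption does not state the linkage method, and the numbers below are not data from the paper.

```python
from math import dist


def agglomerate(profiles, n_clusters):
    """Merge the two closest clusters until `n_clusters` remain.

    `profiles` maps a label to a numeric vector. Cluster-to-cluster distance
    is the mean pairwise Euclidean distance (average linkage, an assumed
    choice for this sketch).
    """
    clusters = [[label] for label in profiles]

    def linkage(a, b):
        pairs = [(x, y) for x in a for y in b]
        return sum(dist(profiles[x], profiles[y]) for x, y in pairs) / len(pairs)

    while len(clusters) > n_clusters:
        # Find the pair of clusters with the smallest linkage distance.
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(cluster) for cluster in clusters]


# Hypothetical two-attribute profiles (for example a 13Cα and a 1Hα average).
toy_profiles = {
    "P1": [58.1, 4.1],
    "P2": [58.3, 4.0],
    "P3": [55.0, 4.9],
    "P4": [54.6, 5.1],
}
groups = agglomerate(toy_profiles, 2)
```

With these toy values, the two proteins with the higher first attribute end up in one group and the other two in the second. Recording the order of merges, rather than stopping at a fixed cluster count, would give the tree that a dendrogram displays.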
