Proc Natl Acad Sci U S A. 2005 Mar 8;102(10):3651-6.
doi: 10.1073/pnas.0409772102. Epub 2005 Feb 10.

Global mapping of the protein structure space and application in structure-based inference of protein function

Jingtong Hou et al. Proc Natl Acad Sci U S A. 2005.

Abstract

We have constructed a map of the "protein structure space" by using the pairwise structural similarity scores calculated for all nonredundant, experimentally determined protein structures. As expected, proteins with similar structures clustered together in the map, and the overall distribution of structural classes in this map closely followed that of the map of the "protein fold space" we reported previously. Consequently, proteins sharing similar molecular functions were also found to colocalize in the protein structure space map, pointing toward a previously undescribed scheme for structure-based functional inference for remote homologues based on their proximity in the map. We found that this scheme consistently outperformed predictions made by using either the raw scores or the normalized Z-scores of pairwise DALI structure alignments.
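To make proximity-based inference concrete, here is a minimal Python sketch of one simple variant: a query protein is assigned the function of its nearest annotated neighbor in the 3D map. The coordinates, protein identifiers, function labels, and the nearest-neighbor rule itself are hypothetical illustrations, not necessarily the exact procedure used in the paper.

```python
import math
from typing import Dict, Tuple

Point = Tuple[float, float, float]


def nearest_neighbor_function(
    query: str,
    coords: Dict[str, Point],
    functions: Dict[str, str],
) -> Tuple[str, float]:
    """Return (function label, distance) of the closest annotated protein.

    `coords` maps protein IDs to hypothetical 3D map coordinates;
    `functions` maps annotated protein IDs to hypothetical function labels.
    """
    q = coords[query]
    best_label, best_dist = "", math.inf
    for pid, label in functions.items():
        if pid == query:
            continue  # never use the query's own annotation
        d = math.dist(q, coords[pid])  # Euclidean distance in the map
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label, best_dist


# Hypothetical toy map: one tight group near the origin, one far away.
coords = {
    "p1": (0.0, 0.0, 0.0),
    "p2": (0.1, 0.0, 0.0),
    "p3": (5.0, 5.0, 0.0),
    "query": (0.02, 0.1, 0.0),
}
functions = {"p1": "oxidoreductase", "p2": "oxidoreductase", "p3": "hydrolase"}

label, dist = nearest_neighbor_function("query", coords, functions)
print(label)  # → oxidoreductase
```

A nearest-neighbor vote is only the simplest way to exploit colocalization; ranking all pairs by map distance, as in the ROC evaluation of the figures, uses the same distances.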


Figures

Fig. 1.
Scree plot of the multidimensional scaling (MDS) results. A scree plot evaluates the number of dimensions most appropriate for representing high-dimensional data in a low-dimensional space by means of MDS. To measure how fast the normalized stress (NS) diminishes, an empirical parameter called the change rate (CR) is defined as $\mathrm{CR}_k = (\mathrm{NS}_k - \mathrm{NS}_{k-1})/(\mathrm{NS}_{k+1} - \mathrm{NS}_k)$. The $k$ that gives the largest CR indicates the optimal number of dimensions for data abstraction. Here, the largest CR occurs at $k = 3$; therefore, the first three dimensions of the MDS projection are used to represent the protein structure space.
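The change-rate rule can be sketched in a few lines of Python. The stress values below are hypothetical, chosen only to show an elbow at $k = 3$; the sketch assumes NS strictly decreases with $k$, since a flat stretch would make a denominator zero.

```python
from typing import Dict, Sequence


def change_rates(ns: Sequence[float]) -> Dict[int, float]:
    """Compute CR_k = (NS_k - NS_{k-1}) / (NS_{k+1} - NS_k).

    ns[i] is the normalized stress of an (i + 1)-dimensional MDS solution,
    so dimensions are 1-based. CR_k needs both neighbors, so k runs
    from 2 to len(ns) - 1.
    """
    rates = {}
    for k in range(2, len(ns)):
        prev, cur, nxt = ns[k - 2], ns[k - 1], ns[k]  # NS_{k-1}, NS_k, NS_{k+1}
        rates[k] = (cur - prev) / (nxt - cur)
    return rates


def optimal_dimension(ns: Sequence[float]) -> int:
    """Return the k with the largest change rate."""
    rates = change_rates(ns)
    return max(rates, key=lambda k: rates[k])


# Hypothetical normalized stress for 1- to 6-dimensional projections.
ns = [0.40, 0.25, 0.12, 0.10, 0.09, 0.085]
print(optimal_dimension(ns))  # → 3
```

A large CR at $k$ means stress fell steeply up to $k$ dimensions and only slightly afterwards, which is the "elbow" a scree plot is read for.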
Fig. 2.
Two views of the map of the protein structure space. Each of the 1,898 protein chains is represented by a sphere in the 3D space. (A) The α, β, and α/β classes of structures are distributed in three elongated regions centered around three axes, denoted here as the α, β, and α/β axes. The colors and populations of each class are listed in the lower right. (B) The protein structure space viewed from below the αβ plane. Members of the small-protein class are represented by green spheres. The intersection of the α- and β-class axes is defined as the origin.
Fig. 3.
The 10 most populated SCOP superfamilies. The names of the superfamilies and their corresponding colors are indicated. Note that, with the exception of the P-loop-containing nucleoside triphosphate hydrolases, all superfamilies have their members clustered together. P-loop-containing proteins are more spread out because they are defined by a shared sequence motif rather than by global structural similarity.
Fig. 4.
Performance of structure-based function inference. (A) ROC plot of the performance of function inference. TP, true positives; FP, false positives; TN, true negatives; FN, false negatives. The green curve denotes the ROC curve of the SSM distance-based function inference. The 1:1 line (black), DALI Z-score curve (blue), and BLAST E-value curve (brown) are close to one another in the x-axis range of 0.2-0.9. Red, DALI similarity score. (B and C) Relative performance of functional inference methods. Each graph plots the total number of GO function families for which a given method exceeds a cutoff ROC (B) or mRFP (C) value (32). Large ROC scores and small mRFP scores indicate better performance of an inference method. (D) GO-family-specific performance of the SSM distance-based and DALI similarity score-based functional inference. Green and red asterisks denote families for which the SSM distances and the DALI similarity scores, respectively, performed better. The number to the right of each asterisk is the GO family number as listed in Table 1. The 20th family (ROC value 0.84 for map distance and 0.65 for DALI similarity score) is not shown in the plot for presentation purposes.
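A per-family ROC score of the kind plotted here can be sketched as the area under the ROC curve when protein pairs are ranked by ascending map distance. This Python sketch assumes a pair is "positive" when both proteins share the GO function family and "negative" otherwise; how the paper defined negatives is not stated here, the data are hypothetical, and mRFP is not implemented because its definition is not given in this text.

```python
from typing import Sequence


def roc_score(distances: Sequence[float], same_function: Sequence[bool]) -> float:
    """Area under the ROC curve when pairs are ranked by ascending distance.

    Equals the probability that a randomly chosen positive pair (same
    function) is closer than a randomly chosen negative pair; ties count
    as one half. O(P * N), which is fine for a small illustration.
    """
    pos = [d for d, s in zip(distances, same_function) if s]
    neg = [d for d, s in zip(distances, same_function) if not s]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative pair")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p < n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical pair distances and same-family labels for one GO family.
distances = [0.1, 0.3, 0.35, 0.8]
same_function = [True, False, True, False]
print(roc_score(distances, same_function))  # → 0.75
```

For a similarity measure where larger means more similar, such as the DALI similarity score, negate the scores before passing them in so that the ranking direction matches.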
Fig. 5.
Alignment of two structurally dissimilar but functionally similar proteins within the oxidoreductase GO-function family. The SSM distance-based function inference successfully placed this pair among the top 5% of all 1,898 × 1,898 pairs.
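Placing a pair "among the top 5%" amounts to ranking it against all pair distances in the map. A minimal Python sketch, assuming percentile is measured over distinct unordered pairs (self-pairs and reversed duplicates excluded) and using hypothetical coordinates:

```python
import math
from itertools import combinations
from typing import Dict, Tuple

Point = Tuple[float, float, float]


def pair_distance_percentile(a: str, b: str, coords: Dict[str, Point]) -> float:
    """Fraction of distinct unordered pairs whose map distance is <= d(a, b).

    A value of 0.05 or less would place (a, b) among the closest 5% of pairs.
    """
    target = math.dist(coords[a], coords[b])
    distances = [math.dist(coords[x], coords[y]) for x, y in combinations(coords, 2)]
    closer_or_equal = sum(1 for d in distances if d <= target)
    return closer_or_equal / len(distances)


# Hypothetical five-protein map laid out along one axis.
coords = {
    "A": (0.0, 0.0, 0.0),
    "B": (1.0, 0.0, 0.0),
    "C": (3.0, 0.0, 0.0),
    "D": (6.0, 0.0, 0.0),
    "E": (10.0, 0.0, 0.0),
}
print(pair_distance_percentile("A", "B", coords))  # → 0.1
```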

Comment in

  • A glimpse at the organization of the protein universe.
    Vendruscolo M, Dobson CM. Proc Natl Acad Sci U S A. 2005 Apr 19;102(16):5641-2. doi: 10.1073/pnas.0500274102. Epub 2005 Apr 12. PMID: 15827120.

References

    1. Altschul, S. F., Madden, T. L., Schaffer, A. A., Zhang, J., Zhang, Z., Miller, W. & Lipman, D. J. (1997) Nucleic Acids Res. 25, 3389-3402.
    1. Bucher, P. & Bairoch, A. (1994) Proceedings of the Second International Conference on Intelligent Systems for Molecular Biology, eds. Altman, R., Brutlag, D., Karp, P., Lathrop, R. & Searls, D. (AAAI Press, Menlo Park, CA), pp. 53-61.
    1. Hulo, N., Sigrist, C. J., Le Saux, V., Langendijk-Genevaux, P. S., Bordoli, L., Gattiker, A., De Castro, E., Bucher, P. & Bairoch, A. (2004) Nucleic Acids Res. 32, D134-D137.
    1. Sonnhammer, E. L., Eddy, S. R. & Durbin, R. (1997) Proteins 28, 405-420.
    1. Sonnhammer, E. L., Eddy, S. R., Birney, E., Bateman, A. & Durbin, R. (1998) Nucleic Acids Res. 26, 320-322.
