BMC Struct Biol. 2009 May 20;9:34.
doi: 10.1186/1472-6807-9-34.

Universal partitioning of the hierarchical fold network of 50-residue segments in proteins

Jun-ichi Ito et al. BMC Struct Biol. 2009.

Abstract

Background: Several studies have demonstrated that protein fold space is structured hierarchically and that the relation between the numbers of protein families and protein folds (or superfamilies) follows power-law statistics. We examined the internal structure and statistics of the fold space of 50-amino-acid-residue segments taken from various protein folds. We used inter-residue contact patterns to measure the tertiary structural similarity among segments. Using this similarity measure, the segments were classified into a number (Kc) of clusters, and we examined various Kc values for the clustering. The resolution with which segment tertiary structures are differentiated increases with increasing Kc. Furthermore, we constructed networks by linking structurally similar clusters.

Results: The network was partitioned persistently into four regions for Kc ≥ 1000. This main partitioning is consistent with results of earlier studies, where similar partitioning was reported in classifying protein domain structures. Furthermore, the network was partitioned naturally into several dozen sub-networks (i.e., communities): clusters within a sub-network were mutually connected by numerous links, whereas clusters in different sub-networks were connected by only a few. For Kc ≥ 1000, there were about 40 major sub-networks, and their contents were conserved. This sub-partitioning is a novel finding, suggesting that the network is structured hierarchically: segments construct a cluster, clusters form a sub-network, and sub-networks constitute a region. Additionally, the network was characterized by non-power-law statistics, which is also a novel finding.

Conclusion: The main findings are: (1) The universe of 50-residue segments found here was characterized by non-power-law statistics. Therefore, this universe differs from those previously reported for protein domains. (2) The 50-residue segments were partitioned persistently and universally into some dozens (ca. 40) of major sub-networks, irrespective of the number of clusters. (3) These major sub-networks encompassed 90% of all segments. Consequently, protein tertiary structure is constructed from these dozens of elements (sub-networks).

Figures

Figure 1
<S> and <O> as a function of Kc. (A) <S> is the average cluster size (Eq. 3). The error bar shows the standard deviation over clusters. (B) <O> is the average number of segments supplied by a protein to a cluster (see the text for a detailed definition of <O>).
Figure 2
Number nu of segments in a cluster as a function of the ordinal number of the cluster.
Figure 3
Averaged correlation coefficient <f>Kc (Eq. 9) for intra-cluster segments as a function of Kc.
Figure 4
Networked 3D distribution of clusters for Kc = 1000 (A), 2000 (B), and 3000 (C). In this figure, a sphere represents a cluster. The larger the sphere, the more segments the cluster involves. The coloring method for clusters and inter-cluster links is explained briefly below (see Additional file 1 for details): The α, β, and αβ communities are, respectively, red, blue, and green. The larger the secondary-structure contents in a community, the greater the color strength. All randomly structured communities are shown in black. Colors assigned to cluster-cluster links are as follows: red for links within α communities, blue for those within β communities, green for those within αβ communities, and black for other links.
Figure 5
Main and sub-partitioning of the cluster network.
Figure 6
Tertiary structures picked from the 3D distribution for Kc = 1000. Colors of clusters are the same as those depicted in Figure 4. Inter-cluster links are not shown. This figure is presented with the same orientation as that of Figure 4.
Figure 7
Radius of gyration Rg of clusters. With increasing Rg, the cluster color is redder. This figure is presented with the same orientation as that of Figure 4.
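
The radius of gyration used to color the clusters can be illustrated with a minimal sketch. It assumes equal-mass points (for example, one Cα coordinate per residue); the abstract does not state which atoms or weights were used, so the function name and inputs here are hypothetical.

```python
import math

def radius_of_gyration(coords):
    """Root-mean-square distance of points from their centroid.

    coords: list of (x, y, z) tuples, all weighted equally (an assumption).
    """
    n = len(coords)
    # Centroid of the point set
    cx = sum(p[0] for p in coords) / n
    cy = sum(p[1] for p in coords) / n
    cz = sum(p[2] for p in coords) / n
    # Mean squared distance from the centroid
    msd = sum((p[0] - cx) ** 2 + (p[1] - cy) ** 2 + (p[2] - cz) ** 2
              for p in coords) / n
    return math.sqrt(msd)

# Two points one unit either side of the origin
rg_pair = radius_of_gyration([(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)])
```

A compact segment gives a small Rg and an extended one a large Rg, which is the property the color scale in this figure encodes.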
Figure 8
Relation between number (nseg) of segments involved in a cluster and number of clusters for Kc = 1000 (A), 2000 (B), and 3000 (C).
Figure 9
Connectivity distribution P(k) of cluster network at Kc = 1000 (A), 2000 (B), and 3000 (C). The X-axis k shows the number of links of a cluster connected to other clusters. Solid lines are the best-fit curves drawn assuming that P(k) decays with k exponentially.
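
One way to fit an exponential decay of the kind described in this caption is a least-squares line through ln P(k) versus k. The sketch below assumes the form P(k) ≈ A·exp(-λk); the abstract does not describe the authors' fitting procedure, and the function name and synthetic data are hypothetical.

```python
import math

def fit_exponential(ks, ps):
    """Fit P(k) ~ A * exp(-lam * k) by linear regression on ln P(k).

    Points with P(k) <= 0 are skipped because their logarithm is undefined.
    Returns (A, lam).
    """
    pts = [(k, math.log(p)) for k, p in zip(ks, ps) if p > 0]
    n = len(pts)
    mean_k = sum(k for k, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((k - mean_k) ** 2 for k, _ in pts)
    sxy = sum((k - mean_k) * (y - mean_y) for k, y in pts)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_k
    return math.exp(intercept), -slope

# Synthetic, noise-free data (not taken from the paper)
demo_ks = list(range(1, 11))
demo_ps = [0.5 * math.exp(-0.3 * k) for k in demo_ks]
demo_A, demo_lam = fit_exponential(demo_ks, demo_ps)
```

An exponential appears as a straight line on a semi-log plot, whereas a power law would be straight on a log-log plot; this is the distinction behind the "non-power-law" description of the network.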
Figure 10
Kc dependence of Ncom and Qmod. (A) The Kc dependence of modularity Qmod (Eq. 10). (B) The bar graph shows the Kc dependence of number, Ncom, of communities assigned to the left y-axis. The line with filled circles represents the ratio (assigned to right y-axis) of clusters in major communities to all clusters.
Figure 11
Communities at Kc = 1000 (A), 2000 (B), and 3000 (C). For each universe, only the top 13 communities by the number of involved clusters are shown. A single color is assigned to communities that are common to the three universes. Communities that are not common among the three are not shown, nor are minor communities.
Figure 12
Hierarchy in the segment universe proposed from the current study.
Figure 13
Smoothed inter-residue contacts c(i, j) (Eq. 4). It is presumed that residue pair (i, j) is in contact (i.e., c(i, j) = 1), and that the other pairs are non-contacting. Equation 4 provides negative cs(i', j') at sites where the inequality |i - i'| + |j - j'| + |(|i - i'| - |j - j'|)| > 5 is satisfied. Moreover, this inequality is satisfied without exception when any one of the three inequalities, |i - i'| > 2, |j - j'| > 2, or ||i - i'| - |j - j'|| > 2, is met. Those negative cs(i', j') are reset to zero (see text).
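
The relation between the combined inequality and the three simpler ones can be checked by brute force over integer offsets. The sketch relies on the identity a + b + |a - b| = 2·max(a, b), which reduces the combined inequality to max(|i - i'|, |j - j'|) ≥ 3 for integer offsets. The function names are hypothetical, and Eq. 4 itself is not reproduced here.

```python
def combined_inequality(di, dj):
    """|di| + |dj| + ||di| - |dj|| > 5, with di = i - i' and dj = j - j'."""
    a, b = abs(di), abs(dj)
    return a + b + abs(a - b) > 5

def any_simple_inequality(di, dj):
    """True when |di| > 2, |dj| > 2, or ||di| - |dj|| > 2."""
    a, b = abs(di), abs(dj)
    return a > 2 or b > 2 or abs(a - b) > 2

def conditions_agree(limit=10):
    """Compare both conditions for all integer offsets within +/- limit."""
    for di in range(-limit, limit + 1):
        for dj in range(-limit, limit + 1):
            a, b = abs(di), abs(dj)
            # Identity that underlies the equivalence
            assert a + b + abs(a - b) == 2 * max(a, b)
            if combined_inequality(di, dj) != any_simple_inequality(di, dj):
                return False
    return True

agree = conditions_agree()
```

For integer offsets the two conditions coincide, so the region where the inequality fails is exactly the 5 × 5 block of sites with |i - i'| ≤ 2 and |j - j'| ≤ 2 around the contacting pair.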
Figure 14
Two network types. Network (A) has larger modularity Qmod than (B) does. Filled circles form a community (Com 1); open ones construct the other community (Com 2). Lines between circles represent links.
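
The contrast between the two network types can be made concrete with a small modularity calculation. Eq. 10 is not reproduced in this abstract, so the sketch below assumes the standard Newman-Girvan definition, Q = Σ_c [e_c/m - (d_c/2m)²], where m is the number of links, e_c the number of links inside community c, and d_c the summed degree of its nodes. Both example networks are hypothetical and are not the ones drawn in the figure.

```python
def modularity(edges, community_of):
    """Newman-Girvan modularity of an undirected, unweighted network.

    edges: list of (u, v) pairs; community_of: dict mapping node -> label.
    """
    m = len(edges)
    intra = {}   # links with both ends in the same community
    degree = {}  # summed node degree per community
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            intra[cu] = intra.get(cu, 0) + 1
    return sum(intra.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree.items())

com = {0: "Com 1", 1: "Com 1", 2: "Com 1", 3: "Com 2", 4: "Com 2", 5: "Com 2"}
# Type (A): two triangles joined by a single link
edges_a = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
# Type (B): same nodes and link count, but most links cross communities
edges_b = [(0, 1), (3, 4), (0, 3), (1, 4), (2, 5), (2, 3), (1, 5)]
q_a = modularity(edges_a, com)
q_b = modularity(edges_b, com)
```

Under this definition the first network scores 5/14 and the second scores below zero, which matches the qualitative statement that dense intra-community and sparse inter-community linking gives the larger Qmod.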
