Comparative Study
Protein Sci. 2003 Jun;12(6):1177-87.
doi: 10.1110/ps.0232903.

Reducing the computational complexity of protein folding via fragment folding and assembly

Nurit Haspel et al. Protein Sci. 2003 Jun.

Abstract

Understanding, and ultimately predicting, how a 1-D protein chain reaches its native 3-D fold has been one of the most challenging problems during the last few decades. Data increasingly indicate that protein folding is a hierarchical process. Hence, the question arises as to whether we can use the hierarchical concept to reduce the practically intractable computational times. For such a scheme to work, the first step is to cut the protein sequence into fragments that form local minima on the polypeptide chain. The conformations of such fragments in solution are likely to be similar to those when the fragments are embedded in the native fold, although alternate conformations may be favored during the mutual stabilization in the combinatorial assembly process. Two elements are needed for such cutting: (1) a library of (clustered) fragments derived from known protein structures and (2) an assignment algorithm that selects optimal combinations to "cover" the protein sequence. The next two steps in hierarchical folding schemes, not addressed here, are the combinatorial assembly of the fragments and finally, optimization of the obtained conformations. Here, we address the first step in a hierarchical protein-folding scheme. The input is a target protein sequence and a library of fragments created by clustering building blocks that were generated by cutting all protein structures. The output is a set of cutout fragments. We briefly outline a graph theoretic algorithm that automatically assigns building blocks to the target sequence, and we describe a sample of the results we have obtained.

Figures

Figure 1.
Folding by parts and part assembly: A hierarchy-based protein folding scheme. The overall folding scheme, composed of three steps: The first step involves cutting the target sequence into building blocks and assigning their conformations (this stage is described in this work). In the second step, the building blocks are assembled combinatorially. In the third step, the structure is refined to yield the predicted conformation.
Figure 2.
Two examples of clusters. The figures show multiple structural alignments using MUSTA (Leibowitz et al. 2001a,b) of the cluster members. The inserts give the file names and the range of residues. (A) (1bdc) Immunoglobulin-binding protein A modules from Staphylococcus aureus (domain B); (1grl) GroEL from Escherichia coli; (2pas) parvalbumin from pike; (1edi) immunoglobulin-binding protein A modules from S. aureus (domain E). (B) (1coy) Cholesterol oxidase from Brevibacterium sterolicum; (1gal) glucose oxidase from Aspergillus niger; (1gnd) guanine nucleotide dissociation inhibitor, GDI, from cow; (1ldm) lactate dehydrogenase from dogfish.
Figure 3.
A flowchart of the building block assignment algorithm.
Figure 4.
An illustration of the building block assignment algorithm. (A) The protein sequence is aligned against a building block sequence database using BLAST. (B) A weighted directed acyclic graph is built from the aligned building blocks, plus start and target vertices. (C) The "shortest" path is the "best" building block assignment to the target sequence.
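The graph step in panels B and C can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `assign_building_blocks`, the `(start, end, cost)` representation of an aligned building block, and the edge-weight scheme (a per-residue `gap_penalty` for uncovered positions plus a per-block cost, lower being better) are all hypothetical, because the weighting actually used is not given in this text. The sketch only shows how a shortest path through a weighted directed acyclic graph with start and target vertices selects a non-overlapping set of blocks.

```python
from typing import List, Tuple

# Hypothetical hit format: (start, end, cost) on the target sequence,
# half-open interval [start, end), with end > start. Lower cost is better.
Hit = Tuple[int, int, float]


def assign_building_blocks(
    seq_len: int, hits: List[Hit], gap_penalty: float = 1.0
) -> Tuple[float, List[Hit]]:
    """Shortest start-to-target path in a DAG built from aligned blocks.

    Assumed edge weight u -> v (not taken from the paper):
        gap_penalty * (uncovered residues between u and v) + cost of v
    """
    # Sorting by start gives a topological order: an edge u -> v requires
    # v.start >= u.end > u.start, so v always sorts after u.
    hits = sorted(hits)
    n = len(hits)

    # Vertex 0 is the start vertex, 1..n are hits, n + 1 is the target vertex.
    starts = [0] + [h[0] for h in hits] + [seq_len]
    ends = [0] + [h[1] for h in hits] + [seq_len]
    costs = [0.0] + [h[2] for h in hits] + [0.0]

    dist = [float("inf")] * (n + 2)
    prev = [-1] * (n + 2)
    dist[0] = 0.0

    # Relax edges in topological order (standard DAG shortest path).
    for v in range(1, n + 2):
        for u in range(v):
            gap = starts[v] - ends[u]
            if gap < 0:
                continue  # blocks overlap, so there is no edge u -> v
            candidate = dist[u] + gap_penalty * gap + costs[v]
            if candidate < dist[v]:
                dist[v] = candidate
                prev[v] = u

    # Walk back from the target vertex to recover the chosen blocks.
    path: List[Hit] = []
    v = prev[n + 1]
    while v > 0:
        path.append(hits[v - 1])
        v = prev[v]
    path.reverse()
    return dist[n + 1], path
```

With a 10-residue target and the hypothetical hits `(0, 4, 1.0)`, `(0, 6, 3.0)`, `(2, 8, 0.2)`, `(4, 10, 1.5)`, and `(6, 10, 0.5)`, the function returns a total of 2.5 and the two blocks covering residues 0-4 and 4-10. Because the graph is acyclic, a single pass in topological order is enough, and no general-purpose shortest-path routine is needed.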
Figure 5.
Examples of the building block assignment algorithm. (A) The target protein is chain A of glutathione S-transferase from human, class pi (PDB code 13gs). (B) The target protein is pseudoazurin from Alcaligenes faecalis (PDB code 1paz). (C) The target protein is immunoglobulin from human (PDB code 2imm). (D) The target protein is chain D of barstar (barnase inhibitor) from Bacillus amyloliquefaciens (PDB code 1brs). (E) The target protein is chain A of anti-sigma factor antagonist SpoIIaa from Bacillus sphaericus (PDB code 1h4x). (F) The target protein is Clp protease, ClpP subunit from Escherichia coli (PDB code 1tyf). The inserts show the matched building block sequences. (See also Table 2.)
