Proc Natl Acad Sci U S A. 2006 Feb 21;103(8):2605-10.
doi: 10.1073/pnas.0509379103. Epub 2006 Feb 14.

On the origin and highly likely completeness of single-domain protein structures

Yang Zhang et al. Proc Natl Acad Sci U S A. 2006.

Abstract

The size and origin of the protein fold universe are of fundamental and practical importance. Analyzing randomly generated, compact, sticky homopolypeptide conformations constructed in generic simplified and all-atom protein models, we find that all have similar folds in the library of solved structures, the Protein Data Bank, and, conversely, that all compact, single-domain protein structures in the Protein Data Bank have structural analogues in the compact model set. Thus, both sets are highly likely to be complete, with the protein fold universe arising from compact conformations of hydrogen-bonded secondary structures. Because side chains are represented by their Cβ atoms, these results also suggest that the observed protein folds are insensitive to the details of side-chain packing. Sequence specificity enters both in fine-tuning the structure and in thermodynamically stabilizing a given fold with respect to the set of alternatives. When the models are scanned against a three-dimensional active-site library, close geometric matches are frequently found. Thus, the presence of active-site-like geometries also seems to be a consequence of the packing of compact, secondary structural elements. These results have significant implications for the evolution of protein structure and function.

Conflict of interest statement

No conflicts declared.

Figures

Fig. 1.
Rmsd vs. alignment coverage of computer-generated models matched with the closest representative structure in the PDB. (Left) For each homopolypeptide with a given secondary structure pattern, 14 models (Top 1, 10, 25, 50, 75, 100, 125, 150, 175, and 200 clusters) are selected; for each, only the match of the highest TM-score identified by TM-ALIGN is presented. (A) The 100-aa (AA) atomic, off-lattice models. (B) The 100-AA reduced lattice models. (C) The 200-AA reduced lattice models. (Right) Corresponding representative examples of the structural alignments in different categories are shown. Thick backbones are from models; thin backbones are from PDB structures. Red indicates residue pairs whose distance is <5 Å; those separated by >5 Å are shown in magenta (model) and blue (PDB structure), respectively.
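The two quantities plotted in this figure, and the 5 Å criterion used for coloring, can be illustrated with a minimal sketch. It assumes the residue pairing and superposition have already been produced by an alignment tool such as TM-ALIGN (not reproduced here), and it assumes coverage is the number of aligned residues divided by a reference chain length; the function name `alignment_stats` and its parameters are hypothetical.

```python
import math


def alignment_stats(aligned_pairs, reference_length, cutoff=5.0):
    """Summarize an existing structural alignment.

    aligned_pairs: list of (model_xyz, pdb_xyz) coordinate tuples for
        residues already paired and superposed by an external tool.
    reference_length: chain length used to normalize coverage
        (assumption: which chain is used is a choice of the caller).
    cutoff: distance in angstroms below which a pair counts as close.
    Returns (rmsd, coverage, n_close_pairs).
    """
    if not aligned_pairs:
        raise ValueError("at least one aligned pair is required")
    if reference_length <= 0:
        raise ValueError("reference_length must be positive")

    # Squared distance between the two members of each aligned pair.
    sq_dists = [
        sum((a - b) ** 2 for a, b in zip(model_xyz, pdb_xyz))
        for model_xyz, pdb_xyz in aligned_pairs
    ]
    rmsd = math.sqrt(sum(sq_dists) / len(sq_dists))
    coverage = len(aligned_pairs) / reference_length
    n_close = sum(1 for d2 in sq_dists if d2 < cutoff ** 2)
    return rmsd, coverage, n_close
```

For example, calling `alignment_stats(pairs, reference_length=100)` on 80 aligned pairs would report a coverage of 0.8 together with the RMSD over those 80 pairs.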
Fig. 2.
Rmsd vs. alignment coverage for the compact freely jointed chain models matched by TM-ALIGN to the closest representative PDB structure. (A) The 100-aa (AA) chains. (B) The 200 AA chains. For each chain, 20 independent Monte Carlo simulations are generated, which have excluded volume interactions (Cα–Cα distance > 3 Å) and a bias to the radius of gyration (G) of an average protein of length L, i.e., G ≈ 2.2L^0.38. For each independent simulation, up to 14 clusters chosen as in Fig. 1 are used in the structure comparison. Red indicates residue pairs whose distance is <5 Å; those separated by >5 Å are shown in magenta (model) and blue (PDB structure), respectively.
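The two constraints named in this caption can be sketched as follows. This is an illustrative check only, not the authors' Monte Carlo code: it assumes G is expressed in ångströms, that the excluded-volume condition applies to every pair of Cα atoms, and all function names are hypothetical.

```python
import math
from itertools import combinations


def target_radius_of_gyration(length):
    """Target radius of gyration for a chain of `length` residues,
    using the caption's relation G ~ 2.2 * L**0.38 (units assumed to be angstroms)."""
    return 2.2 * length ** 0.38


def radius_of_gyration(coords):
    """Root-mean-square distance of the points from their centroid."""
    n = len(coords)
    if n == 0:
        raise ValueError("coords must not be empty")
    centroid = [sum(p[k] for p in coords) / n for k in range(3)]
    mean_sq = sum(
        sum((p[k] - centroid[k]) ** 2 for k in range(3)) for p in coords
    ) / n
    return math.sqrt(mean_sq)


def violates_excluded_volume(coords, min_dist=3.0):
    """True if any two points are not more than `min_dist` apart
    (assumption: the > 3 angstrom rule is applied to all pairs)."""
    return any(math.dist(p, q) <= min_dist for p, q in combinations(coords, 2))
```

For L = 100 the relation gives a target of roughly 12.7, and for L = 200 roughly 16.5, so a simulated chain can be compared against these values to judge how compact it is.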
Fig. 3.
Relationship of the library of compact, sticky homopolypeptide structures to PDB structures between 41 and 150 residues in length. (Upper) Rmsd vs. coverage for 913 representative, compact PDB structures between 41 and 150 residues to protein models in the 200-residue-long, compact, sticky homopolypeptide structural library comprised of 15,000 (A), 7,000 (B), and 3,500 (C) structures, respectively. (Lower) Structural alignments of representative α-protein (PDB ID code 1c17 chain M; 142 residues), β-protein (PDB ID code 1a3k; 137 residues), and α/β-protein (PDB ID code 1a3k chain A; 131 residues) PDB structures to the compact sticky homopolypeptide structures are shown. The thick (thin) backbones represent computer models (PDB structures). Red indicates residue pairs whose distance is <5 Å.
Fig. 4.
Fraction of the 150 active-site functional templates (AFTs) that hit at least 1% of 750 sticky homopolypeptide structures (magenta histogram), at least 1% of 750 native structures (blue histogram), or at least one of 3,500 compact sticky homopolypeptide structures (yellow histogram) at a given drmsd interval from the corresponding restrictive cutoff.
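As a rough illustration of the quantities in this caption, the sketch below computes a distance RMSD (drmsd) using the common definition, the root mean square of differences between corresponding intra-set pairwise distances, and the fraction of templates that hit at least a given share of a structure set. Whether this matches the exact definition and cutoffs used for the figure is an assumption, and both function names are hypothetical.

```python
import math
from itertools import combinations


def drmsd(coords_a, coords_b):
    """Distance RMSD between two equally sized, equally ordered point sets.

    Compares internal pairwise distances, so no superposition is needed.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("point sets must have the same size")
    if len(coords_a) < 2:
        raise ValueError("at least two points are required")
    sq_diffs = [
        (math.dist(coords_a[i], coords_a[j]) - math.dist(coords_b[i], coords_b[j])) ** 2
        for i, j in combinations(range(len(coords_a)), 2)
    ]
    return math.sqrt(sum(sq_diffs) / len(sq_diffs))


def fraction_of_templates_hitting(hit_counts, n_structures, min_fraction=0.01):
    """Fraction of templates whose hit count reaches `min_fraction` of the set.

    hit_counts: one integer per template, the number of structures it matched.
    """
    if not hit_counts or n_structures <= 0:
        raise ValueError("need at least one template and a positive set size")
    threshold = min_fraction * n_structures
    passing = sum(1 for count in hit_counts if count >= threshold)
    return passing / len(hit_counts)
```

With 750 structures and the default 1% threshold, a template needs at least 8 hits to be counted, since 7 hits fall below 7.5.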
