Proc Natl Acad Sci U S A. 2006 Feb 21;103(8):2605-10.

doi: 10.1073/pnas.0509379103. Epub 2006 Feb 14.

On the origin and highly likely completeness of single-domain protein structures

Yang Zhang¹, Isaac A Hubner, Adrian K Arakaki, Eugene Shakhnovich, Jeffrey Skolnick

PMID: 16478803
PMCID: PMC1413790
DOI: 10.1073/pnas.0509379103

Yang Zhang et al. Proc Natl Acad Sci U S A. 2006.

Affiliation

¹ Center of Excellence in Bioinformatics, University at Buffalo, State University of New York, 901 Washington Street, Buffalo, NY 14203, USA.

Abstract

The size and origin of the protein fold universe are of fundamental and practical importance. We analyze randomly generated, compact, sticky homopolypeptide conformations constructed in generic simplified and all-atom protein models. All of these conformations have similar folds in the library of solved structures, the Protein Data Bank (PDB), and conversely, all compact, single-domain protein structures in the PDB have structural analogues in the compact model set. Thus, both sets are highly likely complete, with the protein fold universe arising from compact conformations of hydrogen-bonded secondary structures. Because side chains are represented by their C^β atoms, these results also suggest that the observed protein folds are insensitive to the details of side-chain packing. Sequence specificity enters both in fine-tuning the structure and in thermodynamically stabilizing a given fold with respect to the set of alternatives. When the models are scanned against a three-dimensional active-site library, close geometric matches are frequently found. Thus, the presence of active-site-like geometries also seems to be a consequence of the packing of compact, secondary structural elements. These results have significant implications for the evolution of protein structure and function.

Conflict of interest statement

No conflicts declared.

Figures

**Fig. 1.**
Rmsd vs. alignment coverage of computer-generated models matched with the closest representative structure in the PDB. (*Left*) For each homopolypeptide with a given secondary structure pattern, 14 models (Top 1, 10, 25, 50, 75, 100, 125, 150, 175, and 200 clusters) are selected; for each, only the match of the highest TM-score identified by TM-ALIGN is presented. (A) The 100-aa (AA) atomic, off-lattice models. (B) The 100-AA reduced lattice models. (C) The 200-AA reduced lattice models. (*Right*) Corresponding representative examples of the structural alignments in different categories are shown. Thick backbones are from models; thin backbones are from PDB structures. Red indicates residue pairs whose distance is <5 Å; those separated by >5 Å are shown in magenta (model) and blue (PDB structure), respectively.
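As a rough illustration of the quantities plotted in Fig. 1, the sketch below takes two sets of C^α coordinates that are assumed to be already aligned and already superposed, and reports the rmsd over the aligned pairs, the alignment coverage, and which pairs fall under the 5 Å cutoff used for the red coloring. The function name `summarize_alignment`, its arguments, and the definition of coverage as aligned pairs divided by model length are assumptions made for illustration. It does not reproduce TM-ALIGN, which also has to find the alignment and the superposition.

```python
import math


def summarize_alignment(model_xyz, pdb_xyz, model_length, cutoff=5.0):
    """Summarize an alignment of already superposed coordinates.

    model_xyz, pdb_xyz: equal-length lists of (x, y, z) tuples, one entry
        per aligned residue pair (hypothetical input format).
    model_length: total number of residues in the model (assumption:
        coverage is measured relative to the model).
    cutoff: distance in angstroms below which a pair counts as close.
    """
    if len(model_xyz) != len(pdb_xyz) or not model_xyz:
        raise ValueError("need equal-length, non-empty lists of aligned pairs")

    # Distance between the two members of each aligned residue pair.
    dists = [math.dist(a, b) for a, b in zip(model_xyz, pdb_xyz)]

    # Root-mean-square deviation over the aligned pairs only.
    rmsd = math.sqrt(sum(d * d for d in dists) / len(dists))

    # Fraction of the model that takes part in the alignment.
    coverage = len(dists) / model_length

    # True for pairs that would be drawn in red (< cutoff).
    close = [d < cutoff for d in dists]
    return rmsd, coverage, close
```

A call such as `summarize_alignment(model, pdb, model_length=100)` returns one point in the rmsd vs. coverage plane plus a per-pair flag that could drive the coloring.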

**Fig. 2.**
Rmsd vs. alignment coverage for the compact freely jointed chain models matched by TM-ALIGN to the closest representative PDB structure. (A) The 100-aa (AA) chains. (B) The 200 AA chains. For each chain, 20 independent Monte Carlo simulations are generated, which have excluded volume interactions (C^α–C^α distance > 3 Å) and a bias to the radius of gyration (G) of an average protein of length L, i.e., G ≈ 2.2L^0.38. For each independent simulation, up to 14 clusters chosen as in Fig. 1 are used in the structure comparison. Red indicates residue pairs whose distance is <5 Å; those separated by >5 Å are shown in magenta (model) and blue (PDB structure), respectively.
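The compactness bias and the excluded-volume condition in this caption can be made concrete with a short sketch. It evaluates the target G ≈ 2.2L^0.38 for a chain of length L (assumed here to be in Å), computes the radius of gyration of a set of coordinates with unit masses, and checks the C^α–C^α > 3 Å condition. The function names and the unit-mass choice are illustrative assumptions, not the authors' simulation code.

```python
import math


def target_radius_of_gyration(length):
    """Target G for a chain of `length` residues, G ~ 2.2 * L**0.38."""
    return 2.2 * length ** 0.38


def radius_of_gyration(coords):
    """Radius of gyration of (x, y, z) points, assuming unit masses."""
    n = len(coords)
    if n == 0:
        raise ValueError("need at least one coordinate")
    # Geometric center of the chain.
    center = tuple(sum(p[i] for p in coords) / n for i in range(3))
    # Root-mean-square distance of the points from the center.
    return math.sqrt(sum(math.dist(p, center) ** 2 for p in coords) / n)


def violates_excluded_volume(coords, min_dist=3.0):
    """True if any pair of points is not strictly farther apart than min_dist."""
    n = len(coords)
    return any(
        math.dist(coords[i], coords[j]) <= min_dist
        for i in range(n)
        for j in range(i + 1, n)
    )
```

Under these assumptions the target is about 12.7 for L = 100 and about 16.5 for L = 200, so a Monte Carlo move set could be biased toward conformations whose `radius_of_gyration` is close to those values.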

**Fig. 3.**
Relationship of the library of compact, sticky homopolypeptide structures to PDB structures between 41 and 150 residues in length. (*Upper*) Rmsd vs. coverage for 913 representative, compact PDB structures between 41 and 150 residues to protein models in the 200-residue-long, compact, sticky homopolypeptide structural library comprised of 15,000 (A), 7,000 (B), and 3,500 (C) structures, respectively. (*Lower*) Structural alignments of representative α-protein (PDB ID code 1c17 chain M; 142 residues), β-protein (PDB ID code 1a3k; 137 residues), and α/β-protein (PDB ID code 1a3k chain A; 131 residues) PDB structures to the compact sticky homopolypeptide structures are shown. The thick (thin) backbones represent computer models (PDB structures). Red indicates residue pairs whose distance is <5 Å.

**Fig. 4.**
Fraction of the 150 active-site functional templates (AFTs) that hit at least 1% of 750 sticky homopolypeptide structures (magenta histogram), at least 1% of 750 native structures (blue histogram), or at least one of 3,500 compact sticky homopolypeptide structures (yellow histogram) at a given drmsd interval from the corresponding restrictive cutoff.

Grants and funding

R01 GM037408/GM/NIGMS NIH HHS/United States
