Review
Structure. 2013 Sep 3;21(9):1531-40.
doi: 10.1016/j.str.2013.08.007.

Protein modeling: what happened to the "protein structure gap"?

Torsten Schwede. Structure.

Abstract

Computational modeling of three-dimensional macromolecular structures and complexes from their sequence has been a long-standing vision in structural biology. Over the last two decades, a paradigm shift has occurred: starting from a large "structure knowledge gap" between the huge number of protein sequences and the small number of known structures, today some form of structural information, either experimental structures or template-based models, is available for the majority of amino acids encoded by common model organism genomes. With the scientific focus of interest moving toward larger macromolecular complexes and dynamic networks of interactions, the integration of computational modeling methods with low-resolution experimental techniques allows the study of large and complex molecular machines. One of the open challenges for computational modeling and prediction techniques is to convey the underlying assumptions, as well as the expected accuracy and structural variability of a specific model, which is crucial to understanding its limitations.

Figures

Figure 1. Mind the gap
The numbers of entries in the SwissProt and trEMBL sequence databases (UniProt-Consortium, 2013) and in the PDB (Berman et al., 2007) are growing exponentially, while the “protein structure gap” between sequences and structures is widening dramatically. Inset: Growth of PDB holdings from 1972 to 2013.
Figure 2. Structural template coverage of the human proteome
The fraction of amino acids in the human proteome showing sequence similarity to proteins with known structures in the PDB is shown over time, where colors indicate levels of sequence identity as detected by PSI-BLAST (Altschul et al., 1997). The area shaded in red indicates the fraction of intrinsically unstructured residues in the human proteome, estimated at about 30% (Colak et al., 2013; Ward et al., 2004). Models built on templates sharing low sequence identity (<20%) are often of poor quality due to evolutionary divergence between target and template structures and to limitations of the modeling and refinement methods (illustrated in Figure 3B). Prokaryotic proteomes generally have higher structural coverage than eukaryotic ones (Guex et al., 2009; Zhang et al., 2009).
Figure 3. Examples of blind structure predictions
Two proteins of differing prediction difficulty are displayed, highlighting the importance of quality estimation for individual structure models. A) Crystal structure of the acyl-CoA dehydrogenase from Slackia heliotrinireducens solved by the Midwest Center for Structural Genomics, in superposition with the ten best blind predictions in the CASP10 experiment (T0758). In this case, all predictions agree well with the experimental reference structure and the differences between methods are small. B) In more difficult cases, like the crystal structure of a hypothetical protein from Ruminococcus gnavus solved at the Joint Center for Structural Genomics (PDB:4GL6; CASP target T0684-d2), no suitable template structure could be identified, and the ten best predictions show large deviations from the reference structure and from each other. Reliable error estimates for the atomic coordinates of an individual model are therefore crucial to judge its expected accuracy and its suitability for specific applications. Consensus between independent prediction methods has been shown to be a good indicator of model accuracy in general.
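The consensus idea mentioned in the Figure 3 caption can be illustrated with a minimal sketch. It is not the scoring used in CASP or in the review: it assumes all models have already been superposed in a common frame and list the same atoms in the same order, and it uses the mean pairwise RMSD to the other models as a crude, hypothetical consensus score (lower means closer to the consensus). The function names and the toy coordinates are made up for illustration.

```python
import math


def rmsd(a, b):
    """Root-mean-square deviation between two equally sized coordinate lists.

    Assumes the two models are already superposed (no fitting is done here).
    """
    if len(a) != len(b) or not a:
        raise ValueError("coordinate lists must be non-empty and equally sized")
    total = sum((p - q) ** 2 for u, v in zip(a, b) for p, q in zip(u, v))
    return math.sqrt(total / len(a))


def consensus_scores(models):
    """Mean RMSD of each model to every other model (lower = nearer consensus)."""
    n = len(models)
    if n < 2:
        raise ValueError("need at least two models")
    return [
        sum(rmsd(models[i], models[j]) for j in range(n) if j != i) / (n - 1)
        for i in range(n)
    ]


# Toy, made-up coordinates (three "atoms" per model); not real CASP data.
toy_models = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)],
    [(0.0, 0.1, 0.0), (1.0, 0.1, 0.0), (2.0, 0.1, 0.0)],
    [(0.0, 3.0, 0.0), (1.0, 3.0, 0.0), (2.0, 3.0, 0.0)],
]
scores = consensus_scores(toy_models)
outlier = max(range(len(scores)), key=scores.__getitem__)
```

In this toy set the third model sits far from the other two, so it receives the highest score. A real workflow would first superpose the models and would typically use a per-residue measure rather than a single global number.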
Figure 4. Integrative structure model of the nuclear pore complex (NPC)
The approximately 50 MDa transmembrane nuclear pore complex consists of 456 constituent proteins and selectively transports cargoes across the nuclear envelope (Alber et al., 2007a; Alber et al., 2007b). Image courtesy of Andrej Sali, UCSF (http://salilab.org).
Figure 5. Speculative data-driven 3D model of the bacterial division machinery
The model was created with the GraphiteLifeExplorer modeling tool (Hornus et al., 2013). The tubulin-like protein FtsZ (in blue/yellow) is shaped into a double ring. A short filament of the actin-like protein FtsA (in light blue) is shown on the Z-ring. One FtsK motor (in grey) pumps the DNA. This translocase is linked to the membrane (not shown) by six linkers (Vendeville et al., 2011). Image courtesy of Damien Larivière (http://www.lifeexplorer.eu/).

References

    Al-Amoudi A, Castano-Diez D, Devos DP, Russell RB, Johnson GT, Frangakis AS. The three-dimensional molecular structure of the desmosomal plaque. Proc Natl Acad Sci U S A. 2011;108:6480–6485.
    Alber F, Dokudovskaya S, Veenhoff LM, Zhang W, Kipper J, Devos D, Suprapto A, Karni-Schmidt O, Williams R, Chait BT, et al. Determining the architectures of macromolecular assemblies. Nature. 2007a;450:683–694.
    Alber F, Dokudovskaya S, Veenhoff LM, Zhang W, Kipper J, Devos D, Suprapto A, Karni-Schmidt O, Williams R, Chait BT, et al. The molecular architecture of the nuclear pore complex. Nature. 2007b;450:695–701.
    Alber F, Forster F, Korkin D, Topf M, Sali A. Integrating diverse data for structure determination of macromolecular assemblies. Annu Rev Biochem. 2008;77:443–477.
    Aller SG, Yu J, Ward A, Weng Y, Chittaboina S, Zhuo R, Harrell PM, Trinh YT, Zhang Q, Urbatsch IL, et al. Structure of P-glycoprotein reveals a molecular basis for poly-specific drug binding. Science. 2009;323:1718–1722.
