Review
Curr Opin Struct Biol. 2010 Oct;20(5):587-97.
doi: 10.1016/j.sbi.2010.08.001. Epub 2010 Aug 31.

Unmet challenges of structural genomics

Maksymilian Chruszcz et al. Curr Opin Struct Biol. 2010 Oct.

Abstract

Structural genomics (SG) programs have developed many novel methodologies for faster and more accurate structure determination during the last decade. These new tools and approaches have led to the determination of thousands of protein structures. The generation of enormous amounts of experimental data has significantly improved the understanding of many biological processes at the molecular level. However, the amount of data collected so far is so large that traditional analysis methods now limit the rate at which biological and biochemical information can be extracted from 3D models. This situation prompted us to review the challenges that SG has not yet met, as well as the areas in which the potential impact of SG could exceed what has been achieved so far.

Figures

Figure 1
Source organisms of protein structures deposited in the PDB. (A) Structures of proteins produced in cell-free systems. (B) Structures produced by SG centers utilizing E. coli expression systems. Structures determined by NMR and X-ray diffraction methods are separated. “Other” indicates proteins originating mainly from bacterial genomes.
Figure 2
(A) Distribution of Na+–O distances in PDB structures determined at a resolution of 1.2 Å or better (blue bars) and in the Cambridge Structural Database (CSD) (red bars). (B) The same distribution after re-refinement of a single structure (PDB code 3FJ0), which was solved by a traditional (i.e., non-SG) structural biology laboratory.
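A minimal Python sketch of the kind of tally behind panel (A): collect every Na+–O distance shorter than a cutoff and bin the distances into a histogram. The input format (plain coordinate tuples in Å), the 3.0 Å cutoff, and the 0.1 Å bin width are illustrative assumptions, not values taken from the paper; reading coordinates from PDB or CSD entries is not shown.

```python
import math
from collections import Counter

# Assumed input format: (x, y, z) coordinates in ångströms.
Coord = tuple[float, float, float]


def na_o_distances(na_sites: list[Coord], o_atoms: list[Coord],
                   cutoff: float = 3.0) -> list[float]:
    """Return every Na+-O distance no longer than `cutoff` (Å).

    The 3.0 Å default cutoff is an illustrative assumption."""
    distances = []
    for na in na_sites:
        for o in o_atoms:
            d = math.dist(na, o)  # Euclidean distance between the two points
            if d <= cutoff:
                distances.append(d)
    return distances


def histogram(values: list[float], bin_width: float = 0.1) -> dict[float, int]:
    """Count values per bin; keys are the lower edge of each bin (Å)."""
    # The small tolerance keeps values lying exactly on a bin edge from being
    # pushed into the lower bin by floating-point rounding.
    counts = Counter(math.floor(v / bin_width + 1e-9) for v in values)
    return {round(k * bin_width, 3): n for k, n in sorted(counts.items())}
```

Running the same tally on PDB-derived and CSD-derived coordinates and overlaying the two histograms gives the comparison drawn in panel (A).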
Figure 3
Crystal structure of β-glucosidase (PDB code 3FJ0) [76]. (A) Overall structure shown in ribbon representation, with a reaction intermediate shown in stick representation. (B) The structure contains an unusually large number of Na+ ions (purple spheres). Water molecules are shown as red spheres. The inset shows the binding site of the reaction intermediate in greater detail. The automatic procedures described in [37,38] do improve the R-factors, but they do not correct the misidentification of water molecules as sodium ions: after automatic re-refinement, the structure still contains the same erroneous number of Na+ ions (252).
Figure 4
Average time (in days) between data collection and deposition for SG and non-SG structures. Dark blue and green bars represent SG structures, whereas light blue and red bars represent non-SG structures deposited in 2000–2004 and 2005–2009, respectively. Structures were binned by reported resolution limit (0.4 Å bin width).
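The averaging in this figure can be sketched in a few lines of Python: group structures by SG versus non-SG origin and by deposition period, bin them by resolution in 0.4 Å steps, and average the number of days from data collection to deposition. The `Entry` record and its field names are hypothetical, not a PDB schema, and this is an illustration of the binning rather than the authors' own script.

```python
import math
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass
class Entry:
    """Hypothetical per-structure record; field names are illustrative only."""
    resolution: float   # reported resolution limit (Å)
    collected: date     # date of diffraction data collection
    deposited: date     # date of PDB deposition
    is_sg: bool         # True if deposited by an SG center


def mean_days_by_bin(entries: list[Entry],
                     bin_width: float = 0.4) -> dict[tuple[str, str, float], float]:
    """Mean collection-to-deposition time in days, keyed by
    (SG or non-SG, deposition period, lower edge of the resolution bin)."""
    totals = defaultdict(lambda: [0, 0])  # key -> [sum of days, count]
    for e in entries:
        year = e.deposited.year
        if not 2000 <= year <= 2009:
            continue  # outside the two periods compared in the figure
        period = "2000-2004" if year <= 2004 else "2005-2009"
        group = "SG" if e.is_sg else "non-SG"
        # Small tolerance keeps values on a bin edge from falling into the lower bin.
        lower = round(math.floor(e.resolution / bin_width + 1e-9) * bin_width, 3)
        acc = totals[(group, period, lower)]
        acc[0] += (e.deposited - e.collected).days
        acc[1] += 1
    return {key: days / n for key, (days, n) in sorted(totals.items())}
```

Each returned value corresponds to one bar in the figure: one color per (group, period) pair and one position per resolution bin.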
Figure 5
Quality indicators for protein structures. (A) Clashscore (calculated with MOLPROBITY) as a function of resolution for all crystal structures in the PDB (box plots) vs. the structures of protein targets used in the evaluation of templates in molecular docking [77] (red circles), high-resolution docking [78] (blue circle) and modeling [79] (green circles). In the box plots, red lines mark the median clashscore for each resolution range, the boxes contain structures with clashscores between the 25th and 75th percentiles, and the dashed lines extend to structures with clashscores between the 25th percentile − 1.5 IQR (interquartile range) and the 75th percentile + 1.5 IQR. (B) R-factors as a function of resolution for crystal structures of protein targets used in the evaluation of templates in molecular docking [77], high-resolution docking [78], and modeling [79]. Dark blue diamonds represent models with deposited structure factors, while light blue diamonds mark structures without structure factors. The blue line shows the linear regression of R-factor as a function of resolution for all PDB structures, while the green, purple, yellow and red lines are the analogous linear regression fits for structures deposited by SG in general, MCSG, JCSG, and CSGID, respectively. (C) Ramachandran plot for all structures of protein targets used in the evaluation of molecular docking [77]. This panel was created with COOT [80].
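Panel (A) follows the usual Tukey box-plot convention, and panel (B) fits straight lines to R-factor versus resolution. The sketch below shows both calculations with Python's standard library. The quartile interpolation method (`"inclusive"`) and the use of ordinary least squares are assumptions, because the caption does not specify them. Clashscores themselves would come from MOLPROBITY and are not computed here.

```python
import statistics


def box_stats(values: list[float]) -> dict[str, float]:
    """Median, quartiles and Tukey fences (Q1 - 1.5*IQR, Q3 + 1.5*IQR)."""
    # "inclusive" interpolation is an assumption; other percentile
    # definitions shift the quartiles slightly for small samples.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return {"q1": q1, "median": median, "q3": q3,
            "low_fence": q1 - 1.5 * iqr, "high_fence": q3 + 1.5 * iqr}


def outliers(values: list[float]) -> list[float]:
    """Values lying outside the Tukey fences (beyond the dashed lines)."""
    s = box_stats(values)
    return [v for v in values if v < s["low_fence"] or v > s["high_fence"]]


def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x
```

For panel (A), `box_stats` would be applied to the clashscores within each resolution range; for panel (B), `fit_line` would take resolutions as `xs` and R-factors as `ys`, once per group of depositors.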

References

    1. Jaroszewski L, Li Z, Krishna SS, Bakolitsa C, Wooley J, Deacon AM, Wilson IA, Godzik A. Exploration of uncharted regions of the protein universe. PLoS Biol. 2009;7:e1000205. An analysis of the NIH PSI effort to determine representative structures of novel protein families. It concludes that the majority of these novel families are highly divergent homologs of previously characterized protein families.
    2. Dessailly BH, Nair R, Jaroszewski L, Fajardo JE, Kouranov A, Lee D, Fiser A, Godzik A, Rost B, Orengo C. PSI-2: structural genomics to cover protein domain family space. Structure. 2009;17:869–881.
    3. Fan E, Baker D, Fields S, Gelb MH, Buckner FS, Van Voorhis WC, Phizicky E, Dumont M, Mehlin C, Grayhack E, et al. Structural genomics of pathogenic protozoa: an overview. Methods Mol Biol. 2008;426:497–513.
    4. Ioerger TR, Sacchettini JC. Structural genomics approach to drug discovery for Mycobacterium tuberculosis. Curr Opin Microbiol. 2009;12:318–325. A review of the methodology used by the Tuberculosis Structural Genomics Consortium. It also addresses the impact of the Consortium on the development of treatments for drug-resistant tuberculosis.
    5. Edwards A. Large-scale structural biology of the human proteome. Annu Rev Biochem. 2009;78:541–568. A review that analyzes the impact of structural genomics on the determination of structures of human proteins. It identifies the protein families most relevant to improving human health.
