J Struct Funct Genomics. 2009 Apr;10(2):181-91.
doi: 10.1007/s10969-008-9055-6. Epub 2009 Feb 5.

Structural genomics is the largest contributor of novel structural leverage


Rajesh Nair et al. J Struct Funct Genomics. 2009 Apr.

Abstract

The Protein Structure Initiative (PSI) at the US National Institutes of Health (NIH) is funding four large-scale centers for structural genomics (SG). These centers systematically target many large families without structural coverage, as well as very large families with inadequate structural coverage. Here, we report a few simple metrics that demonstrate how successfully these efforts optimize structural coverage: while the PSI-2 (2005-now) contributed more than 8% of all structures deposited into the PDB, it contributed over 20% of all novel structures (i.e. structures for protein sequences with no structural representative in the PDB on the date of deposition). The structural coverage of the protein universe represented by today's UniProt (v12.8) has increased linearly from 1992 to 2008; structural genomics has contributed significantly to the maintenance of this growth rate. Success in increasing novel leverage (defined in Liu et al. in Nat Biotechnol 25:849-851, 2007) has resulted from systematic targeting of large families. PSI's per-structure contribution to novel leverage was over 4-fold higher than that of non-PSI structural biology efforts during the past 8 years. If the success of the PSI continues, it may take only approximately another 15 years to cover most sequences in the current UniProt database.


Figures

Fig. 1
PSI annual throughput as percentage of the worldwide PDB and the US-PDB. a Annual statistics for the fraction of structures determined by the PSI (Protein Structure Initiative at NIH’s NIGMS) distinguishing between the contribution to all structures deposited in a given year (gray bars), and the contribution toward novel structures (blue bars). In this context, we considered any structure that yielded novel leverage for at least 50 consecutive residues as a “novel structure”. The PSI contribution to novel leverage is 2–3 times higher than its contribution to all structures. 100% marks all structures determined by US-laboratories. b While panel (a) shows the fractions of structures, panel (b) shows the fraction of novel leverage added in each year (i.e. PSI novel leverage/US-PDB novel leverage), in terms of per-protein (orange) and per-residue (purple) values. Panels (c) and (d) distinguish between the contribution from the PSI, from the US without PSI, from structural genomics (SG) without the PSI and from all other depositors. In particular, we distinguish the contribution to all structures (c) and that to all novel leverage (d). Note that in all figures the years refer to PSI grant years, e.g. 2001 refers to the period of July 2000–June 2001. The last entry (labeled 2009) marks an incomplete year from July 2008–September 2008 corresponding to the first quarter of year 4 of PSI-2
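The "novel structure" criterion in the caption above (novel leverage for at least 50 consecutive residues) can be illustrated with a minimal Python sketch. The per-residue boolean representation and the function names `longest_novel_run` and `is_novel_structure` are assumptions made for illustration; the source does not describe how the authors implemented this check.

```python
# Illustrative sketch only: the data layout and function names are hypothetical.
# Each list holds one boolean per residue of a target sequence.

def longest_novel_run(previously_covered, newly_covered):
    """Longest stretch of consecutive residues covered by the new
    structure that had no structural coverage before its deposition."""
    best = run = 0
    for old, new in zip(previously_covered, newly_covered):
        if new and not old:
            run += 1
            best = max(best, run)
        else:
            run = 0
    return best


def is_novel_structure(previously_covered, newly_covered, min_run=50):
    """Apply the caption's threshold of 50 consecutive novel residues."""
    return longest_novel_run(previously_covered, newly_covered) >= min_run
```

With a made-up 120-residue sequence whose last 60 residues were already covered, a new structure spanning the first 100 residues adds 60 consecutive novel residues and would count as novel under this reading.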
Fig. 2
Increase of structural coverage of UniProt. Plotted are the percentages of proteins (orange with crossed squares) and residues (purple with open squares) in the entire UniProt database (release 12.8, Feb. 2008) that can potentially be modeled using one of the structures in the PDB as a template, where “modelability” is based on PSI-BLAST alignments (E-value < 10⁻¹⁰) between the sequence of the target and the sequence of the template of known structure. Panel (a) shows the percentage of UniProt with structural coverage, per year, while panel (b) on the right (coloring as in Fig. 1) zooms in to show the gain in coverage with respect to the onset of PSI (July 2000). Note that the absolute values of coverage depend crucially on the values chosen for what is considered to be an acceptable model. Our choice of E-values < 10⁻¹⁰ provides relatively conservative estimates for high-accuracy models
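The per-protein and per-residue coverage described in this caption could be computed roughly as in the sketch below. It assumes the PSI-BLAST hits have already been parsed into tuples of (protein id, alignment start, alignment end, E-value) with 1-based inclusive coordinates, and that a protein counts as covered when at least one hit passes the cutoff. The tuple layout, the function name `structural_coverage`, and that per-protein rule are all assumptions, not the authors' stated procedure.

```python
# Illustrative sketch only: input layout and per-protein rule are assumptions.
E_VALUE_CUTOFF = 1e-10  # threshold quoted in the caption


def structural_coverage(lengths, hits, cutoff=E_VALUE_CUTOFF):
    """Return (fraction of proteins covered, fraction of residues covered).

    lengths: dict mapping protein id -> sequence length
    hits:    iterable of (protein_id, start, end, evalue), 1-based inclusive
    """
    covered = {pid: set() for pid in lengths}
    for pid, start, end, evalue in hits:
        if evalue < cutoff and pid in covered:
            # A set union avoids double counting overlapping alignments.
            covered[pid].update(range(start, end + 1))
    proteins_covered = sum(1 for residues in covered.values() if residues)
    residues_covered = sum(len(residues) for residues in covered.values())
    return (proteins_covered / len(lengths),
            residues_covered / sum(lengths.values()))
```

Because the per-residue value depends on the alignment span as well as on the cutoff, it is generally lower than the per-protein value for the same set of hits.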
Fig. 3
Per-structure estimates of novel leverage. The left panel (a) demonstrates how the non-cumulative (annual) novel leverage for UniProt 12.8 per deposited structure decreases over time because the task of generating high novel leverage becomes increasingly difficult. The right panel (b) reports the relative annual coverage per deposited structure (Q, Eq. 3). Values of Q below 1 mark contributions below the average over the entire PDB in that year. While the relative values given in Fig. 1 vary little with the particular threshold for what is considered to be a “useful model”, the absolute values given in Fig. 2 and Fig. 3 depend crucially on the values chosen for what is considered to be an acceptable model. Coloring as in Fig. 1: pink with open circles: PSI alone; blue with open squares: structures from US labs excluding structures claimed by PSI; red with filled triangles: SG structures from non-PSI efforts; green with filled diamonds: structures from outside the US not claimed by any SG consortium
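Eq. 3 itself is not reproduced on this page. Going only by the caption's statement that values of Q below 1 mark contributions below the PDB-wide average for the year, one plausible reading is a ratio of a group's per-structure novel leverage to the per-structure novel leverage of the whole PDB. The sketch below encodes that reading; the formula, the function name `relative_leverage_per_structure`, and its parameters are assumptions and may differ from the paper's definition.

```python
# Illustrative sketch only: this formula is an assumed reading of Q,
# not the paper's Eq. 3, which is not shown in this excerpt.

def relative_leverage_per_structure(group_leverage, group_structures,
                                    pdb_leverage, pdb_structures):
    """Per-structure novel leverage of one group divided by the
    per-structure novel leverage of the entire PDB in the same year."""
    if group_structures <= 0 or pdb_structures <= 0 or pdb_leverage <= 0:
        raise ValueError("structure counts and PDB leverage must be positive")
    group_rate = group_leverage / group_structures
    pdb_rate = pdb_leverage / pdb_structures
    return group_rate / pdb_rate
```

Under this reading, a group that deposits a tenth of a year's structures but accounts for two fifths of that year's novel leverage would have Q = 4.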

