PLoS Comput Biol. 2005 Aug;1(3):e31.
doi: 10.1371/journal.pcbi.0010031. Epub 2005 Aug 19.

Functional coverage of the human genome by existing structures, structural genomics targets, and homology models

Lei Xie et al. PLoS Comput Biol. 2005 Aug.

Abstract

The bias in protein structure and function space resulting from experimental limitations and targeting of particular functional classes of proteins by structural biologists has long been recognized, but never continuously quantified. Using the Enzyme Commission and the Gene Ontology classifications as a reference frame, and integrating structure data from the Protein Data Bank (PDB), target sequences from the structural genomics projects, structure homology derived from the SUPERFAMILY database, and genome annotations from Ensembl and NCBI, we provide a quantified view, both at the domain and whole-protein levels, of the current and projected coverage of protein structure and function space relative to the human genome. Protein structures currently provide at least one domain that covers 37% of the functional classes identified in the genome; whole structure coverage exists for 25% of the genome. If all the structural genomics targets were solved (twice the current number of structures in the PDB), it is estimated that structures of one domain would cover 69% of the functional classes identified and complete structure coverage would be 44%. Homology models from existing experimental structures extend the 37% coverage to 56% of the genome as single domains and 25% to 31% for complete structures. Coverage from homology models is not evenly distributed by protein family, reflecting differing degrees of sequence and structure divergence within families. While these data provide coverage, conversely, they also systematically highlight functional classes of proteins for which structures should be determined. Current key functional families without structure representation are highlighted here; updated information on the "most wanted list" that should be solved is available on a weekly basis from http://function.rcsb.org:8080/pdb/function_distribution/index.html.
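The coverage figures quoted in the abstract can be compared directly. The following is a minimal sketch, not the authors' method: it only takes the percentages stated above and computes how many percentage points each scenario adds over existing experimental structures. The dictionary layout, the scenario labels, and the helper name `gains` are assumptions made for illustration.

```python
# Coverage percentages as quoted in the abstract.
# Labels and structure are illustrative assumptions, not from the paper.
coverage = {
    "single domain": {
        "existing": 37,
        "with homology models": 56,
        "all SG targets solved": 69,
    },
    "whole protein": {
        "existing": 25,
        "with homology models": 31,
        "all SG targets solved": 44,
    },
}


def gains(levels):
    """Return the percentage-point gain of each scenario over 'existing'."""
    base = levels["existing"]
    return {
        name: value - base
        for name, value in levels.items()
        if name != "existing"
    }


for level, figures in coverage.items():
    for scenario, delta in gains(figures).items():
        print(
            f"{level}: {scenario} adds {delta} percentage points "
            f"over {figures['existing']}%"
        )
```

Read this way, homology models add 19 points at the single-domain level but only 6 points for complete structures, while solving all structural genomics targets would add 32 and 19 points respectively.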

Conflict of interest statement

Competing interests. The authors have declared that no competing interests exist.
