Proteins. 1998 Oct 1;33(1):88-96.
doi: 10.1002/(sici)1097-0134(19981001)33:1<88::aid-prot8>3.0.co;2-h.

Dictionary of recurrent domains in protein structures



L Holm et al., Proteins.

Abstract

The rapid growth in the number of experimentally determined three-dimensional protein structures has sharpened the need for comprehensive and up-to-date surveys of known structures. Classic work on protein structure classification has made it clear that a structural survey is best carried out at the level of domains, i.e., substructures that recur in evolution as functional units in different protein contexts. We present a method for automated domain identification from protein structure atomic coordinates based on quantitative measures of compactness and, as the new element, recurrence. Compactness criteria are used to recursively divide a protein into a series of successively smaller substructures. Recurrence criteria are used to select an optimal size level of these substructures, so that many of the chosen substructures are common to different proteins at a high level of statistical significance. The joint application of these criteria automatically yields consistent domain definitions between remote homologs, a result difficult to achieve using compactness criteria alone. The method is applied to a representative set of 1,137 sequence-unique protein families covering 6,500 known structures. Clustering of the resulting set of domains (substructures) yields 594 distinct fold classes (types of substructures). The Dali Domain Dictionary (http://www.embl-ebi.ac.uk/dali/) provides not only a global structural classification but also a comprehensive description of families of protein sequences grouped around representative proteins of known structure. The classification will be continuously updated and can serve as a basis for improving our understanding of protein evolution and function and for evolving optimal strategies to complete the map of all natural protein structures.
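The two-stage idea described above can be illustrated with a minimal Python sketch. In this sketch, compactness proposes a tree of nested substructures, and recurrence picks the level at which to cut that tree. It is not the Dali Domain Dictionary implementation. The contact-matrix input, the score in `best_split`, the `min_len` and `threshold` parameters, and the `is_recurrent` callback are all hypothetical stand-ins. The sketch also makes only single contiguous cuts. `is_recurrent` replaces the paper's statistical test of recurrence across many proteins with a simple yes/no predicate.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

# Assumed input: a symmetric 0/1 residue-residue contact matrix.
Contacts = List[List[int]]


@dataclass
class Node:
    """A substructure covering residues [start, end) in the splitting tree."""
    start: int
    end: int
    children: List["Node"] = field(default_factory=list)


def internal_contacts(c: Contacts, s: int, e: int) -> int:
    """Count contacts i < j with both residues inside [s, e)."""
    return sum(c[i][j] for i in range(s, e) for j in range(i + 1, e))


def interface_contacts(c: Contacts, s: int, m: int, e: int) -> int:
    """Count contacts between the parts [s, m) and [m, e)."""
    return sum(c[i][j] for i in range(s, m) for j in range(m, e))


def best_split(c: Contacts, s: int, e: int, min_len: int) -> Optional[Tuple[float, int]]:
    """Hypothetical compactness score: the smaller part's internal contacts
    divided by (interface contacts + 1). Returns (score, cut) or None if
    [s, e) is too short to yield two parts of at least min_len residues."""
    best: Optional[Tuple[float, int]] = None
    for m in range(s + min_len, e - min_len + 1):
        score = min(internal_contacts(c, s, m), internal_contacts(c, m, e)) / (
            interface_contacts(c, s, m, e) + 1
        )
        if best is None or score > best[0]:
            best = (score, m)
    return best


def build_tree(c: Contacts, s: int, e: int, min_len: int = 4, threshold: float = 2.0) -> Node:
    """Compactness stage: recursively bisect [s, e) while the best cut scores
    at least `threshold`, producing successively smaller substructures."""
    split = best_split(c, s, e, min_len)
    if split is None or split[0] < threshold:
        return Node(s, e)  # leaf: no sufficiently compact cut
    _, m = split
    return Node(s, e, [build_tree(c, s, m, min_len, threshold),
                       build_tree(c, m, e, min_len, threshold)])


def select_domains(node: Node, is_recurrent: Callable[[int, int], bool]) -> List[Tuple[int, int]]:
    """Recurrence stage (simplified): walk the tree top-down and keep the
    largest substructures the hypothetical recurrence test accepts; if none
    along a branch is accepted, fall back to that branch's leaves."""
    if not node.children or is_recurrent(node.start, node.end):
        return [(node.start, node.end)]
    return [d for child in node.children for d in select_domains(child, is_recurrent)]


if __name__ == "__main__":
    # Toy chain of 20 residues: two densely packed halves joined by one contact.
    n = 20
    c = [[0] * n for _ in range(n)]
    for block in (range(0, 10), range(10, 20)):
        for i in block:
            for j in block:
                if i != j:
                    c[i][j] = 1
    c[9][10] = c[10][9] = 1
    tree = build_tree(c, 0, n)
    print(select_domains(tree, lambda s, e: False))              # [(0, 10), (10, 20)]
    print(select_domains(tree, lambda s, e: (s, e) == (0, 20)))  # [(0, 20)]
```

On this toy contact map, the compactness stage cuts at residue 10 and then stops. The recurrence predicate alone decides whether the reported domains are the two halves or the whole chain. This mirrors, in a very reduced form, how the abstract uses recurrence to choose among size levels that compactness alone cannot distinguish.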
