Review
Protein Sci. 2016 Jul;25(7):1188-203.
doi: 10.1002/pro.2893. Epub 2016 Feb 21.

Classification of proteins with shared motifs and internal repeats in the ECOD database

R Dustin Schaeffer et al. Protein Sci. 2016 Jul.

Abstract

Proteins and their domains evolve by a set of events commonly including the duplication and divergence of small motifs. The presence of short repetitive regions in domains has generally constituted a difficult case for structural domain classifications and their hierarchies. We developed the Evolutionary Classification Of protein Domains (ECOD) in part to implement a new schema for the classification of these types of proteins. Here we document the ways in which ECOD classifies proteins with small internal repeats, widespread functional motifs, and assemblies of small domain-like fragments in its evolutionary schema. We illustrate the ways in which the structural genomics project impacted the classification and characterization of new structural domains and sequence families over the decade.

Keywords: internal; protein classification; protein motifs; repeats; structural bioinformatics; structural genomics.

Figures

Figure 1
Diverse architectures surrounding the His‐Me finger endonuclease motif. Domains in the His‐Me finger endonucleases H‐group contain a conserved functional supersecondary structure motif: a β‐hairpin followed by a short α‐helix (colored magenta). The motif tends to include an N‐terminal α‐helix on the alternate side of the functional α‐helix and is embedded in various different architectures, including (A) a Zn‐finger in restriction endonuclease HPY99I (e3fc3A3), (B) an α‐helical fold in CRISPR‐associated Cas9 endonuclease (e4un3B4), and (C) an α + β 3‐layer sandwich in ectonucleotide pyrophosphatase/phosphodiesterase‐1 (e4b56A2).
Figure 2
Motifs within divergent folds. The P‐loop domains‐related H‐group includes two T‐groups colored according to secondary structural element, with the P‐loop in magenta. α‐helices are in cyan and β‐sheets in yellow, with any remaining regions in gray. (A) The P‐loop containing nucleoside triphosphate hydrolases exemplified by Guanylate Kinase (e4qrhA1) has a main α/β/α core topology with five parallel β‐strands, while the (B) PEP carboxykinase‐like group exemplified by HPr kinase/phosphatase has a β‐strand core that wraps into an open barrel with α‐helices flanking one side. The β‐hammerhead motif (magenta) unifies the α/β‐hammerhead/Barrel‐sandwich hybrid H‐group including (C) the all‐β single‐hybrid motif (e1dczA1), and (D) the α/β‐hammerhead (e1brwA2).
Figure 3
Potential β‐hammerhead motif evolution. Two alternate ancient three‐stranded units could dimerize to form a stable structure (in brackets). Evolutionary scenarios including fusion of the ancient duplicated units lead to (A) the present‐day single‐hybrid motif. Insertion of a duplicated single‐hybrid motif combined with hammerhead deterioration leads to (B) duplicated‐hybrid motifs. Indels and hammerhead deterioration of the single‐hybrid motif lead to (C) Ribosomal L27 protein. Several alternate pathways could lead to various (D) α/β‐hammerhead folds from all‐β hammerheads (gray dotted arrows).
Figure 4
Divergent and convergent evolution of psi‐loop motifs. The psi‐loop motif (magenta) helps define the active site (HExxH in black stick) of zincin homologs with diverse folds. (A) The common zincin core includes an N‐terminal helix (slate), followed by the psi‐loop (magenta), and the HExxH‐containing active site helix (salmon). This core is decorated by various different SSEs (helix in cyan and strand in yellow) with (B) a peptidase M56 family structure (e4qhfA1) decorated by a single C‐terminal helix, (C) a reprolysin 5 family structure (e2i47A1) decorated by multiple α‐helices as well as an elongation of the psi‐loop sheet with parallel β‐strands, and (D) a peptidase M27 structure (e3bonA1) decorated by multiple α‐helices and β‐strands, elongating the psi‐loop sheet with an anti‐parallel β‐meander. The psi‐loop motif also occurs as part of the evolutionary core of unrelated folds, such as (E) double‐psi β‐barrels (e4avrA1), (F) Rossmann‐like structures of bacterial fluorinating enzyme N‐terminal domains (e1rqrA2), and (G) four‐layered metallo‐dependent phosphatase sandwiches (e2zo0B1).
Figure 5
α‐hairpin repeat domains in ECOD. (A) Alpha internal repeat domains in ECOD from the HEAT, TPR, and Armadillo repeats are classified as homologs. (B) The ankyrin domains are classified as possible homologs of ARM/HEAT/TPR domains. (C) Alpha/alpha toroids, such as farnesyltransferase, are alpha‐hairpin repeats that are closed, rather than open, and not as easily expanded by axial duplication.
Figure 6
Diverse topologies of the β‐propellers. Examples of non‐canonical topologies in the β‐propeller homologs group. (A) One blade of the deteriorated propeller domain of adsorption PRD1 P2 (e1n7vA1) has been replaced with a novel β domain with complex topology (e1n7vA2). (B) The luminal domain of endoribonuclease IRE1 (e2be1A1) is topologically dissimilar to other β‐propellers, but strong sequence homology to propeller repeats is detected between IRE1 and other canonical propellers. (C) PH1500 is a 12‐bladed propeller composed of a hexamer of double propeller repeats; it is both an obligate multimer and composed of internal repeats (PDB: 2m3x).
Figure 7
Evolution of the ferritin domains. The evolution of the core four‐helix bundle (in rainbow, from blue to red) of the ferritin/heme oxygenase H‐group from duplication and fusion of two “half ferritin” helix‐hairpins (lower left panel). The head to tail arrangement of the helix‐hairpins requires a crossover connection, which occurs between the first two helices (ferritin/heme oxygenase labeled in blue) or the last two (cobalamin adenosyltransferase labeled in green). Secondary structure decorations of the core fold (N‐terminal decoration in slate, C‐terminal decoration in salmon) as well as alternate crossover connection compositions (colored in light green) occur in several ferritin/heme oxygenase families. The positioning of the active sites marked by di‐iron centers or other ligands (black spheres) in the core of the four‐helix bundle provides the basis for uniting the different families. Duplication combined with secondary structure deletion and active site loss occurred in a subunit of toluene monooxygenase, while a migration of a heme binding site occurred in heme oxygenase.
Figure 8
Comparison of domain definitions between ECOD and Pfam. (A) The distribution of Pfam family coverage on a nonredundant set of ECOD domains that have a one‐to‐one mapping to Pfam families. (B) Mapping of Pfam family XPG_N (PF00752, blue) and XPG_I (PF00867, orange) on RAD2 structure (PDB: 4q0w). (C) Mapping of HAD‐related domain (e4q0wA2, pink) and SAM‐like domain (e4q0wA1, cyan) from ECOD on the same structure. Side chains of catalytic residues are shown in stick, with the coordinating calcium ion in green sphere. (D) Top 20 H‐groups where split Pfam families are assigned.
Figure 9
Cumulative total over time of structural genomics targets in ECOD. Distribution of domains from structural genomics targets over time (A) by domains and (B) by hierarchical groups. SG domains were considered to form a new group if they were the earliest deposited domain in that group. Moving averages (1 year) were calculated for structural genomics domains that (C) formed new X‐groups and (D) newly characterized sequence families.
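
The rule stated in the Figure 9 caption (a structural genomics domain forms a new group if it is the earliest deposited domain in that group) can be sketched as below. This is only an illustration under stated assumptions, not the authors' pipeline: the record layout, the function names `new_group_founders` and `trailing_year_count`, and the sample data are all hypothetical, and the trailing 365-day count is just one possible reading of a 1-year moving average.

```python
from datetime import date, timedelta

# Hypothetical record layout (not from the paper):
# (domain_id, group_id, deposition_date, is_structural_genomics)

def new_group_founders(domains):
    """Return SG domains that are the earliest-deposited member of their group."""
    earliest = {}
    for dom in domains:
        _, group, deposited, _ = dom
        # Keep the first-seen domain on ties; replace only on a strictly earlier date.
        if group not in earliest or deposited < earliest[group][2]:
            earliest[group] = dom
    return [d for d in earliest.values() if d[3]]

def trailing_year_count(event_dates, at, window_days=365):
    """Count events in the half-open window (at - window_days, at]."""
    start = at - timedelta(days=window_days)
    return sum(1 for d in event_dates if start < d <= at)

# Made-up sample data for illustration only.
sample = [
    ("d1", "X1", date(2001, 1, 1), True),
    ("d2", "X1", date(2002, 1, 1), False),
    ("d3", "X2", date(2000, 5, 1), False),
    ("d4", "X2", date(2003, 1, 1), True),
    ("d5", "X3", date(2001, 6, 1), True),
]
founders = new_group_founders(sample)
founding_dates = [d[2] for d in founders]
```

In this sample, `d4` is a structural genomics domain but does not found group `X2`, because the non-SG domain `d3` was deposited earlier.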
