J Mol Biol. 2004 May 7;338(4):633-41.
doi: 10.1016/j.jmb.2004.03.039.

Domain insertions in protein structures

R Aroul-Selvam et al. J Mol Biol. 2004.

Abstract

Domains are the structural, functional or evolutionary units of proteins. Proteins can comprise a single domain or a combination of domains. In multi-domain proteins, the domains almost always occur end-to-end, i.e., one domain follows the C-terminal end of another domain. However, there are exceptions to this common pattern, where multi-domain proteins are formed by insertion of one domain (insert) into another domain (parent). Here, we provide a quantitative description of known insertions in the Protein Data Bank (PDB). We found that 9% of domain combinations observed in the non-redundant PDB are insertions. Although 90% of all insertions involve only one insert, proteins can clearly have multiple (nested, two-domain and three-domain) inserts. We also observed correlations between the structure and function of a domain and its tendency to be found as a parent or an insert. There is a bias in insert position towards the C terminus of parents. We observed that the atomic distance between the N and C termini of an insert is significantly smaller than the N-to-C distance in a parent context or a single domain context. Insertions are always found to occur in loop regions of parent domains. Our observations regarding the relationship between domain insertions and the structure, function and evolution of proteins have implications for protein engineering.

Figures

Figure 1
Domain insertion in E. coli enzyme RNA 3′-terminal phosphate cyclase (PDB 1qmhA). The E. coli enzyme RNA 3′-terminal phosphate cyclase consists of two domains, of which one is inserted within the other. The parent domain (residues 5–184, 280–338, coloured purple) consists of three repeated folding units; each unit has two α-helices and a four-stranded β-sheet. The folding unit resembles the C-terminal domain of bacterial translation initiation factor 3 (IF3). Between an α-helix and a β-strand of the third IF3-like repeat of the parent domain, there is a smaller inserted domain (residues 185–279, coloured red). Although the inserted domain has the same secondary structural elements as the parent domain, it has a different topology and a different fold. The insert resembles the fold observed in human thioredoxin. The figure was prepared using the program MOLSCRIPT.
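The parent/insert relationship in the Figure 1 caption can be illustrated with a minimal sketch that works only on residue ranges. The function names and the rule used here (an insert counts as an insertion when all of its residues fall in the gap between two consecutive parent segments) are assumptions made for illustration; they are not taken from the paper's method, which the abstract does not describe in detail.

```python
from typing import List, Tuple

# An inclusive residue range (start, end), as quoted in the figure caption.
Segment = Tuple[int, int]


def find_insertions(parent: List[Segment],
                    insert: List[Segment]) -> List[Tuple[Segment, Segment]]:
    """Return pairs of consecutive parent segments that flank the insert.

    Assumed rule (hypothetical, for illustration only): the insert lies
    entirely after one parent segment and entirely before the next one.
    """
    ordered = sorted(parent)
    insert_start = min(start for start, _ in insert)
    insert_end = max(end for _, end in insert)
    flanks = []
    for left, right in zip(ordered, ordered[1:]):
        if left[1] < insert_start and insert_end < right[0]:
            flanks.append((left, right))
    return flanks


def is_insertion(parent: List[Segment], insert: List[Segment]) -> bool:
    """True if the insert sits between two segments of the parent."""
    return bool(find_insertions(parent, insert))


# Residue ranges for 1qmhA as given in the Figure 1 caption.
parent_1qmh = [(5, 184), (280, 338)]
insert_1qmh = [(185, 279)]

# An end-to-end arrangement (made-up ranges) for contrast.
end_to_end_first = [(1, 100)]
end_to_end_second = [(101, 200)]

print(is_insertion(parent_1qmh, insert_1qmh))
print(is_insertion(end_to_end_first, end_to_end_second))
```

With the caption's ranges, the parent is discontinuous in sequence and the insert fills the gap, whereas two domains arranged end-to-end produce no flanking pair.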
Figure 2
Schematic representation of types of domain insertions observed in protein structures. Figures of protein structures were prepared using the program MOLSCRIPT. (a) Single insertion (e.g., 1qmhA). (b) Nested insertion (e.g., 1a6dA). “insert1 N” and “insert1 C” represent the N and C terminus of the insert, respectively. (c) Two-domain insertion (e.g., 1zfjA). (d) Three-domain insertion (e.g., 1dq3A).
Figure 3
(a) Domain length distribution for all domains in the non-redundant set of protein structures (PDB_90). (b) Domain length distribution for parent domains.
Figure 4
(a) Proportion of residues in parent and insert domains in parent-insert combinations. (b) Point of insertion in parent domain. Insert position is given as a fraction of total length of parent domain.
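The two quantities plotted in Figure 4 can be sketched for a single example, using the 1qmhA residue ranges quoted in the Figure 1 caption. The exact definitions below are assumptions: the insert position is taken as the number of parent residues preceding the insert divided by the total number of parent residues, and the residue proportions are taken over the parent plus insert combined. The paper may compute them differently, and the function names are hypothetical.

```python
from typing import List, Tuple

Segment = Tuple[int, int]  # inclusive residue range (start, end)


def segment_length(segments: List[Segment]) -> int:
    """Total number of residues covered by a list of inclusive ranges."""
    return sum(end - start + 1 for start, end in segments)


def residue_proportions(parent: List[Segment],
                        insert: List[Segment]) -> Tuple[float, float]:
    """Fractions of residues in the parent and in the insert (assumed definition)."""
    parent_len = segment_length(parent)
    insert_len = segment_length(insert)
    total = parent_len + insert_len
    return parent_len / total, insert_len / total


def insertion_fraction(parent: List[Segment], insert_start: int) -> float:
    """Insert position as a fraction of parent length (assumed definition).

    Counts parent residues numbered below insert_start and divides by
    the total number of parent residues.
    """
    before = sum(
        min(end, insert_start - 1) - start + 1
        for start, end in parent
        if start < insert_start
    )
    return before / segment_length(parent)


# Residue ranges for 1qmhA as given in the Figure 1 caption.
parent_1qmh = [(5, 184), (280, 338)]
insert_1qmh = [(185, 279)]

parent_share, insert_share = residue_proportions(parent_1qmh, insert_1qmh)
position = insertion_fraction(parent_1qmh, insert_1qmh[0][0])

print(round(parent_share, 2), round(insert_share, 2))  # 0.72 0.28
print(round(position, 2))  # 0.75
```

Under these assumed definitions, the 1qmhA insert sits about three quarters of the way along its parent, which is in line with the C-terminal bias reported in the abstract, though a single example cannot confirm the trend.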
