BMC Genomics. 2006 Apr 28:7:98.
doi: 10.1186/1471-2164-7-98.

Phylogenomic analysis of the GIY-YIG nuclease superfamily

Stanislaw Dunin-Horkawicz et al. BMC Genomics. 2006.

Abstract

Background: The GIY-YIG domain was initially identified in homing endonucleases and later in other selfish mobile genetic elements (including restriction enzymes and non-LTR retrotransposons) and in enzymes involved in DNA repair and recombination. However, to date no systematic search for novel members of the GIY-YIG superfamily or comparative analysis of these enzymes has been reported.

Results: We carried out database searches to identify all members of known GIY-YIG nuclease families. Multiple sequence alignments together with predicted secondary structures of identified families were represented as Hidden Markov Models (HMM) and compared by the HHsearch method to the uncharacterized protein families gathered in the COG, KOG, and PFAM databases. This analysis allowed for extending the GIY-YIG superfamily to include members of COG3680 and a number of proteins not classified in COGs and to predict that these proteins may function as nucleases, potentially involved in DNA recombination and/or repair. Finally, all old and new members of the GIY-YIG superfamily were compared and analyzed to infer the phylogenetic tree.

Conclusion: An evolutionary classification of the GIY-YIG superfamily is presented for the very first time, along with the structural annotation of all (sub)families. It provides a comprehensive picture of sequence-structure-function relationships in this superfamily of nucleases, which will help to design experiments to study the mechanism of action of known members (especially the uncharacterized ones) and will facilitate the prediction of function for the newly discovered ones.

Figures

Figure 1
Domain architectures observed in the GIY-YIG superfamily. Numbers in round brackets indicate NCBI gene identification (GI) numbers of representative members of proteins sharing a domain architecture. All representatives are divided into presumably monophyletic groups according to the sequence clustering. Light yellow blocks indicate the common GIY-YIG domain. Other domain abbreviations are: ANK=ANKRD41, ankyrin repeat domain 41; LEM, nuclear membrane-associated proteins domain; His/Cys-rich, histidine- and cysteine-rich conserved region; RVT, reverse transcriptase; CCCC, region with four conserved Cys residues; UvrBb, UvrB-binding domain; EndoV, Endonuclease V-like nuclease domain; Cho-CTD, C-terminal domain found in Cho and Cho-related proteins; EXOIII, exonuclease domain in the α and ε subunits of DNA-polymerase; UNKNOWN, different conserved domains of unknown function; SOH1, component of the RNA polymerase II transcription complex in S. cerevisiae; N-MTase, predicted DNA or RNA or protein MTase acting on exocyclic amino groups in bases or amino acids; HsdR, restriction subunit of a putative Type I RM system (the GIY-YIG domain is inserted at position ~800); Numod1-3, conserved DNA-binding domains of homing endonucleases; HTH, Helix-turn-helix; wHTH, winged-helix-turn-helix; COG3860. SF1 Helicase, putative Superfamily 1 helicase domain.
Figure 2
Multiple sequence alignment of 61 selected representatives of the GIY-YIG superfamily. Sequences were selected from each family (UvrC, Cho, Cho+Exo, Cho-like, Cho-like+Exo, Bacillus-1, Bacillus-2, HEases, REases, Penelope, COG3680, COG1833, Slx, MutS-like) to cover the diversity of known structures and functions. Sequences are denoted by the species' name, the NCBI gene identification (GI) number and the PDB code (if applicable). Additionally, sequences are grouped by the families listed above. The variable termini and insertions are not shown; the number of omitted residues is indicated in parentheses. Amino acids are colored according to the physico-chemical properties of their side-chains (negatively charged: red, positively charged: blue, polar: magenta, hydrophobic: green). Conserved residues are highlighted. Secondary structure elements determined for the archetypal member of the superfamily, I-TevI, are shown as H (helices) and E (strands).
Figure 3
Two-dimensional projection of the CLANS clustering results obtained for the full-length GIY-YIG sequences.
Figure 4
Two-dimensional projection of the CLANS clustering results obtained for the GIY-YIG domains isolated from sequences clustered in Figure 3.
Figure 5
Two-dimensional projection of the CLANS clustering results obtained for the full-length sequences of the "supercluster". Sequences were taken from the central "supercluster" in Figures 3 and 4. Proposed subfamilies are colored and labeled: HEases – blue, orthodox UvrC (with EndoV domain) – green, orthodox Cho – magenta, Cho-like+Exo domain – light pink, Cho+Exo domain – cyan, Bacillus-1 and Bacillus-2 – red. Additional labels: PBCV-1 virus and Chilo iridescent virus – yellow, Tlr8 from Tetrahymena thermophila – black.
Figure 6
Distribution of GIY-YIG nucleases from different subfamilies among the three Domains of Life. An empty box indicates the presence of at least one family member in the corresponding taxon. A filled box indicates the presence of family members in >50% of fully sequenced genomes from the corresponding taxon. Abbreviations are: UV = UvrC, Ch = orthodox Cho, Ch+ = orthodox Cho+ExoIII, CL = Cho-like and Cho-like+ExoIII, HO = HEases, RE = REases, P = Penelope, C = COG3680, C1 = COG1833, SX = Slx1.
Figure 7
The postulated phylogenetic tree of the GIY-YIG superfamily. Only the major branches corresponding to subfamilies delineated in this work are shown. Colored blocks describe the typical domain architecture of the corresponding family (the same as in Figure 1, but without domain names). Blue, red, and green lines indicate bacterial, archaeal, and eukaryotic lineages. Dotted lines labeled 'HGT' indicate horizontal gene transfer events between different lineages. Dotted ellipses indicate the approximate time of intragenic duplications or other cases of horizontal gene transfer.
Figure 8
Comparison of the three-dimensional organization of GIY-YIG domains. Structures of I-TevI (1ln0), UvrC (1ycz), Slx-1 (1ywl) and a domain of RNase H1 from Saccharomyces cerevisiae (1qhk) are shown in the cartoon representation, colored as a rainbow from the N-terminus (blue) to the C-terminus (red). The characteristic conserved Tyr residues from the GIY-YIG motif and the C-terminal Asn residue conserved in the UvrC-like lineage are shown as sticks.
