Nucleic Acids Res. 2009 May;37(8):2560-73.
doi: 10.1093/nar/gkp095. Epub 2009 Mar 5.

Fractured genes: a novel genomic arrangement involving new split inteins and a new homing endonuclease family

Bareket Dassa et al. Nucleic Acids Res. 2009 May.

Abstract

Inteins are genetic elements, inserted in-frame into protein-coding genes, whose products catalyze their removal from the protein precursor via a protein-splicing reaction. Intein domains can be split into two fragments and still ligate their flanks by a trans-protein-splicing reaction. A bioinformatic analysis of environmental metagenomic data revealed 26 different loci with a novel genomic arrangement. In each locus, a conserved enzyme-coding region is broken in two by a split intein, with a free-standing endonuclease gene inserted in between. Eight types of DNA synthesis and repair enzymes have this 'fractured' organization. The new naturally split inteins were compared with known split inteins. Some loci include apparent gene-control elements brought in with the endonuclease gene. A newly predicted homing endonuclease family, related to very-short patch repair (Vsr) endonucleases, was found in half of the loci. These putative homing endonucleases also appear in group I introns and as stand-alone inserts in the absence of surrounding intervening sequences. The new fractured-gene organization appears to be present mainly in phage, shows how endonucleases can integrate into inteins, and may represent a missing link in the evolution of gene breaking in general, and in the creation of split inteins in particular.
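Trans-protein-splicing can be pictured with a toy string model: the N-terminal precursor carries the N-extein followed by In, the C-terminal precursor carries Ic followed by the C-extein, and splicing joins the two exteins while the intein halves are excised. The sketch below assumes one-letter amino-acid strings, uses hypothetical names (`SplitPrecursor`, `trans_splice`) and made-up sequences, and applies a simplified junction check (intein starting with Cys or Ser, ending with Asn, C-extein starting with Cys, Ser or Thr) that many but not all inteins satisfy. It is an illustration of the reaction's bookkeeping, not a model of its chemistry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SplitPrecursor:
    """One translated fragment of a split-intein host (hypothetical toy model)."""
    extein: str       # host-enzyme part: N-extein or C-extein
    intein: str       # intein half: In or Ic
    n_terminal: bool  # True: extein + In; False: Ic + extein

    def sequence(self) -> str:
        """Full precursor sequence in N-to-C order."""
        if self.n_terminal:
            return self.extein + self.intein
        return self.intein + self.extein


def junctions_look_canonical(n_part: SplitPrecursor, c_part: SplitPrecursor) -> bool:
    """Simplified check of commonly conserved splice-junction residues."""
    return (
        n_part.intein[:1] in {"C", "S"}            # first intein residue: Cys or Ser
        and c_part.intein.endswith("N")            # last intein residue: Asn
        and c_part.extein[:1] in {"C", "S", "T"}   # first C-extein residue
    )


def trans_splice(n_part: SplitPrecursor, c_part: SplitPrecursor) -> str:
    """Return the ligated host protein; the intein halves are discarded."""
    if not (n_part.n_terminal and not c_part.n_terminal):
        raise ValueError("need one N-terminal and one C-terminal precursor")
    if not junctions_look_canonical(n_part, c_part):
        raise ValueError("splice-junction residues look non-canonical")
    return n_part.extein + c_part.extein


if __name__ == "__main__":
    # Made-up sequences; a fractured locus would also encode an endonuclease
    # ORF between these two precursors, which plays no part in ligation.
    n_part = SplitPrecursor("MKTAYIAK", "CLAEGTRIL", n_terminal=True)
    c_part = SplitPrecursor("CFEVSGKL", "VHNSGEHN", n_terminal=False)
    print(trans_splice(n_part, c_part))  # MKTAYIAKCFEVSGKL
```

In a fractured locus the endonuclease is translated from its own ORF between the two precursors, which is why the model needs only the two intein halves to associate.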

Figures

Figure 1.
Schematic representation of fractured gene arrangements. Genomic arrangements of 27 loci assembled from GOS reads, grouped according to the type of host enzyme. Protein-coding regions are shown as rectangles, with the host enzymes in green, split-intein parts in blue, and endonucleases in red. N-terminal intein half (In) and C-terminal intein half (Ic); free-standing homing endonuclease domains: GIY-YIG (GIY) and very-short patch repair-like (Vsr). Abbreviations for other gene names are specified in the text. Possible hairpin structures are marked as vertical lines on the 5′ untranslated regions of endonuclease genes. Overlapping coding frames are indicated by offsetting the overlapping coding regions.
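The overlapping coding frames drawn as offset rectangles can be found by comparing ORF coordinates. A minimal sketch, assuming a single forward-strand locus with 1-based inclusive coordinates; the `Orf` type, the ORF names and the coordinates in the example are hypothetical and are not taken from the GOS assemblies. Two adjacent ORFs overlap when the upstream end lies at or past the downstream start, and they sit in different reading frames when their start positions differ by a non-multiple of three.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Orf:
    """A coding region on the forward strand, 1-based inclusive (assumption)."""
    name: str
    start: int
    end: int


def adjacent_overlaps(orfs: list[Orf]) -> list[tuple[str, str, int, bool]]:
    """Report (upstream, downstream, overlap_nt, same_frame) for overlapping neighbours."""
    ordered = sorted(orfs, key=lambda orf: orf.start)
    report = []
    for up, down in zip(ordered, ordered[1:]):
        overlap = up.end - down.start + 1  # shared nucleotides, if positive
        if overlap > 0:
            same_frame = (down.start - up.start) % 3 == 0
            report.append((up.name, down.name, overlap, same_frame))
    return report


if __name__ == "__main__":
    # Hypothetical fractured locus: host N-part + In, endonuclease, Ic + host C-part.
    locus = [
        Orf("hostN+In", 1, 900),
        Orf("endonuclease", 897, 1400),
        Orf("Ic+hostC", 1397, 1999),
    ]
    for row in adjacent_overlaps(locus):
        print(row)
```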
Figure 2.
Sequence features of the new split inteins. Multiple sequence alignment of the N-terminal (A) and C-terminal (B) halves of full-length split inteins. Conserved motifs of the HINT protein-splicing family are boxed and labeled, and active-site residues are marked with an asterisk. The sequence alignment was refined based on structural modeling with the cyanobacterial DnaE split intein as a template (PDB code: 1ZD7). Sequences are named after their protein host. (C) Electrostatic characteristics of full-length split inteins. The number of salt bridges, calculated from the modeled tertiary structures, is shown together with the local charges of the two interacting beta-strands of the intein halves. (D) Illustration of salt bridges at the interaction interface between the N-terminal (blue) and C-terminal (red) halves in the modeled structure of gp41-1. The two longest antiparallel beta-strands of the intein molecule are shown in cartoon representation.
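The salt-bridge counts in panel C come from modeled structures, and the caption does not state the geometric criterion used. The sketch below assumes a common definition (any carboxylate oxygen of Asp/Glu within 4.0 Å of a side-chain nitrogen of Lys/Arg, counted once per residue pair across the two intein halves) and a hypothetical `Atom` record that would, in practice, be filled from a PDB file of the model. `net_charge` is a crude stand-in for the local strand charges: it counts Lys/Arg as +1 and Asp/Glu as −1 and ignores His and pKa shifts.

```python
import math
from itertools import product
from typing import NamedTuple


class Atom(NamedTuple):
    chain: str
    resnum: int
    resname: str
    name: str
    x: float
    y: float
    z: float


# Charged side-chain atoms (assumption); adding His or changing the cutoff changes counts.
ACIDIC = {("ASP", "OD1"), ("ASP", "OD2"), ("GLU", "OE1"), ("GLU", "OE2")}
BASIC = {("LYS", "NZ"), ("ARG", "NE"), ("ARG", "NH1"), ("ARG", "NH2")}


def interchain_salt_bridges(atoms, chain_a: str, chain_b: str, cutoff: float = 4.0) -> int:
    """Count acid/base residue pairs across two chains with any O-N distance <= cutoff (Å)."""
    acids = [a for a in atoms if (a.resname, a.name) in ACIDIC]
    bases = [a for a in atoms if (a.resname, a.name) in BASIC]
    pairs = set()
    for acid, base in product(acids, bases):
        if {acid.chain, base.chain} != {chain_a, chain_b}:
            continue  # keep only contacts between the two intein halves
        if math.dist((acid.x, acid.y, acid.z), (base.x, base.y, base.z)) <= cutoff:
            pairs.add(((acid.chain, acid.resnum), (base.chain, base.resnum)))
    return len(pairs)


def net_charge(segment: str) -> int:
    """Crude local charge near neutral pH: +1 per Lys/Arg, -1 per Asp/Glu."""
    segment = segment.upper()
    return sum(segment.count(r) for r in "KR") - sum(segment.count(r) for r in "DE")
```

Counting residue pairs rather than atom pairs keeps a bidentate Asp or Arg contact from being counted twice.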
Figure 3.
Sequence features of the Vsr-like putative homing endonuclease family and its similarity to Vsr repair endonucleases. (A) Conserved sequence motifs of Vsr-like putative homing endonucleases, and their sequence logos. (B) Structure-based alignment of the DnaE-2 locus Vsr-like putative homing endonuclease with the E. coli Vsr repair endonuclease (Vsr; PDB code 1CW0). Residues that were modeled in similar positions and backbone conformations after sequence threading and energy minimization are shown in upper case; unaligned sequence regions are shown in grey lowercase. Identical residues are highlighted in red. Vsr active-site residues are marked by asterisks, DNA-binding residues by ‘d’s, and zinc-binding residues by ‘z’s. The secondary structure of Vsr is shown above its sequence. Conserved sequence motifs of DnaE-2 are marked as in A. The Phyre server Z-score for this alignment was 5 × 10^−14. (C) Predicted structure of positions 196–315 of the DnaE-2 locus Vsr-like protein, and its similarity to the structure of Vsr. (D) Motif-to-motif alignment of the Vsr-like putative homing endonucleases (top) with the Vsr repair endonuclease (bottom). The alignments of the second and third (rightmost) blocks are significant, with expected values of 2e−2 and <5e−7, respectively. The first aligned block has a non-significant score because it is shorter and less conserved. Nevertheless, this alignment is probably genuine, since the corresponding Vsr and Vsr-like regions were also aligned in sequence-to-sequence, sequence-to-multiple-alignment, and structure-threading alignments. Functional residues of the Vsr endonuclease are marked as in B.
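Sequence logos such as those in panel A stack letters whose total column height is the column's information content. A minimal sketch, assuming the plain Shannon formulation for a 20-letter protein alphabet, without the small-sample correction or background frequencies that logo tools may apply; gaps are simply ignored, and the function names are hypothetical.

```python
import math
from collections import Counter


def column_information(column: str, alphabet_size: int = 20) -> float:
    """Information content in bits: log2(alphabet_size) minus Shannon entropy."""
    residues = [c for c in column.upper() if c not in "-."]  # ignore gap symbols
    if not residues:
        return 0.0
    n = len(residues)
    entropy = -sum((k / n) * math.log2(k / n) for k in Counter(residues).values())
    return math.log2(alphabet_size) - entropy


def logo_heights(aligned: list[str]) -> list[dict[str, float]]:
    """Per-column letter heights: residue frequency times column information."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("sequences must be aligned to equal length")
    heights = []
    for column in zip(*aligned):
        col = "".join(column).upper()
        info = column_information(col)
        counts = Counter(c for c in col if c not in "-.")
        total = sum(counts.values())
        heights.append({r: (k / total) * info for r, k in counts.items()})
    return heights
```

A fully conserved column reaches log2(20) ≈ 4.32 bits, so strongly conserved catalytic residues stand out as the tallest letters.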
Figure 4.
Nucleotide features of endonuclease genes. (A) RNA hairpin structures in the 5′ untranslated regions of endonuclease ORFs in the gp41-1 (representing the very similar sequences of gp41-1–7), nrdA-5 and DnaE-1 gene loci. Initiator codons are marked by arrows, conserved putative T4 late promoter elements are boxed, and conserved sequence motifs (Supplementary Figure S4) are highlighted in grey. The expected values for motifs 1 and 2 are 1.7e−10 and 9.9e−3, respectively. RNA structures were calculated using the Vienna package (http://rna.tbi.univie.ac.at/cgi-bin/RNAfold.cgi), and sequence motifs were identified using the MEME program. (B) Overlapping protein-coding regions of endonuclease 3′ termini and the 5′ termini of their downstream genes.
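RNAfold, from the Vienna package, predicts minimum-free-energy structures using nearest-neighbour energy parameters. As a much simpler stand-in that shows how a hairpin can be recovered by dynamic programming, the sketch below implements Nussinov base-pair maximization (Watson-Crick and G-U pairs, hairpin loops of at least three nucleotides). It ignores stacking and loop energies, so it will not in general reproduce RNAfold's predictions; the function name and example sequence are hypothetical.

```python
# Watson-Crick pairs plus G-U wobble pairs.
PAIRS = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}


def nussinov_fold(seq: str, min_loop: int = 3) -> str:
    """Return a dot-bracket structure that maximizes the number of base pairs."""
    seq = seq.upper().replace("T", "U")  # accept DNA input
    n = len(seq)
    dp = [[0] * n for _ in range(n)]  # dp[i][j]: max pairs within seq[i..j]

    # Fill by increasing span; spans <= min_loop cannot close a hairpin.
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = max(dp[i + 1][j], dp[i][j - 1])  # i or j left unpaired
            if (seq[i], seq[j]) in PAIRS:
                best = max(best, dp[i + 1][j - 1] + 1)  # i pairs with j
            for k in range(i + 1, j):  # bifurcation into two substructures
                best = max(best, dp[i][k] + dp[k + 1][j])
            dp[i][j] = best

    # Traceback with an explicit stack to recover one optimal structure.
    structure = ["."] * n
    stack = [(0, n - 1)] if n else []
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue
        if dp[i][j] == dp[i + 1][j]:
            stack.append((i + 1, j))
        elif dp[i][j] == dp[i][j - 1]:
            stack.append((i, j - 1))
        elif (seq[i], seq[j]) in PAIRS and dp[i][j] == dp[i + 1][j - 1] + 1:
            structure[i], structure[j] = "(", ")"
            stack.append((i + 1, j - 1))
        else:
            for k in range(i + 1, j):
                if dp[i][j] == dp[i][k] + dp[k + 1][j]:
                    stack.extend([(i, k), (k + 1, j)])
                    break
    return "".join(structure)


if __name__ == "__main__":
    print(nussinov_fold("GGGAAATCCC"))  # (((...))).
```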
Figure 5.
A model for gene breaking by intein-targeted homing endonuclease invasions. Observed genomic organizations are numbered with Arabic numerals, and putative evolutionary processes are marked with arrows and Roman numerals. Protein-coding genes can be invaded by an in-frame intein domain (I). Invasion of the intein domain gene by an endonuclease gene can fracture the gene in two (II), or insert the endonuclease gene in frame within the intein (IIa). Invasion of an endonuclease gene into an intein-less gene can fracture that gene (Ia). Such an invasion can also insert the endonuclease gene into a group I intron (Ib). Loss of the endonuclease gene from the split-intein locus (III) may be followed by dislocation of the two fractured gene parts into two separate loci (IV). Loss of the flanks of an endonuclease gene in a fractured gene locus could insert the endonuclease coding region in frame into the intein, recreating a contiguous gene (IIIa).
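The model can be read as a small transition graph in which genomic organizations are states and the Roman-numbered processes are edges. The sketch below encodes the transitions described in the caption using descriptive state names chosen for this sketch (the figure's Arabic-numbered states are not reproduced), and uses breadth-first search to list the shortest series of processes linking two organizations. Treating Ib as starting from a gene that already contains a group I intron is an assumption.

```python
from collections import deque
from typing import Optional

# Edges read from the caption as (process label, resulting organization).
# State names are descriptive labels for this sketch, not the figure's numbering.
TRANSITIONS = {
    "intein-less gene": [
        ("I", "gene with contiguous intein"),
        ("Ia", "gene fractured by endonuclease, no intein"),
    ],
    "gene with contiguous intein": [
        ("II", "fractured gene: split intein + endonuclease"),
        ("IIa", "endonuclease in frame within intein"),
    ],
    "gene with group I intron": [
        ("Ib", "group I intron with endonuclease"),  # assumed starting state
    ],
    "fractured gene: split intein + endonuclease": [
        ("III", "split intein, endonuclease lost"),
        ("IIIa", "endonuclease in frame within intein"),
    ],
    "split intein, endonuclease lost": [
        ("IV", "split-intein halves at separate loci"),
    ],
}


def shortest_process_path(start: str, goal: str) -> Optional[list[str]]:
    """Breadth-first search over TRANSITIONS; returns process labels or None."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, steps = queue.popleft()
        if state == goal:
            return steps
        for label, nxt in TRANSITIONS.get(state, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, steps + [label]))
    return None
```

For example, reaching two separate split-intein loci from an intein-less gene takes processes I, II, III and IV, while an endonuclease in frame within an intein is reachable either directly by IIa or through the fractured state by IIIa; breadth-first search reports the shorter route.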
