Mob DNA. 2018 Jan 24:9:4.
doi: 10.1186/s13100-018-0111-x. eCollection 2018.

The dynamic intein landscape of eukaryotes

Cathleen M Green et al. Mob DNA. 2018.

Abstract

Background: Inteins are mobile, self-splicing sequences that interrupt proteins and occur across all three domains of life. Scrutiny of the intein landscape in prokaryotes led to the hypothesis that some inteins are functionally important. Our focus shifts to eukaryotic inteins to assess their diversity, distribution, and dissemination, with the aim to comprehensively evaluate the eukaryotic intein landscape, understand intein maintenance, and dissect evolutionary relationships.

Results: This bioinformatics study reveals that eukaryotic inteins are scarce, but present in nuclear genomes of fungi, chloroplast genomes of algae, and within some eukaryotic viruses. There is a preponderance of inteins in several fungal pathogens of humans and plants. Inteins are pervasive in certain proteins, including the nuclear RNA splicing factor, Prp8, and the chloroplast DNA helicase, DnaB. We find that eukaryotic inteins frequently localize to unstructured loops of the host protein, often at highly conserved sites. More broadly, a sequence similarity network analysis of all eukaryotic inteins uncovered several routes of intein mobility. Some eukaryotic inteins appear to have been acquired through horizontal transfer with dsDNA viruses, yet other inteins are spread through intragenomic transfer. Remarkably, endosymbiosis can explain patterns of DnaB intein inheritance across several algal phyla, a novel mechanism for intein acquisition and distribution.

Conclusions: Overall, an intriguing picture emerges for how the eukaryotic intein landscape arose, with many evolutionary forces having contributed to its current state. Our collective results provide a framework for exploring inteins as novel regulatory elements and innovative drug targets.

Keywords: Endosymbiosis; Horizontal transfer; Intein; Mobile elements; Sequence similarity network.
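As a rough illustration of the sequence similarity network idea mentioned in the Results, the sketch below links sequences whose pairwise similarity meets a threshold and reports the connected components as clusters. It is not the authors' pipeline: the abstract does not state how similarity was scored, so percent identity over pre-aligned sequences is used here as a stand-in, and the sequences, names, and thresholds are invented for the example.

```python
from itertools import combinations


def percent_identity(a, b):
    """Percent of matching, non-gap positions between two pre-aligned sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be aligned to the same non-zero length")
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return 100.0 * matches / len(a)


def similarity_clusters(seqs, threshold):
    """Return clusters (sets of names) linked by identity >= threshold."""
    # Union-find: each name starts as its own cluster representative.
    parent = {name: name for name in seqs}

    def find(name):
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path compression
            name = parent[name]
        return name

    # Add an edge (merge clusters) for every pair above the threshold.
    for a, b in combinations(seqs, 2):
        if percent_identity(seqs[a], seqs[b]) >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for name in seqs:
        clusters.setdefault(find(name), set()).add(name)
    return sorted(clusters.values(), key=lambda c: sorted(c))


# Invented toy fragments, NOT real intein sequences.
toy_inteins = {
    "prp8_a": "CLAKGTRLLR",
    "prp8_b": "CLAKGTQLLR",
    "vma1_a": "CFAKGTNVLM",
    "dnab_a": "CISGDSLISL",
}

strict = similarity_clusters(toy_inteins, 80.0)
relaxed = similarity_clusters(toy_inteins, 60.0)
```

With the strict threshold only the two toy Prp8 fragments are linked; relaxing it pulls the toy VMA1 fragment into the same component. In a network like the one described, a cross-extein link of that kind is what would be examined as a candidate transfer event.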

Conflict of interest statement

The authors declare that they have no competing interests.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Figures

Fig. 1
Types of self-splicing protein sequences and their distribution in eukaryotes. a Inteins, Hedgehog, and Hint proteins. Inteins are mobile, self-splicing protein elements present across eukarya, bacteria, and archaea. Conserved residues coordinate self-splicing, indicated by red arrows, to ligate the N-extein (blue) and C-extein (green). Hedgehog proteins are found in higher eukaryotes only and are involved in complex developmental processes. They are composed of two domains, HhN and HhC. The HhC domain is analogous to inteins, utilizing a similar mechanism to link cholesterol to HhN (red arrows). Hedgehog-intein (Hint) domains have cleavage properties similar to either the N-terminus (HintN-like) or C-terminus (HintC-like) of an intein, and are found in both metazoans and lower eukaryotes. b A modified phylogenetic tree of eukaryotes was constructed. Scaled circles indicate intein-containing phyla. In fungi, inteins are found in nuclear DNA (nDNA; red circles), in algae in chloroplast DNA (cpDNA; green circles), and in eukaryotic viruses (vDNA; blue circles). Specific intein-containing species are mentioned in the text. Total inteins in each tree are listed
Fig. 2
Intein preponderance in pathogenic fungi. a Analysis of inteins in pathogens. Two phyla (Ascomycota + Basidiomycota) were analyzed for a propensity of inteins in pathogens (left). Sequenced genomes of non-pathogenic fungi (gray circles) and pathogenic fungi (black circles) were separated and overlaid with the number of intein-positive genomes from each group (red circles). The overall percentage of intein-containing pathogens is 37.0%, higher than the 16.4% of intein-containing non-pathogens. Ascomycota and Basidiomycota were analyzed separately (right), and both also show a higher percentage of intein-containing pathogens than non-pathogens (41.8% vs. 27.4% and 18.6% vs. 3.2%, respectively). Out of available sequenced Ascomycota and Basidiomycota, there are more non-pathogenic genomes sequenced than pathogenic, indicating no sequencing bias. Total genomes analyzed are listed. b Certain fungal lineages show an intein-pathogen correlation. Species within an individual phylum (Aspergillus/ascomycete and Cryptococcus/basidiomycete) were analyzed for a correlation of inteins in pathogens. A condensed phylogenetic tree for Aspergillus species was constructed and annotated by lifestyle (colored circles). Presence of an intein is indicated by bold and red text. While Aspergillus contains many inteins, these do not have a preference for pathogenic species, with a negative correlation coefficient (r = −0.2). The phylogenetic tree for Cryptococcus shows an absolute correlation (r = 1.0), with the only two known pathogens both having inteins
Fig. 3
Intein-containing proteins are distinct between nDNA, cpDNA, and vDNA and fall into functional categories. Modified phylogenetic trees of intein-positive species in a fungi, choanoflagellates, amoebozoa, and apusozoa, b algae and seaweeds, and c eukaryotic viruses are presented. The heat maps correspond to the tree and show inteins present in nDNA (red), cpDNA (green), and vDNA (blue). The nDNA inteins are mostly in fungi, overwhelmingly in Prp8, VMA1 and DdRP. One Prp8 intein is found in green algae in nDNA. The cpDNA inteins are in DnaB and ClpP, but are also found in DdRP. The vDNA inteins are present in DdDP, DdRP, HEL, and RIR proteins, but no intein overlap is observed between virus and virus host. Black bars show the number of intein-positive genomes relative to the number of sequenced genomes in the phylogenetic category. Extein abbreviations are as follows: Prp8 – pre-mRNA processing factor 8; VMA1 – vacuolar membrane ATPase; DdRP – DNA-directed RNA polymerase; ThrRS – threonyl tRNA synthetase; CHS – chitin synthase; GLT – glutamate synthase; ClpP – ATP-dependent Clp protease, proteolytic subunit; DnaB – DNA helicase; DdDP – DNA-directed DNA polymerase; RIR – ribonucleotide reductase; HEL – helicase. d Orthologous group analysis for nDNA, cpDNA, and vDNA classifies intein-containing exteins into functional categories (Additional file 1: Tables S8-S10). nDNA inteins are biased towards category A, RNA processing, from insertions in Prp8. cpDNA and vDNA inteins are biased towards category L, proteins with replication, recombination, and repair functions. Functional categories are as follows: A – RNA processing and modification; C – energy production and conversion; F – nucleotide transport and metabolism; K – transcription; L – replication, recombination, and repair
Fig. 4
Eukaryotic inteins insert at conserved, structurally flexible regions of host proteins. Intein-containing proteins with PDB structures (Prp8 – 5GMK, VMA1 – 3J9T, ThrRS – 3UGQ, and GLT – 1EA0) were selected to build ConSurf maps, which indicate the degree of conservation after structural alignment. Mauve indicates highly conserved, whereas cyan is more variable as shown in the key. The first residue of the C-extein, shown as spheres, is highlighted in yellow and indicates intein insertion site. Prp8i-a, VMA1i-a, and GLTi-a are in structurally flexible, yet highly conserved sites. The ThrRS intein is inserted in a structured α-helix. Linear cartoons were also generated using Pro-Origami and are shown above the ConSurf maps. Residue numbering indicates the region of protein used in the Pro-Origami model. The black arrow shows intein insertion site and the number corresponds to the highlighted residue in the ConSurf structure. Structure representations are as follows: α-helix – gray rectangle, β-strand – gray arrow, flexible boundary - black line
Fig. 5
Eukaryotic inteins vary greatly in size. a Diversity of eukaryotic intein types. Inteins are classified into three types: HEN(−), HEN(+), or HEN(+)extra. HEN(−) inteins contain the four conserved splicing blocks (A, B, F, and G) [50, 79]. Some have linker sequences between block B and F, such as the C. gattii Prp8 intein. HEN(+) inteins are full-length, and additionally encode blocks C, D, E, and H for the LAGLIDADG HEN domain. The HEN(+)extra inteins are large, rarely described inteins that have stretches of linker domains or repeat sequences of unknown function. The only examples of HEN(+)extra inteins in eukaryotes are in Prp8. b All eukaryal inteins in nDNA (red), cpDNA (green), or vDNA (blue) ordered by residue length (totaling 393 inteins). The nDNA inteins show the greatest size diversity, having HEN(−), HEN(+), and the only inteins in the HEN(+)extra category. cpDNA inteins are overwhelmingly HEN(−). vDNA inteins fall between the sizes of nDNA and cpDNA inteins mainly in the range of HEN(+). c Inteins in specific exteins cluster by size. When inteins from specific exteins are plotted as a function of residue length, most cluster in the same HEN category, e.g. VMA1i are all HEN(+). Prp8i are the major exception, where inteins range across all three HEN types. Extein abbreviations are listed in Fig. 3 legend
Fig. 6
Sequence similarity network reveals inteins cluster by exteins and shows dynamic movement. a Eukaryotic intein clustering. The eukaryotic intein network shows relationships between nDNA (red), cpDNA (green), and vDNA (blue) inteins. The network indicates the presence of multiple intein lineages, which mostly correspond to clustering by exteins. Clear examples are Prp8i (1a, 1b) and VMA1i (2), and many viral inteins also cluster by exteins (7, 8, 9, 10). Cases where this pattern is broken represent possible horizontal transfer events (1b, 1c, 3, 6). Some inteins cluster phylogenetically, such as DnaBi from Rhodophyta (5a) or Heterokonta (5b). Hedgehog proteins (black; HhC) do not cluster with any eukaryotic inteins (11), indicating no phylogenetic relationship between Hedgehog and inteins based on sequence, although they are structurally and functionally similar. Hint-containing mating type switching proteins (yellow, HO Hop) cluster with VMA1i (2). Some inteins do not form connections to anything at all (12). b Nuclear intragenomic intein transfer. Selected intein pairs were further examined by calculating pairwise similarity percentages, which are shown in the box plot. The GLTi and CHSi pair shows an average similarity above 50%, indicative of intragenomic transfer. c Endosymbiotic intein transfer. A phylonetwork tree was built in SplitsTree after alignment of cpDNA DnaBi (green) and bacterial DnaBi (pink). A branch in which cpDNA DnaBi and bacterial DnaBi cluster together (shaded) suggests that DnaBi in chloroplasts might have been inherited from a cyanobacterial progenitor. Since bacteria also have inteins in ClpP, cpDNA ClpPi were included as a control and they cluster separately (6)
Fig. 7
Model for eukaryotic intein distribution and dissemination. Some nuclear inteins present in fungi were likely present in the Last Universal Common Ancestor (LUCA) (1), consistent with intein distribution across all three domains of life. Examples of intragenomic transfer of inteins were also found in both nuclei and chloroplasts (2). The DnaBi within chloroplasts appear to be from reticulate evolution via endosymbiosis (3). Other inteins in fungal cell nuclei and within algae are spread by horizontal gene transfer through eukaryotic viruses that replicate in the cytoplasm (4)
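The Fig. 2 legend reports correlation coefficients between intein presence and pathogenic lifestyle (r = 1.0 for Cryptococcus, a negative value for Aspergillus). The legend does not say how r was computed; the sketch below assumes a plain Pearson correlation over 0/1 indicator values, one per species, and uses made-up species vectors rather than the study's data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(xs)
    if n != len(ys) or n == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical genus A: inteins occur in exactly the pathogenic species.
# 1 = pathogen / intein present, 0 = non-pathogen / intein absent.
pathogen_a = [1, 1, 0, 0, 0]
intein_a = [1, 1, 0, 0, 0]
r_a = pearson_r(pathogen_a, intein_a)  # approximately 1.0

# Hypothetical genus B: inteins are common but mostly in non-pathogens.
pathogen_b = [1, 1, 0, 0, 0, 0]
intein_b = [0, 1, 1, 1, 0, 1]
r_b = pearson_r(pathogen_b, intein_b)  # approximately -0.25
```

Genus A mirrors the situation the legend describes for Cryptococcus, where every pathogen and no non-pathogen carries an intein. Genus B shows how a lineage can be rich in inteins and still give a negative coefficient.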

References

    1. Saleh L, Perler FB. Protein splicing in cis and in trans. Chem Rec. 2006;6:183–193. doi: 10.1002/tcr.20082.
    2. Novikova O, Topilina N, Belfort M. Enigmatic distribution, evolution, and function of inteins. J Biol Chem. 2014;289:14490–14497. doi: 10.1074/jbc.R114.548255.
    3. Gimble FS, Thorner J. Homing of a DNA endonuclease gene by meiotic gene conversion in Saccharomyces cerevisiae. Nature. 1992;357:301–306. doi: 10.1038/357301a0.
    4. Liu XQ. Protein-splicing intein: genetic mobility, origin, and evolution. Annu Rev Genet. 2000;34:61–76. doi: 10.1146/annurev.genet.34.1.61.
    5. Hall TM, Porter JA, Young KE, Koonin EV, Beachy PA, Leahy DJ. Crystal structure of a hedgehog autoprocessing domain: homology between hedgehog and self-splicing proteins. Cell. 1997;91:85–97. doi: 10.1016/S0092-8674(01)80011-8.
