Genome Biol. 2010;11(7):R74.
doi: 10.1186/gb-2010-11-7-r74. Epub 2010 Jul 15.

Quantifying the mechanisms of domain gain in animal proteins

Marija Buljan et al. Genome Biol. 2010.

Abstract

Background: Protein domains are protein regions that are shared among different proteins and are frequently functionally and structurally independent from the rest of the protein. Novel domain combinations have a major role in evolutionary innovation. However, the relative contributions of the different molecular mechanisms that underlie domain gains in animals are still unknown. Using animal gene phylogenies, we identified a set of high-confidence domain gain events and, by examining their coding DNA, investigated the causative mechanisms.

Results: Here we show that the major mechanism for gains of new domains in metazoan proteins is likely to be gene fusion through joining of exons from adjacent genes, possibly mediated by non-allelic homologous recombination. Retroposition and insertion of exons into ancestral introns through intronic recombination are, in contrast to previous expectations, only minor contributors to domain gains and have accounted for less than 1% and 10% of high confidence domain gain events, respectively. Additionally, exonization of previously non-coding regions appears to be an important mechanism for addition of disordered segments to proteins. We observe that gene duplication has preceded domain gain in at least 80% of the gain events.

Conclusions: The interplay of gene duplication and domain gain demonstrates an important mechanism for fast neofunctionalization of genes.


Figures

Figure 1
Summary of mechanisms for domain gains. This figure shows potential mechanisms leading to domain gains and the signals that can be used to detect the causative mechanism. Domain gain by retroposition is illustrated as an example where the domain is transcribed together with the upstream long interspersed nuclear element (LINE), but other means of retroposition are also possible [3]. The list of possible mechanisms is not exhaustive and other scenarios can occur, such as, for example, exonization of previously non-coding sequence or gain of a viral or transposon domain during retroelement replication. IR, illegitimate recombination; NAHR, non-allelic homologous recombination.
Figure 2
Distribution of domain gain events according to the position of domain insertion and number of exons gained. Gains at amino and carboxyl termini and in the middle of proteins are shown separately. The first column in each group shows the fraction of gains where the gained domain is coded by multiple new exons and the second where it is coded by a single new exon. The third column shows the fraction of gains where the ancestral exon has been extended and the gained domain is coded by the extended exon as well as by additional exons. Finally, the fourth column in each group shows cases where only the ancestral exon has been extended with the sequence of a new domain.
Figure 3
Distribution of disordered residues in the gained domains according to the position of domain insertion and number of exons gained. This graph shows the percentage of disordered residues in each category of domain gains. The fraction of events in each category can be seen in Figure 2.
Figure 4
Examples of evidence for mechanisms that have caused domain gains.
(a) An example of a domain gain mediated by retroposition. TreeFam family TF352220 contains genes with a transposase domain (PF01359). The primate transcripts in this family have been extended at their amino terminus with the pre-SET and SET domains. The representative transcript for this gain event is SETMAR-201 (ENST00000307483; left-hand side). Both gained domains have a significant hit in the gene SUV39H1 (ENSG00000101945; right-hand side) - the SET domains of the donor and recipient proteins share 41% identity. Previously, it has been reported that the chimeric gene originated in primates by insertion of the transposase domain (PF01359, with mutated active site and no transposase activity) in the gene that contained the pre-SET and SET domains [21]. Here we propose that the evolution of this gene involved two crucial steps: retroposition of the sequence coding for the pre-SET and SET domains and the already described insertion of the MAR transposase region [21]. The SET domain has lost the introns present in the original sequence and the pre-SET domain has an intron containing repeat elements in a position not present in the original domain, suggesting it was inserted later on. The likely evolutionary scenario here includes duplication of pre-SET and SET domains through retroposition, insertion of the transposase domain and subsequent joining of these domains. The SETMAR gene is in the intron of another gene (SUMF1), which is on the opposite strand, so SETMAR may be using the other gene's regulatory regions for its transcription. The top of the figure shows the genomic positions of depicted genes. Arrowheads on the lines that represent chromosomal sequences indicate whether the transcripts are coded by the forward or reverse strand. Transcripts are always shown in the 5' to 3' orientation and proteins in the amino- to carboxy-terminal orientation. Exon projections and intron phases are also shown on the protein level. Pfam domains are illustrated as colored boxes. Figure 4b and Additional file 8 use the same conventions.
(b) An example of a domain gain by gene duplication followed by exon joining. TreeFam family TF314963 contains genes with a lactate/malate dehydrogenase domain where one branch with vertebrate genes has gained the additional UEV domain. Homologues, both orthologues and paralogues, without the gained domains are present in a number of animal genomes. A representative transcript with the gained domain is UEVLD-205 (ENST00000396197; left-hand side). The UEV domain in that transcript is 56% identical to the UEV domain in the transcript TSG101-201 (ENST00000251968), which belongs to the neighboring gene TSG101, and the two transcripts also have introns with identical phases in the same positions. The likely scenario is that after the gene coding for the TSG101-201 transcript was duplicated, its exons were joined with those of the UEVLD-205 ancestor and the two genes have been fused.
Figure 5
Chromosomal position of the 'donor gene' and the relative age of the gain event. The graph shows the fraction of events for which the 'donor gene' of the gained domain is identified, and is on the same chromosome as the gene with the gained domain, with respect to the relative age of the gain event. The gain events were divided into five groups according to the expected age of the event as judged by the TreeFam phylogeny. The x-axis shows the evolutionary group in the human lineage to which descendants of the gain event belong, and the y-axis shows the percentage of gain events in each evolutionary group for which both of the conditions were valid: we were able to find the donor gene and the donor gene was on the same chromosome as the gene with the gained domain. This was true for 3 out of 9 gain events in primates, 2 out of 20 in mammals, 7 out of 121 in vertebrates, 1 out of 27 in Bilateria and 1 out of 55 in all animals. Estimated divergence times (in millions of years ago (mya), as taken from Ponting [80]) are: 25 mya for primates, 166 mya for mammals, 416 mya for vertebrates and 700 mya for all animals (we were not able to estimate the divergence time for Coelomata).
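The y-axis values described in the Figure 5 caption follow directly from the counts it quotes. As a minimal sketch (the dictionary and function names are illustrative and do not come from the paper), the percentages can be recomputed like this:

```python
# Counts quoted in the Figure 5 caption:
# (gain events with an identified donor gene on the same chromosome, total gain events)
counts = {
    "primates": (3, 9),
    "mammals": (2, 20),
    "vertebrates": (7, 121),
    "Bilateria": (1, 27),
    "all animals": (1, 55),
}


def same_chromosome_percentage(hits, total):
    """Percentage of gain events whose donor gene was found on the same chromosome."""
    return 100.0 * hits / total


# Hypothetical helper structure: one percentage per evolutionary group.
percentages = {
    group: same_chromosome_percentage(hits, total)
    for group, (hits, total) in counts.items()
}

for group, pct in percentages.items():
    print(f"{group}: {pct:.1f}%")
```

With these counts the percentage falls steadily from the youngest group (primates) to the oldest (all animals), although the primate and mammal groups are small (9 and 20 events), so those two values are the least precise.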

References

    1. Marsden RL, McGuffin LJ, Jones DT. Rapid protein domain assignment from amino acid sequence using predicted secondary structure. Protein Sci. 2002;11:2814–2824. doi: 10.1110/ps.0209902.
    2. Chothia C, Gough J, Vogel C, Teichmann SA. Evolution of the protein repertoire. Science. 2003;300:1701–1703. doi: 10.1126/science.1085371.
    3. Babushok DV, Ostertag EM, Kazazian HH Jr. Current topics in genome evolution: molecular mechanisms of new gene formation. Cell Mol Life Sci. 2007;64:542–554. doi: 10.1007/s00018-006-6453-4.
    4. Rubin GM, Yandell MD, Wortman JR, Gabor Miklos GL, Nelson CR, Hariharan IK, Fortini ME, Li PW, Apweiler R, Fleischmann W, Cherry JM, Henikoff S, Skupski MP, Misra S, Ashburner M, Birney E, Boguski MS, Brody T, Brokstein P, Celniker SE, Chervitz SA, Coates D, Cravchik A, Gabrielian A, Galle RF, Gelbart WM, George RA, Goldstein LS, Gong F, Guan P. Comparative genomics of the eukaryotes. Science. 2000;287:2204–2215. doi: 10.1126/science.287.5461.2204.
    5. Peisajovich SG, Garbarino JE, Wei P, Lim AW. Rapid diversification of cell signaling phenotypes by modular domain recombination. Science. 2010;328:368–372. doi: 10.1126/science.1182376.
