BMC Bioinformatics. 2015;16 Suppl 14(Suppl 14):S8.
doi: 10.1186/1471-2105-16-S14-S8. Epub 2015 Oct 2.

Event inference in multidomain families with phylogenetic reconciliation

Maureen Stolzer et al. BMC Bioinformatics. 2015.

Abstract

Background: Reconstructing evolution provides valuable insights into the processes of gene evolution and function. However, while there have been great advances in algorithms and software to reconstruct the history of gene families, these tools do not model the domain shuffling events (domain duplication, insertion, transfer, and deletion) that drive the evolution of multidomain protein families. Protein evolution through domain shuffling events allows for rapid exploration of functions by introducing new combinations of existing folds. This powerful mechanism was key to some significant evolutionary innovations, such as multicellularity and the vertebrate immune system. A method for reconstructing this important evolutionary process is urgently needed.

Results: Here, we introduce a novel, event-based framework for studying multidomain evolution by reconciling a domain tree with a gene tree, with additional information provided by the species tree. In the context of this framework, we present the first reconciliation algorithms to infer domain shuffling events, while addressing the challenges inherent in the inference of evolution across three levels of organization.

Conclusions: We apply these methods to the evolution of domains in the Membrane-associated Guanylate Kinase (Maguk) family. These case studies reveal a more vivid and detailed evolutionary history than previously provided. Our algorithms have been implemented in software, freely available at http://www.cs.cmu.edu/~durand/Notung.
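
To make the idea of reconciliation concrete, the following is a minimal sketch of the classical two-tree case only: a gene tree is mapped onto a species tree by lowest common ancestor (LCA), and each internal gene node is labelled as a duplication or a speciation. It is not the three-level domain/gene/species algorithm introduced here, and it does not model transfers, insertions, or losses. The `Node` class, the helper names, and the example trees are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # identity-based equality, so nodes can be compared with `is`
class Node:
    name: str
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = None
    depth: int = 0


def link(node, parent=None, depth=0):
    """Fill in parent pointers and depths for every node below `node`."""
    node.parent, node.depth = parent, depth
    for child in node.children:
        link(child, node, depth + 1)
    return node


def lca(a, b):
    """Lowest common ancestor of two nodes in the same linked tree."""
    while a is not b:
        if a.depth >= b.depth:
            a = a.parent
        else:
            b = b.parent
    return a


def reconcile(gene, species_of, events):
    """Return the species node that `gene` maps to; label internal nodes."""
    if not gene.children:
        return species_of[gene.name]
    child_maps = [reconcile(c, species_of, events) for c in gene.children]
    mapped = child_maps[0]
    for m in child_maps[1:]:
        mapped = lca(mapped, m)
    # A node mapping to the same species node as one of its children
    # is inferred to be a duplication; otherwise it is a speciation.
    is_dup = any(m is mapped for m in child_maps)
    events[gene.name] = "duplication" if is_dup else "speciation"
    return mapped


# Hypothetical species tree ((A,B)AB,C)R
A, B, C = Node("A"), Node("B"), Node("C")
species_root = link(Node("R", [Node("AB", [A, B]), C]))

# Hypothetical gene tree ((a1,b1)g1,(a2,c1)g2)g0
gene_root = Node("g0", [
    Node("g1", [Node("a1"), Node("b1")]),
    Node("g2", [Node("a2"), Node("c1")]),
])
species_of = {"a1": A, "b1": B, "a2": A, "c1": C}

events = {}
root_species = reconcile(gene_root, species_of, events)
print(events)
```

In this toy example the root of the gene tree is labelled a duplication because one of its children already maps to the root of the species tree. The framework described in the abstract applies the same kind of mapping between a domain tree and a gene tree, with the species tree supplying additional information.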

Figures

Figure 1
Schematic of multidomain evolution. (a) A hypothetical multidomain family evolving by gene duplication and domain shuffling. (b) Trees representing the history for each domain in the gene family. (c) The evolutionary history of the same family showing the domains evolving in the gene tree. Reconciliation correctly infers 2 domain duplications, 1 domain transfer and 1 domain loss. The gene family (locus) tree is shown in brown; black squares represent domain duplications.
Figure 2
Domain architecture gain/loss model. Ancestral domain architectures for the hypothetical family in Fig. 1. Wagner parsimony applied to the domain architecture (DA) model infers 3 gains and 1 loss, underestimating the true events in the "known" history. The ancestral domain architectures inferred with Wagner parsimony are also incorrect.
Figure 3
Evolution of a hypothetical multidomain gene family. Domain instances are represented by grey squares. Duplicated genes in the same genome are connected by dotted lines.
Figure 4
Domain insertions in the presence of gene loss. (a) Embedded trees showing the co-evolution of domains, genes, and species in the hypothetical family in Fig. 3. (b) An extended reconciled gene tree for the gene family. Inferred gene losses (ℓB and ℓC) and pseudonodes (open circles, ϕR and ϕE) representing the location of the missing taxa in the gene tree are shown in grey. The pseudonodes are used to distinguish between gene losses and domain losses in the reconciled domain tree. Gene duplication is represented as a black square; filled circles represent co-divergences. (c) A reconciled tree for the domain family, showing a domain insertion (arrow, edge (u, v)) and a domain loss (d2 g2 B). Domains that are missing due to gene loss (ℓB and ℓC) are shown in grey. Co-divergence due to gene duplication is represented by an open square.
Figure 5
Multidomain Maguk gene family. (a) Domain architectures found in the Maguk family. Note that DLG5 is a member of the ZO subfamily [53]. (b) Model of a scaffolding complex with two interacting Maguk proteins, adapted from [54].
Figure 6
Phylogenetic relationships of Maguk subfamilies. Maximum likelihood phylogeny of GuK domain sequences. Clades containing paralogous genes from the same subfamily are collapsed. Edge weights are the number of bootstrap replicates, out of 100, supporting that edge.
Figure 7
History of L27 domain shuffling. (a) Phylogeny of the MPP subfamily, based on the GuK domain maximum likelihood tree. (b) The L27-1 domain phylogeny showing the inferred domain insertion event. (c) Reconstruction of the evolution of the MPP domain architecture, showing the replacement of the N-terminal L27 domain in the common ancestor of MPP2 and MPP6. Clades containing mouse, human, and chicken orthologs are collapsed.
Figure 8
Consensus network for the L27-1 confidence set.
Figure 9
Maguk PDZ event history. Based on reconciled domain tree 84, shown in Fig. S2 (in Additional File 1). Leaf labels correspond to protein name, followed by domain name. Domains in each protein are numbered in N- to C-terminal order. Clades containing mouse, human, and chicken orthologs are collapsed, except in the Carma family, in which orthologous sequences are not monophyletic. Clades of paralogous genes from the same subfamily are also collapsed. All insertion events that are not within collapsed clades are shown (yellow arrows, annotated with event support values), including seven of the nine high-scoring events. Domain losses are indicated in gray. Gene losses are not shown.
Figure 10
Reconstruction of PDZ domain shuffling in the ZO subfamily. Insertion of the two N-terminal PDZ domains in the common ancestor of ZO2 and ZO3, followed by the insertion of the same two domains, plus the ancestral domain, from ZO2 to ZO1. The ancestral architectures and domain insertions superimposed on the gene tree were reconstructed manually using events and associations of the ancestral nodes in the reconciled domain, gene, and species trees.
Figure 11
Reconstruction of PDZ domain shuffling in the Carma subfamily. Insertion of the PDZ domains from MPP4 and Carma1 into the ancestral Carma3. The two copies in Carma3 subsequently experienced reciprocal loss: one domain was lost in the mammal ancestor and the other was lost in chicken. Carma2 also experienced a gene loss in chicken. The ancestral architectures and domain insertions superimposed on the gene tree were reconstructed manually using events and associations of the ancestral nodes in the reconciled domain, gene, and species trees.
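
The domain architecture gain/loss model referred to in the Figure 2 caption can be illustrated with a small sketch. The code below scores presence or absence of a single domain on a rooted gene tree using a Sankoff-style dynamic programme with unit costs for gain and loss, which is one common way to realise Wagner parsimony for a binary character. The nested-tuple tree encoding, the function names, the tie-breaking rules, and the example data are assumptions made for illustration; they are not taken from the paper.

```python
INF = float("inf")


def cost_table(node, present):
    """Bottom-up pass: minimum cost of each subtree for state 0 (absent) or 1 (present).

    `node` is either a leaf name (str) or a tuple of child nodes.
    """
    if isinstance(node, str):
        state = 1 if node in present else 0
        cost = (0 if state == 0 else INF, 0 if state == 1 else INF)
        return {"cost": cost, "kids": []}
    kids = [cost_table(child, present) for child in node]
    cost = tuple(
        sum(min(k["cost"][t] + (s != t) for t in (0, 1)) for k in kids)
        for s in (0, 1)
    )
    return {"cost": cost, "kids": kids}


def count_events(table, state=None):
    """Top-down pass: return (gains, losses) for one optimal labelling.

    Assumed tie-breaking: absent at the root, and a child keeps its
    parent's state whenever that is optimal.
    """
    if state is None:
        state = 0 if table["cost"][0] <= table["cost"][1] else 1
    gains = losses = 0
    for k in table["kids"]:
        t = min((state, 1 - state), key=lambda x: k["cost"][x] + (state != x))
        gains += int(state == 0 and t == 1)
        losses += int(state == 1 and t == 0)
        g, l = count_events(k, t)
        gains, losses = gains + g, losses + l
    return gains, losses


# Hypothetical gene tree (((g1,g2),g3),g4); the domain is found only in g1 and g2.
tree = ((("g1", "g2"), "g3"), "g4")
gains, losses = count_events(cost_table(tree, {"g1", "g2"}))
print(gains, losses)  # 1 0
```

Because this model only sees presence or absence at the leaves, it cannot tell a duplication from an insertion or a transfer, which is consistent with the caption's point that it underestimates the events in the known history.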

References

    1. Moore A, Björklund A, Ekman D, Bornberg-Bauer E, Elofsson A. Arrangements in the modular evolution of proteins. Trends Biochem Sci. 2008;33(9):444–451.
    2. Buljan M, Frankish A, Bateman A. Quantifying the mechanisms of domain gain in animal proteins. Genome Biol. 2010;11(7):74.
    3. Basu M, Poliakov E, Rogozin I. Domain mobility in proteins: functional and evolutionary implications. Brief Bioinform. 2009;10(3):205–216.
    4. Chothia C, Gough J. Genomic and structural aspects of protein evolution. Biochem J. 2009;419(1):15–28.
    5. Finn R, Mistry J, Tate J, Coggill P, Heger A, et al. The Pfam protein families database. Nucleic Acids Res. 2010;38:211–222.
