Review
Cell Commun Signal. 2015 Nov 21;13:43.
doi: 10.1186/s12964-015-0120-z.

Short linear motifs - ex nihilo evolution of protein regulation

Norman E Davey et al. Cell Commun Signal. 2015.

Abstract

Short sequence motifs are ubiquitous across the three major types of biomolecules: hundreds of classes and thousands of instances of DNA regulatory elements, RNA motifs and protein short linear motifs (SLiMs) have been characterised. The increase in complexity of transcriptional, post-transcriptional and post-translational regulation in higher eukaryotes has coincided with a significant expansion of motif use. But how did the eukaryotic cell acquire such a vast repertoire of motifs? In this review, we curate the available literature on protein motif evolution and discuss the evidence suggesting that SLiMs can be acquired through mutations, insertions and deletions in disordered regions. We propose a mechanism of ex nihilo SLiM evolution (the evolution of a novel SLiM from "nothing") that adds a functional module to a previously non-functional region of protein sequence. In our model, hundreds of motif-binding domains in higher eukaryotic proteins connect simple motif specificities with useful functions to create a large functional motif space. Accessible peptides that match the specificity of these motif-binding domains are continuously created and destroyed by mutations in rapidly evolving disordered regions, creating a dynamic supply of new interactions that may confer advantageous phenotypic novelty. This provides a reservoir of diversity with which to modify existing interaction networks. Evolutionary pressures will act on these motifs to retain beneficial instances; however, most will be lost on an evolutionary timescale as negative selection and genetic drift act on deleterious and neutral motifs, respectively. In light of the parallels between the presented model and the evolution of motifs in the regulatory segments of genes and (pre-)mRNAs, we suggest that our understanding of regulatory networks would benefit from a shared model describing the evolution of transcriptional, post-transcriptional and post-translational regulation.
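
The continuous creation and destruction of motif-matching peptides described above can be illustrated with a deliberately crude toy simulation. This is a sketch under stated assumptions, not the authors' model: the disordered region is a random sequence with uniform residue frequencies, only point substitutions occur (at a fixed, hypothetical per-site rate), there is no selection, and the proline-directed phosphosite consensus [ST]P from the Fig. 3d caption stands in for a motif class. The function names and parameter values are hypothetical.

```python
import random
import re

# Uniform residue frequencies: a simplifying assumption, not real composition.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def motif_sites(seq, pattern):
    """0-based start positions of all (possibly overlapping) regex matches."""
    return {m.start() for m in re.finditer(f"(?=(?:{pattern}))", seq)}


def simulate_turnover(pattern="[ST]P", length=1000, generations=500,
                      rate=0.005, seed=1):
    """Neutral toy model: point substitutions only, no selection, no indels."""
    rng = random.Random(seed)
    seq = [rng.choice(AMINO_ACIDS) for _ in range(length)]
    current = motif_sites("".join(seq), pattern)
    initial, gained, lost = len(current), 0, 0
    for _ in range(generations):
        for i in range(length):
            if rng.random() < rate:
                # The redrawn residue can equal the old one (no change).
                seq[i] = rng.choice(AMINO_ACIDS)
        updated = motif_sites("".join(seq), pattern)
        gained += len(updated - current)  # sites that appeared ex nihilo
        lost += len(current - updated)    # sites destroyed by substitution
        current = updated
    return {"initial": initial, "final": len(current),
            "gained": gained, "lost": lost}


if __name__ == "__main__":
    print(simulate_turnover())
```

Because nothing is selected, the number of sites simply fluctuates around the chance expectation (about one [ST]P per 200 residues) while individual sites are repeatedly gained and lost; a fitness term that protects particular sites would be one way to extend the sketch towards retention of beneficial instances.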


Figures

Fig. 1
Conservation of functionally important motifs and the proliferation of motifs through ex nihilo motif acquisition. a Alignment of the PCNA-binding PIP box motif of Flap endonuclease 1 (FEN1) showing conservation of the motif over more than 3 billion years of evolution across all Eukaryotes and Archaea (representative species: Thermococcus kodakaraensis) [24, 25, 108]. b An alignment of a representative selection of PxIxIT motif instances: Nuclear factor of activated T-cells, cytoplasmic 1 (NFATC1) [109], A-kinase anchor protein 5 (AKAP5) [110] and Potassium channel subfamily K member 18 (KCNK18) [111] from human; Phosphatidylinositol 4,5-bisphosphate-binding protein SLM1 (Slm1) [112], Protein HPH1 (Hph1) [113] and Transcriptional regulator CRZ1 (Crz1) [114] from yeast; and Ankyrin repeat domain-containing protein A238L from African swine fever virus (ASFV) [115]. Each motif instance occurs in a non-homologous protein (see panel c), so the most likely mode of acquisition for these functional modules is ex nihilo evolution through random mutation. The alignment shows a clear preference for specific residues at given positions in the peptide, with each position tolerating a different level of degeneracy. These preferences reflect those of the calcineurin PxIxIT-binding pocket (see panel d). c The modular architecture of the proteins from panel b, showing the distinct organisation of the non-homologous proteins. Domains (grey), transmembrane regions (green) and PxIxITs (blue) are shown. Proteins are aligned around the PxIxIT instances. d Structure of the PxIxIT-binding pocket of the human calcineurin catalytic A subunit bound to the PxIxIT of African swine fever virus A238L (PDB ID: 4F0Z) [115]. The peptide binds by beta-augmentation, and the defined residues at P1, P3 and P5 sit in a conserved hydrophobic pocket, explaining the strong preferences at these positions in known PxIxIT instances (the light blue surface on the domain denotes hydrophobic residues) [109, 110, 116]
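
The positional preferences seen in an alignment such as panel b can be quantified column by column. The sketch below counts residue types and computes Shannon entropy (in bits) at each position of an ungapped hexapeptide alignment: low entropy marks a tightly constrained position, high entropy a permissive one. The peptides are hypothetical PxIxIT-like sequences invented for illustration, not the instances shown in the figure, and the helper names are likewise hypothetical.

```python
import math
from collections import Counter

# Hypothetical PxIxIT-like hexapeptides, invented for illustration.
PEPTIDES = ["PRIEIT", "PKIIIT", "PSVVIE", "PHLTIH", "PAIEVT", "PQIRIN"]


def column_counts(peptides):
    """Residue counts at each position of an ungapped, equal-length alignment."""
    width = len(peptides[0])
    if any(len(p) != width for p in peptides):
        raise ValueError("peptides must be aligned to the same length")
    return [Counter(p[i] for p in peptides) for i in range(width)]


def column_entropy(counts):
    """Shannon entropy in bits: 0 when a single residue occupies the position."""
    total = sum(counts.values())
    return sum((n / total) * math.log2(total / n) for n in counts.values())


if __name__ == "__main__":
    for pos, counts in enumerate(column_counts(PEPTIDES), start=1):
        top, n = counts.most_common(1)[0]
        print(f"P{pos}: {len(counts)} residue types, most common {top} "
              f"({n}/{len(PEPTIDES)}), entropy {column_entropy(counts):.2f} bits")
```

In this invented set, P1, P5 and P3 come out with the lowest entropies by construction, echoing the constrained positions described for panel d, while P2 and P4 are the most variable.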
Fig. 2
Examples of ex nihilo motif gain and motif loss. a The N-terminus of SHOC2 carries an S2G (Ser2 > Gly) mutation in multiple patients with Noonan-like syndrome that “knocks in” an N-myristoylation motif [26]. Blue bold residues mark the specificity-determining residues of the motif. b A PxIxIT calcineurin-docking motif in the S. cerevisiae serine/threonine-protein kinase ELM1 (Elm1) probably evolved in the common ancestor of S. cerevisiae and S. paradoxus [27]. c A human-centric phylogeny of E3 ubiquitin-protein ligase Mdm2 (Mdm2). An RxL cyclin-docking motif was gained in the rodent Mdm2 proteins as a result of a four-amino-acid deletion (grey region) [117]. Green bold residues mark the positions corresponding to the specificity-determining residues of the motif before the SDSI deletion. d Example of motif loss contributing to functional divergence after duplication. The S. cerevisiae ohnologues Ace2 and Swi5 were both retained after the whole genome duplication (WGD) but have since diverged functionally, in part through the loss of a docking site for the serine/threonine-protein kinase Cbk1 and of two Cbk1 phosphosites in the Swi5 lineage. A single pre-WGD homologue from Lachancea waltii is shown as a representative example of the modular architecture of the Ace2/Swi5 ancestor [36]. e Example of motif gain contributing to functional divergence after duplication. The Cyclin A and Cyclin B regulatory subunits of the CDK family protein kinases share a common ancestor that contained a D box motif to recruit the APC/C E3 ubiquitin ligase, promoting cyclin destruction during mitosis. After duplication, the Cyclin A lineage gained an ABBA motif, allowing Cyclin A to be destroyed earlier than Cyclin B, during prometaphase [40]. f The accumulation of Nx[TS] glycosylation motifs in the hemagglutinin of influenza H3N2 over the last 40 years. The number of glycosylation motifs has increased from two to seven, tuning the trade-off between host receptor binding and immune evasion [118]
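
Motif gain and loss by substitution can be expressed as a set difference between the motif positions found before and after a mutation. The sketch below is a hypothetical illustration: it uses the simplified Nx[TS] glycosylation consensus exactly as written for panel f, it handles only substitutions (an indel such as the Mdm2 deletion in panel c would first require an alignment to map positions), and the two sequences are invented rather than taken from hemagglutinin. Function names are hypothetical.

```python
import re

GLYCOSYLATION = "N.[ST]"  # the Nx[TS] consensus of panel f as a Python regex


def motif_starts(seq, pattern):
    """0-based start positions of all (possibly overlapping) matches."""
    return {m.start() for m in re.finditer(f"(?=(?:{pattern}))", seq)}


def motif_gain_loss(ancestral, derived, pattern):
    """Return (gained, lost) motif start positions between two sequences.

    Only substitutions are handled: with indels, positions would first have
    to be mapped through an alignment.
    """
    if len(ancestral) != len(derived):
        raise ValueError("sequences differ in length; align them first")
    before = motif_starts(ancestral, pattern)
    after = motif_starts(derived, pattern)
    return sorted(after - before), sorted(before - after)


if __name__ == "__main__":
    ancestral = "QKNATLQDGLNVAE"  # hypothetical sequence
    derived = "QKNAILQDGLNVSE"    # T5I destroys one site, A13S creates another
    print(len(motif_starts(ancestral, GLYCOSYLATION)),
          len(motif_starts(derived, GLYCOSYLATION)))
    print(motif_gain_loss(ancestral, derived, GLYCOSYLATION))  # ([10], [2])
```

Tracking the count of matches across a time series of sequences, as in panel f, is then a matter of calling `motif_starts` on each sequence in turn.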
Fig. 3
The relationship between compact degenerate motifs, occurrence likelihoods and ex nihilo evolution. a The homeodomain of Drosophila Segmentation polarity homeobox protein engrailed (en) bound to a TAATTA subsite [119]. b The RRM of Transformer-2 protein homolog beta (TRA2B) bound to an AGAA exonic splicing enhancer (ESE) motif [120]. c The SH3 domain of Adapter molecule crk (CRK) bound to a PxxP motif from Rap guanine nucleotide exchange factor 1 (RAPGEF1) [121]. d The number of nucleotides or residues expected between instances of a motif occurring by chance in a sequence. A non-degenerate x-mer nucleotide motif would be expected to occur once every 4^x nucleotides (e.g. a 6-mer every 4^6 = 4,096 nucleotides) and a non-degenerate x-mer protein motif once every 20^x amino acids (e.g. a 3-mer peptide motif every 20^3 = 8,000 amino acids). Because the regions that contain these motifs (DNA, (pre-)mRNA and proteins) differ greatly in length, the number of random instances will vary several-fold across the three classes of biomolecule. Ranges are illustrative and approximate; they are based on over-predictive consensuses (listed below) and assume equal nucleotide (1/4) and amino acid (1/20) frequencies. Protein SLiMs: proline-directed phosphosite ([ST]P) [29]; D box degron (RxxLxx[ILMVK]) [69]; PxIxIT calcineurin-docking motif (Px[IVLF]x[IVLF][TSHEDQNKR]) [27]; SH3 domain-binding motif (PxxPx[KR]) [32]; PTAP late domain motif (P[TS]AP) [122]; and Fbw7 SCF degron ([ILMVP]TPxx[ST]) [123]. RNA motifs: a single RRM binding site (4 nucleotides) [124]; a single zinc finger recognition site (3 nucleotides) [125]; and a miRNA seed region (6–8 nucleotides) [126]. DNA motifs: a single zinc finger recognition site (3 nucleotides) [127]; homeobox domain (TAAT[GT][GT]) [128]; CAAT box ([TC]GATTGG[TC][TC][AG]) [129]; and p53 regulatory element (C[AT][AT]GNNNNNNC[AT][AT]G) [130].
e Simple model for motif acquisition by DNA, RNA and proteins (see text for details of model). f Potential mechanism of ex nihilo motif evolution illustrated using a hypothetical LxCxE pRB-binding motif (see text for details of model)
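
The arithmetic in panel d extends directly to degenerate consensuses. Under equal background frequencies, a position that accepts k of the A letters of the alphabet (A = 4 for nucleotides, 20 for amino acids) matches by chance with probability k/A, a wildcard position with probability 1, and the per-site match probability p is the product of these factors over all m positions. The expected spacing between chance instances is then 1/p, and a region of length L is expected to contain about (L - m + 1) × p instances. The Python sketch below applies this to the caption's consensuses; it assumes independent positions, counts a single strand and ignores overlap between matches, and the helper names are hypothetical.

```python
import re
from fractions import Fraction

# A bracketed class such as [ST], or a single letter (fixed residue or wildcard).
TOKEN = re.compile(r"\[([A-Za-z]+)\]|([A-Za-z])")


def match_probability(consensus, alphabet_size, wildcard):
    """Chance that a random site matches, assuming equal, independent letters."""
    if TOKEN.sub("", consensus):
        raise ValueError(f"unsupported syntax in {consensus!r}")
    p = Fraction(1)
    for letter_class, letter in TOKEN.findall(consensus):
        if letter_class:
            p *= Fraction(len(set(letter_class)), alphabet_size)
        elif letter != wildcard:
            p *= Fraction(1, alphabet_size)
        # a wildcard position matches every letter: factor of 1, nothing to do
    return p


def expected_instances(consensus, region_length, alphabet_size, wildcard):
    """Expected chance matches on one strand of a region of the given length."""
    positions = len(TOKEN.findall(consensus))
    sites = max(region_length - positions + 1, 0)
    return sites * match_probability(consensus, alphabet_size, wildcard)


PROTEIN = {"[ST]P": "proline-directed phosphosite", "RxxLxx[ILMVK]": "D box",
           "Px[IVLF]x[IVLF][TSHEDQNKR]": "PxIxIT", "PxxPx[KR]": "SH3-binding",
           "P[TS]AP": "PTAP late domain", "[ILMVP]TPxx[ST]": "Fbw7 degron"}
DNA = {"TAAT[GT][GT]": "homeobox", "[TC]GATTGG[TC][TC][AG]": "CAAT box",
       "C[AT][AT]GNNNNNNC[AT][AT]G": "p53 element"}

if __name__ == "__main__":
    # 'x' is the protein wildcard; 'N' is the nucleotide wildcard.
    for motifs, size, wild in ((PROTEIN, 20, "x"), (DNA, 4, "N")):
        for consensus, name in motifs.items():
            spacing = 1 / match_probability(consensus, size, wild)
            print(f"{name:28s} {consensus:28s} one per ~{float(spacing):,.0f}")
```

Exact fractions keep the non-degenerate cases in the caption exact (a 6-mer gives 1/4,096 and a 3-mer peptide 1/8,000). Each degenerate position such as [IVLF] multiplies p by more than 1/A, so it shortens the expected spacing relative to a fixed residue.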
Fig. 4
Examples of motif-binding pocket evolution. a Representative selection of motif-binding pockets in the WD40 repeat fold demonstrating the simplicity of motif-binding pocket birth. Each pocket has evolved independently, and subsequently multiple proteins (representative examples listed) have acquired the motifs necessary to recruit the various WD40 repeat-containing proteins. The figure includes: an ABBA motif (dark blue – consensus [ILV][FHY]x[DE]), a D box degron motif (red – consensus RxxLxx[ILVK]) and a KEN box degron motif (yellow – consensus KEN) from APC/C-CDH1 modulator 1 (Acm1) bound to the WD40 domain of the APC/C activator protein CDH1 (Cdh1) [69]; an Fbw7 degron motif (orange – consensus pTPxxpS) from Cyclin E bound to the WD40 domain of the F-box/WD repeat-containing protein 7 (FBW7) [123]; a β-TrCP1 degron motif (light blue – consensus DpSGxxpS) from β-Catenin bound to the WD40 domain of the F-box/WD repeat-containing protein 1A (BTRC) [131]; and an EH1 motif (green – consensus [FHY]x[IVM]xx[ILM][ILMV]) bound to the WD40 domain of the Transducin-like enhancer protein 1 (TLE) [132]. See the ELM resource for more details and examples [9]. b Example of specificity divergence after duplication of a motif-binding domain. Homologous pockets on the protein phosphatase 1 (PP1) and calcineurin holoenzymes bind RVxF and PxIxIT motifs, respectively. The structure shows the canonical PP1-binding RVxF motif (light blue) of myosin phosphatase targeting subunit (MYPT1) bound to PP1 (grey). The PxIxIT motif of African swine fever virus A238L (orange) is superimposed, showing the shared but diverged binding pocket [115]. The valine and phenylalanine of the RVxF motif sit in the hydrophobic P1 and P3 regions occupied by the proline and first isoleucine of the PxIxIT motif in the calcineurin pocket (see Fig. 1d), but the additional specificity and affinity determinants of the two motifs use different surfaces of the domain and do not overlap [50, 133]
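
The consensuses in panel a are written in ELM-style shorthand (x for any residue, brackets for a set of allowed residues, a leading p for a phosphorylated residue). A minimal sketch of annotating one sequence with several of these classes at once follows. It translates each consensus into a Python regular expression and reports overlapping matches; as a simplifying assumption it ignores modification state, treating pT and pS as plain T and S. The query sequence is invented for illustration and the function names are hypothetical; curated motif definitions are available from the ELM resource cited in the caption.

```python
import re

# Consensuses as written in the Fig. 4a caption (ELM-style shorthand).
WD40_MOTIFS = {
    "ABBA": "[ILV][FHY]x[DE]",
    "D box": "RxxLxx[ILVK]",
    "KEN box": "KEN",
    "Fbw7 degron": "pTPxxpS",
    "beta-TrCP degron": "DpSGxxpS",
    "EH1": "[FHY]x[IVM]xx[ILM][ILMV]",
}


def consensus_to_regex(consensus):
    """x -> any residue; pS/pT/pY -> plain S/T/Y (modification state ignored)."""
    unmodified = re.sub(r"p([STY])", r"\1", consensus)
    return unmodified.replace("x", ".")


def scan(seq, motifs=WD40_MOTIFS):
    """Map each motif class to a list of (0-based start, matched peptide)."""
    hits = {}
    for name, consensus in motifs.items():
        # A capturing group inside a lookahead reports overlapping matches.
        regex = re.compile(f"(?=({consensus_to_regex(consensus)}))")
        hits[name] = [(m.start(), m.group(1)) for m in regex.finditer(seq)]
    return hits


if __name__ == "__main__":
    for name, found in scan("AAKENAARQSLAGIAAVFSDAA").items():  # hypothetical
        print(f"{name:17s} {found}")
```

Because the consensuses are deliberately permissive, a scan like this over-predicts; in practice, matches would be filtered further, for example by accessibility or conservation.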

References

    1. Bejerano G, Haussler D, Blanchette M. Into the heart of darkness: large-scale clustering of human non-coding DNA. Bioinformatics. 2004;20(Suppl 1):i40–8. doi: 10.1093/bioinformatics/bth946.
    2. Gerstein MB, Lu ZJ, Van Nostrand EL, Cheng C, Arshinoff BI, Liu T, et al. Integrative analysis of the Caenorhabditis elegans genome by the modENCODE project. Science. 2010;330(6012):1775–87. doi: 10.1126/science.1196914.
    3. Roy S, Ernst J, Kharchenko PV, Kheradpour P, Negre N, Eaton ML, et al. Identification of functional elements and regulatory circuits by Drosophila modENCODE. Science. 2010;330(6012):1787–97. doi: 10.1126/science.1198374.
    4. An integrated encyclopedia of DNA elements in the human genome. Nature. 2012;489(7414):57–74.
    5. Tompa P. Intrinsically unstructured proteins. Trends Biochem Sci. 2002;27(10):527–33. doi: 10.1016/S0968-0004(02)02169-2.
