Protein Sci. 2020 Jan;29(1):169-183.
doi: 10.1002/pro.3754. Epub 2019 Nov 11.

IDDomainSpotter: Compositional bias reveals domains in long disordered protein regions-Insights from transcription factors

Peter S Millard et al. Protein Sci. 2020 Jan.

Abstract

Protein domains constitute regions of distinct structural properties and molecular functions that are retained when removed from the rest of the protein. However, due to the lack of tertiary structure, the identification of domains has been largely neglected for long (>50 residues) intrinsically disordered regions. Here we present a sequence-based approach to assess and visualize domain organization in long intrinsically disordered regions based on compositional sequence biases. An online tool to find putative intrinsically disordered domains (IDDomainSpotter) in any protein sequence or sequence alignment using any particular sequence trait is available at http://www.bio.ku.dk/sbinlab/IDDomainSpotter. Using this tool, we have identified a putative domain enriched in hydrophilic and disorder-promoting residues (Pro, Ser, and Thr) and depleted in positive charges (Arg and Lys) bordering the folded DNA-binding domains of several transcription factors (p53, GCR, NAC46, MYB28, and MYB29). This domain, from two different MYB transcription factors, was characterized biophysically to determine its properties. Our analyses show the domain to be extended, dynamic and highly disordered. It connects the DNA-binding domain to other disordered domains and is present and conserved in several transcription factors from different families and domains of life. This example illustrates the potential of IDDomainSpotter to predict, from sequence alone, putative domains of functional interest in otherwise uncharacterized disordered proteins.
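
The compositional-bias profiles described above can be illustrated with a sliding-window score. The following is a minimal sketch, not the IDDomainSpotter implementation: the function name `window_scores`, its parameters, and the normalization by window length are assumptions made for illustration. Only the idea of adding enriched residues and subtracting depleted ones (e.g., +PST−RK) and the 15-residue window come from the figure captions.

```python
from typing import List


def window_scores(sequence: str, plus: str, minus: str = "",
                  window: int = 15) -> List[float]:
    """Sliding-window compositional score (hypothetical helper).

    For each window, count residues in `plus`, subtract residues in
    `minus`, and divide by the window length (assumed normalization).
    Returns one score per window start position.
    """
    seq = sequence.upper()
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and the sequence length")
    plus_set, minus_set = set(plus.upper()), set(minus.upper())

    # +1 for an enriched residue, -1 for a depleted one, 0 otherwise.
    per_residue = [(r in plus_set) - (r in minus_set) for r in seq]

    # Running sum so each step costs O(1) instead of re-counting the window.
    running = sum(per_residue[:window])
    scores = [running / window]
    for i in range(window, len(seq)):
        running += per_residue[i] - per_residue[i - window]
        scores.append(running / window)
    return scores
```

Calling `window_scores(seq, plus="PST", minus="RK")` on a protein sequence would give a +PST−RK-style profile under these assumptions; stretches of consistently high scores are the kind of region the abstract describes as a putative domain.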

Keywords: DNA-binding domain; IDDomainSpotter; IDPs; NMR; compositional bias; domain; low-complexity regions; p53; plant MYB protein; transactivation domain; transcription factor.

Figures

Figure 1
IDDomainSpotter profiles of p53 (a, c, e) and GCR (b, d, f). Profiles of human p53 (a), of aligned mammalian p53 sequences (c), and of aligned chordate p53 sequences (e) shown above each other for comparison. Profiles of human GCR (b), of aligned mammalian GCR sequences (d), and of aligned chordate GCR sequences (f) shown above each other for comparison. Gray boxes indicate domains, conserved regions, and motifs as named above the graph. Light gray boxes indicate the same information transferred to the aligned sequence profiles by locating the corresponding positions in the alignment. Profiles display scores for Phe + Tyr + Gly (+FYG), Leu + Val + Ile (+LVI), Arg + Lys (+RK), Arg + Lys − Asp − Glu (+RK−DE), and Pro + Ser + Thr − Arg − Lys (+PST−RK), calculated over 15‐residue windows. See Tables S1–S4 for sequences used in alignments. DBD, DNA‐binding domain; LBD, ligand‐binding domain; RD, regulatory domain; TAD, transactivation domain; TD, tetramerization domain
Figure 2
IDDomainSpotter profiles of NAC46 (a, c) and NAC13 (b, d). Profiles of A. thaliana NAC46 (a) and of aligned Brassicales NAC46 sequences (c) shown above each other for comparison. Profiles of A. thaliana NAC13 (b) and of aligned Brassicales NAC13 sequences (d) shown above each other for comparison. Gray boxes indicate domains, conserved regions, and motifs as named above the graph. Light gray boxes indicate the same information transferred to the aligned sequence profiles by locating the corresponding positions in the alignment. Profiles display scores for Phe + Tyr + Gly (+FYG), Leu + Val + Ile (+LVI), Arg + Lys (+RK), Arg + Lys − Asp − Glu (+RK−DE), and Pro + Ser + Thr − Arg − Lys (+PST−RK), calculated over 15‐residue windows. See Tables S5 and S6 for sequences used in alignments. DBD, DNA‐binding domain; RB, RCD1 binding region; TMD, transmembrane domain
Figure 3
IDDomainSpotter profiles of A. thaliana MYB28 (a), A. thaliana MYB29 (b), and aligned Brassicales MYB28/MYB29‐like sequences (c) shown above each other for comparison. Gray boxes indicate domains, conserved regions, and motifs as named above the graph. Light gray boxes indicate the same information transferred to the aligned sequence profiles by locating the corresponding positions in the alignment. Profiles display scores for Phe + Tyr + Gly (+FYG), Leu + Val + Ile (+LVI), Arg + Lys (+RK), Arg + Lys − Asp − Glu (+RK−DE), and Pro + Ser + Thr − Arg − Lys (+PST−RK), calculated over 15‐residue windows. See Table S7 for sequences used in alignments. DBD, DNA‐binding domain; MIM, MYC‐interaction motif; +PST−RK, region investigated in this study
Figure 4
Structure propensities of the MYB28 (M117‐R197) and MYB29 (G120‐R178) putative +PST−RK domains. (a) Amino acid sequences of MYB28(117–197) and MYB29(120–178) following removal of the N‐terminal GST‐tag. CD spectra of the MYB28 (b) and MYB29 (c) +PST−RK domains. ¹⁵N‐¹H HSQC spectra of the MYB28 (d) and MYB29 (e) +PST−RK domains. CD, circular dichroism
Figure 5
Secondary chemical shift analysis of the MYB28 putative +PST−RK domain (a), the MYB29 putative +PST−RK domain (b), and the human p53 N‐terminal IDR (c). The applied sequence‐corrected random coil Cα shifts were calculated with correction for temperature and pH [53, 54]. p53 Cα shifts were obtained from BMRB Entry 17760 [55]. Red diamonds indicate unassigned residues. IDR, intrinsically disordered region
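
A secondary chemical shift is the observed shift minus the random coil reference shift for the same residue. The sketch below shows only that subtraction. The function name `secondary_shifts` and the dictionary layout are hypothetical, and the sequence, temperature, and pH corrections mentioned in the caption are not implemented here: the random coil values are assumed to be supplied already corrected.

```python
from typing import Dict, Optional


def secondary_shifts(observed: Dict[int, Optional[float]],
                     random_coil: Dict[int, float]) -> Dict[int, Optional[float]]:
    """Secondary Cα shift per residue: observed minus random coil (ppm).

    `observed` maps residue number to the measured Cα shift, or None when
    the residue is unassigned. `random_coil` maps residue number to an
    already-corrected reference shift (assumed input, not computed here).
    Unassigned or missing residues yield None.
    """
    result: Dict[int, Optional[float]] = {}
    for resnum, coil in random_coil.items():
        obs = observed.get(resnum)
        result[resnum] = None if obs is None else obs - coil
    return result
```

Values near zero along the whole sequence would be consistent with a disordered chain, whereas runs of positive Cα secondary shifts are commonly read as helical propensity and runs of negative values as extended structure.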
Figure 6
R1, R2, R2/R1, and heteronuclear NOE values from ¹⁵N‐relaxation NMR experiments of the MYB28 putative +PST−RK domain. Bars represent fitted values of data from consecutive acquisitions ± error of the fits. Black line: R2 relaxation rates fitted with random coil values [56, 57]. Red diamonds indicate unassigned residues or prolines. NMR, nuclear magnetic resonance
Figure 7
SAXS analysis of MYB28(117–197). (a) Scaled log‐log SAXS plot at 2.0 and 4.0 mg/ml. The radius of gyration (Rg) calculated from the scattering data of the 4.0 mg/ml sample is shown. (b) Kratky plot at 2.0 and 4.0 mg/ml. SAXS, small‐angle X‐ray scattering
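
A Kratky plot shows q²·I(q) against q, where q is the scattering vector magnitude and I(q) the scattered intensity. The sketch below applies that standard transform to a measured curve. The function name `kratky` and the list-based input format are assumptions for illustration, and nothing here reproduces the authors' data processing.

```python
from typing import List, Sequence, Tuple


def kratky(q: Sequence[float],
           intensity: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (q, q^2 * I(q)) pairs for a Kratky plot (hypothetical helper)."""
    if len(q) != len(intensity):
        raise ValueError("q and intensity must have the same length")
    return [(qi, qi * qi * ii) for qi, ii in zip(q, intensity)]
```

In this representation a compact, folded particle typically gives a bell‐shaped peak that returns toward the baseline, while an extended, disordered chain tends to plateau or keep rising at higher q.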
Figure 8
Domain model of MYB TFs. R2R3 MYB DNA‐binding domain in complex with DNA (PDB ID: 1MSE; http://firstglance.jmol.org/fg.htm?mol=1MSE), with the protein colored gray and the DNA orange. Domains within the IDR have different sequence properties (i.e., they are enriched or depleted in side chains that increase hydrophilicity, flexibility, [net] charge, etc.) and thus distinct structural properties of relevance to differentiated molecular functions. The +PST−RK domain, bordering the DNA‐binding domain, is indicated. IDR, intrinsically disordered region
