Review
Nat Rev Mol Cell Biol. 2024 Mar;25(3):187-211.
doi: 10.1038/s41580-023-00673-0. Epub 2023 Nov 13.

The molecular basis for cellular function of intrinsically disordered protein regions

Alex S Holehouse et al. Nat Rev Mol Cell Biol. 2024 Mar.

Abstract

Intrinsically disordered protein regions exist in a collection of dynamic interconverting conformations that lack a stable 3D structure. These regions are structurally heterogeneous, ubiquitous and found across all kingdoms of life. Despite the absence of a defined 3D structure, disordered regions are essential for cellular processes ranging from transcriptional control and cell signalling to subcellular organization. Through their conformational malleability and adaptability, disordered regions extend the repertoire of macromolecular interactions and are readily tunable by their structural and chemical context, making them ideal responders to regulatory cues. Recent work has led to major advances in understanding the link between protein sequence and conformational behaviour in disordered regions, yet the link between sequence and molecular function is less well defined. Here we consider the biochemical and biophysical foundations that underlie how and why disordered regions can engage in productive cellular functions, provide examples of emerging concepts and discuss how protein disorder contributes to intracellular information processing and regulation of cellular function.

Conflict of interest statement

Competing Interest

A.S.H. is a scientific consultant with Dewpoint Therapeutics and on the Scientific Advisory Board for Prose Foods. All other authors declare no conflicts of interest.

Figures

Figure 1: IDRs are central to cellular function.
IDRs play critical cellular roles across cellular compartments. Panels run clockwise from top left. (a) The nuclear pore complex is a macromolecular portal that controls the partitioning of biomolecules between the nucleus and cytosol and regulates passage through the nuclear envelope. The central lumen of its pore is filled with a chemically-tuned meshwork of IDRs, the phenylalanine-glycine (FG) repeats from nucleoporin (Nup) proteins, which enable selectivity through favourable transient interactions with nuclear transport receptors. (b) Histones are among the most abundant proteins in eukaryotes and act as positively charged counterions to compact negatively charged DNA into chromatin. Histone tails are IDRs that undergo extensive post-translational modification (PTM), enabling both changes to the intrinsic biophysical behaviour and the recruitment or exclusion of partner proteins to determine epigenetic state. (c) G-protein coupled receptors (GPCRs) are a large class of membrane-bound receptors that transduce extracellular stimuli into chemical information. Many GPCRs contain IDRs in their intracellular and extracellular loops and tails. These IDRs are highly variable in composition and length, suggesting they may act as evolutionarily labile sensors connected to a more conserved signal-transduction machine. (d) For many organisms, resilience to low levels of water is among the strongest selective pressures. Most identified desiccation-resistance proteins (e.g., hydrophilins and CAHS proteins) are disordered when in aqueous environments, although many also acquire helicity upon desiccation. The molecular details that underlie how and why disordered proteins appear to play key roles in desiccation tolerance remain enigmatic. (e) Stress granules are an evolutionarily conserved class of cytoplasmic condensate that form in response to cellular stress. In humans, stress granule formation often depends on the largely disordered paralogous proteins G3BP1/2.
More broadly, however, many core stress granule proteins contain large IDRs, potentially related to their roles in RNA binding and environmental responsiveness. (f) IDRs are often found in multidomain proteins that facilitate the formation of large dynamic macromolecular complexes. In these, they may act as flexible linkers connecting folded domains, or as molecular recognition modules that facilitate complex formation. (g) IDRs can exert entropic force, here shown in membrane proteins. Any reduction in the volume available to an IDR (for example, by the presence of an adjacent membrane) results in a corresponding force proportional to the entropic cost levied by the lost volume (highlighted by arrows). (h) IDRs are often found in RNA binding proteins. They can bind RNA directly and can enhance or suppress the binding affinity of canonical RNA binding domains. Given the size mismatch between mRNA and most proteins, productive RNA recognition events may require the collective behaviour of many proteins, and IDRs may contribute to both protein–protein and protein–RNA interactions. (i) Transmembrane signalling proteins (e.g., T-cell receptors, cytokine receptors, and growth factor receptors) often contain intracellular disordered regions that contribute to signal amplification upon receptor clustering. These regions can interact with other IDRs, act as a platform upon which downstream signalling molecules can co-assemble, or undergo PTMs (especially phosphorylation) to indicate signalling status. (j) Genome maintenance represents an essential set of cellular programmes conserved from yeast to humans. Many of the core proteins that drive central steps in different aspects of genome maintenance contain large IDRs with important cellular functions (e.g., p53, BRCA1, BRCA2, ATM, MLH, XPA). These IDRs may aid in the coordination of DNA repair by recruiting other proteins but may also interact directly with DNA.
(k) Transcription factors are DNA-binding proteins that dictate the set of genes being expressed at any given moment. Most transcription factors contain IDRs. In addition to mediating the recruitment of appropriate partner proteins (which also typically contain IDRs) to activate or repress gene expression (often via folding upon binding), emerging work suggests transcription factor IDRs can even guide the specificity of transcription factors for DNA sequences. (l) Biomolecular condensates are membrane-less non-stoichiometric assemblies that concentrate specific biomolecules and exclude others. IDRs, owing to their multivalency, can participate in phase transitions associated with biomolecular condensate formation. In particular, the nucleolar substructure observed in vitro and in vivo is coordinated at least in part by sequence features in IDRs. These observations illustrate how mesoscopic organization can emerge despite disorder at the level of individual molecules.
Figure 2: IDRs exist in ensembles dictated by protein sequence features.
(a) IDRs exist in ensembles: collections of dynamic conformations that are energetically accessible to a disordered region. Although folded domains also exist in ensembles, the conformations associated with a folded domain are typically structurally similar. By contrast, for IDRs, ensemble conformations are highly heterogeneous. Here we compare structural models for IDR ensembles in different molecular contexts (bottom) with schematized representations of IDR ensembles (top). Only a small number of separate conformations are shown for visual accessibility; in reality, IDRs exchange between tens of thousands of different conformations. The four proteins depicted here are examples of IDRs from either a fully disordered protein (furthest left) or IDRs in different structural contexts. In each representation, one specific conformation is highlighted, and a collection of additional conformations is superimposed in shaded lines, with the goal of illustrating the structural heterogeneity associated with an ensemble. For a clearer demonstration of an ensemble, see Movie M1, a rendering from an all-atom simulation of the low complexity domain from the RNA binding protein hnRNPA1 (see Box 1). PDB codes for structures: left (homology model based on PDB:4CT5), centre (PDB:6GYR), right (PDB:6YI3); note disordered regions are not visible in deposited PDB structures. (b) Because IDRs exist in ensembles, they cannot be represented by a single 3D structure. Consequently, IDR ensembles are described in terms of ensemble properties: specific metrics that can be measured, calculated, or predicted for the collection of conformations to quantify the ensemble.
Commonly used ensemble properties include the radius of gyration and the end-to-end distance (measures of global ensemble dimensions), asphericity (a measure of ensemble shape), transient secondary structure (a measure of local structural acquisition) and inter-residue distances (a measure of specific ensemble dimensions). These properties can be calculated from simulations or measured experimentally (see Box 2). (c) IDR ensemble properties should ideally be described in terms of probability distributions. For example, the distribution of the radius of gyration is shown for two IDRs. One IDR (red) is compact, while the other IDR (black) is more expanded. (d) IDR ensembles often depend on residue patterning, which quantifies how segregated/clustered residues of one chemical group (here depicted as white or grey beads) are with respect to another. (e) Local sequence properties can influence IDR ensembles, such as charge patterning (left) and evenly spaced aromatic residues (right). (f) Overall, IDR ensemble properties are a consequence of the sequence-encoded physical chemistry and the context-dependence of interactions endowed by that physical chemistry. (g) Ensemble properties of IDR linkers tune the effective concentration of folded domains to one another. Two folded domains connected by a short IDR are inherently close to one another, yet if long IDRs are relatively compact, folded domains will remain close, despite the superficially “large” intervening disordered linker (see panel 2c). For two domains that interact with one another, linker properties (modulated via post-translational modifications or changes in linker sequence over evolution) can therefore tune inter-domain communication, thereby influencing local inhibition or activation or altering binding affinity for target molecules.
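The global ensemble properties named in panel (b) can be illustrated with a minimal Python sketch. It is not taken from the review: it assumes each conformation is a list of (x, y, z) coordinates with one point per residue, weights all points equally, and uses a freely jointed random chain with a 3.8 Å step (the hypothetical helper `random_chain`) as a stand-in for a real IDR ensemble. Asphericity is computed from the gyration tensor as 1 - 3(λ1λ2 + λ2λ3 + λ3λ1)/(λ1 + λ2 + λ3)², one commonly used definition, evaluated here through tensor traces so that no eigenvalue solver is needed.

```python
import math
import random


def centroid(coords):
    """Unweighted geometric centre of a conformation."""
    n = len(coords)
    return tuple(sum(p[i] for p in coords) / n for i in range(3))


def radius_of_gyration(coords):
    """Root-mean-square distance of all points from the centroid."""
    c = centroid(coords)
    return math.sqrt(sum(math.dist(p, c) ** 2 for p in coords) / len(coords))


def end_to_end_distance(coords):
    """Distance between the first and last residue."""
    return math.dist(coords[0], coords[-1])


def asphericity(coords):
    """0 for a perfectly spherical point cloud, 1 for a perfect rod."""
    c = centroid(coords)
    n = len(coords)
    # Gyration tensor T[i][j] = mean of (r_i - c_i) * (r_j - c_j).
    t = [[sum((p[i] - c[i]) * (p[j] - c[j]) for p in coords) / n
          for j in range(3)] for i in range(3)]
    trace = t[0][0] + t[1][1] + t[2][2]                # l1 + l2 + l3
    trace_sq = sum(t[i][j] ** 2 for i in range(3) for j in range(3))
    pair_sum = (trace ** 2 - trace_sq) / 2             # l1*l2 + l2*l3 + l3*l1
    return 1 - 3 * pair_sum / trace ** 2


def random_chain(n_residues, bond=3.8, rng=random):
    """Toy freely jointed chain (assumption: not a physical IDR model)."""
    coords = [(0.0, 0.0, 0.0)]
    for _ in range(n_residues - 1):
        v = [rng.gauss(0, 1) for _ in range(3)]
        norm = math.sqrt(sum(x * x for x in v))
        last = coords[-1]
        coords.append(tuple(last[i] + bond * v[i] / norm for i in range(3)))
    return coords


if __name__ == "__main__":
    rng = random.Random(0)
    ensemble = [random_chain(50, rng=rng) for _ in range(1000)]
    rg_values = [radius_of_gyration(conf) for conf in ensemble]
    # An ensemble property is a distribution; the mean is one summary of it.
    print("mean Rg (angstrom):", sum(rg_values) / len(rg_values))
```

Keeping the full list `rg_values` rather than only its mean reflects the point made in panel (c): ensemble properties are best described as probability distributions.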
Figure 3: IDR ensemble properties are context dependent.
Behaviour of the IDR ensemble is highly context dependent. (a) Highly charged IDRs can be sensitive to changes in salt, although how salt influences ensemble properties depends on the IDR sequence features and the salt. If IDRs possess clusters of oppositely charged residues, these clusters can interact with one another, driving chain compaction, an effect that is reduced as salt concentration is increased (top). By contrast, if charged residues are uniformly patterned, an increase in salt concentration may have a comparatively modest impact on IDR dimensions as no strong intramolecular interactions are found (bottom). Finally, divalent ions can bind to clusters of negatively charged residues with effects on local and global compaction (not shown). (b) Changes in pH can influence IDRs with amino acids that may be protonated (Asp, Glu, His) or deprotonated (Lys, Tyr, Arg, His) within physiological regimes. As a note, arginine deprotonation would seem to be almost impossible under physiological conditions. For uncharged IDRs with many histidine residues, a reduction in pH can lead to histidine protonation, driving intramolecular repulsion and leading to chain expansion (top). Conversely, if an IDR contains histidine and aromatic residues, protonation can lead to strong cation:π interactions between positively charged histidine and aromatic residues, driving chain compaction (bottom). (c) IDR dimensions respond to crowders differently; if crowders have weakly favourable non-specific interactions with IDRs then small crowders can drive IDR expansion while large crowders drive compaction. As a result, some IDRs may be well-poised to act as sensors of cellular crowding on specific length scales. (d) IDRs are sensitive to changes in temperature. For IDRs enriched in aliphatic hydrophobic residues (i.e., valine, leucine, isoleucine, methionine, alanine), the enhanced strength of the hydrophobic effect at higher temperatures leads to chain compaction (top).
For IDRs enriched in aromatic residues, π:π interactions are enthalpically dominated, such that as temperature increases π:π interactions become weaker and these chains become more expanded (bottom). For IDRs in general, there is a loss of polyproline-II structure (an extended left-handed secondary structure that usually but not necessarily involves prolines) with increasing temperature, leading to compaction. (e) Phosphorylation can have opposing effects on IDR dimensions. Phosphorylation of an uncharged region can lead to chain expansion driven by electrostatic repulsion between phosphate groups (top). However, phosphorylation of IDRs with clusters of positively charged residues can lead to chain compaction, driven by electrostatic interactions between phosphorylated residues and the positively charged clusters (bottom). Both effects can occur within a single IDR. Phosphorylation also impacts local structure and can stabilize and destabilize transient helices in a position-dependent manner (not shown). (f) Arginine methylation weakens cation:π interactions between arginine and aromatic groups, which could lead to an increase in IDR dimensions (top). However, methylation does not neutralize arginine, such that intramolecular interactions driven by arginine-acidic residue interactions would likely be largely unaffected. (g) Just as solution context can influence IDR properties, folded domains adjacent to IDRs can do so too. The impact that folded domain surface features have on IDR ensemble properties depends on the chemistry of the folded domain and the IDR sequence. From left to right: Like-charged residues on a folded domain surface and an IDR will repel one another, preventing intramolecular interaction and ensuring an IDR is projected into solution, away from the folded domain. Oppositely charged residues on a folded domain surface and an IDR will attract one another, driving intramolecular interaction.
Hydrophobic interactions between aliphatic and/or aromatic residues on folded domain surfaces and IDRs can lead to intradomain interaction. If many IDRs are projected from a filament formed from folded domains, inter-IDR interaction and repulsion can lead to a bottle-brush architecture and a resulting entropic force. (h) Figure summarizing a current model for IDR function. IDRs are encoded by their amino acid sequence (left). That sequence determines the presence of SLiMs (middle top), the overall ensemble (middle center) and the presence of sequence features (middle bottom). All three properties and/or their functionality are influenced by IDR context. Ultimately, these context-dependent properties dictate both molecular function and the evolutionary constraints that govern IDR sequence variation over generations.
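The pH dependence described in panel (b) can be made concrete with a rough Henderson-Hasselbalch estimate of net charge, sketched below in Python. This is an illustration under stated assumptions, not a method from the review: the side-chain pKa values are approximate textbook numbers for free amino acids, local sequence context and the chain termini are ignored, and the helper names are hypothetical.

```python
# Approximate model pKa values (assumption: generic textbook numbers;
# real pKa values shift with sequence and solution context).
ACIDIC_PKA = {"D": 3.9, "E": 4.3, "Y": 10.1}   # neutral when protonated
BASIC_PKA = {"H": 6.0, "K": 10.5, "R": 12.5}   # +1 when protonated


def fraction_protonated(pka, ph):
    """Henderson-Hasselbalch fraction of a titratable group carrying its proton."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


def net_charge(sequence, ph):
    """Estimated net side-chain charge of a sequence at a given pH."""
    charge = 0.0
    for residue in sequence:
        if residue in BASIC_PKA:
            charge += fraction_protonated(BASIC_PKA[residue], ph)
        elif residue in ACIDIC_PKA:
            charge -= 1.0 - fraction_protonated(ACIDIC_PKA[residue], ph)
    return charge


if __name__ == "__main__":
    his_rich = "GSHGSHGSHGSH"  # hypothetical histidine-rich toy sequence
    for ph in (7.4, 6.5, 5.5):
        print(ph, round(net_charge(his_rich, ph), 2))
```

In this toy model the histidine-rich sequence gains positive charge as pH drops, which is the prerequisite for the repulsion-driven expansion described in the caption. With these pKa values arginine stays essentially fully protonated across the physiological range, consistent with the note in panel (b).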
Figure 4: IDRs enable a range of molecular recognition modes.
(a) IDRs can bind partners via coupled folding and binding, where an IDR (or a subregion) folds upon interaction with its partner, be it DNA, RNA, protein, or a membrane. (b) IDRs can bind partners via fuzzy interactions, whereby multiple structurally distinct bound states are relevant to function. Illustrated here is a scenario where an IDR consistently interacts with the same interface in structurally distinct bound states. However, fuzzy interactions could also involve a scenario whereby an IDR possesses several non-overlapping motifs or binding residues that exchange in binding a single interface on the surface of a folded domain. (c) IDRs can bind disordered partners to form fully disordered complexes where no persistent structure or contacts are seen in either partner in the bound state. (d) IDR molecular recognition is often facilitated by Short Linear Motifs (SLiMs). These are often well-described as a consensus motif with evolutionarily conserved and invariant positions, while other positions are partially or fully redundant. As a result, SLiMs can be described in terms of “regular expressions” (RegExs), a term borrowed from computer science that describes pattern matching when a subset of positions in a sequence are under some set of constraints (e.g., the PIP box binding to PCNA (QxxLxxFF), where x is any amino acid). (e) The sequence context around SLiMs is a critical determinant of binding. The same SLiM present in different proteins may bind with high affinity or not at all, depending on the complementary chemical interactions between the residues flanking a SLiM and the surface surrounding the binding site. Thus, when the features of the flanking regions match those of the binding partner, the context is favourable (top); when no determining features are present, only the SLiM is deterministic for binding (middle); and when the features of the flanking regions and those of the binding partner surface do not match, the context is repressive (bottom).
(f) Binding of IDRs often involves avidity and allovalency. Avidity emerges when multiple binding sites (e.g. SLiMs) enable two molecules to interact through two or more independent binding interfaces (top). Allovalency reflects the situation in which a single binding site on one partner is complemented by multiple identical binding interfaces on another (bottom). (g) IDRs can encode binding specificity in a variety of ways. Multiple SLiMs within a single IDR offer one route to high-specificity (and high-affinity) binding, whereby only a limited set of partners possess binding interfaces common to all the SLiMs present, providing specificity combinatorially via many weak motifs (left). While conceptually this may be straightforward to understand, a growing body of work suggests the existence of a continuum of multivalent binding modes, whereby a combination of SLiMs and sequence features enables a trade-off between sequence conservation and binding to a specific target (middle). Finally, IDRs may interact solely via chemical specificity, whereby specific sequence features lead to favourable interactions between the IDR and a partner, such as a positively charged IDR binding to a negatively charged partner (right). The discriminatory power available for such a simple sequence feature may be limited, and other properties such as number of charges or charge density, or properties yet to be discovered, may enable specific molecular recognition.
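Panel (d) notes that SLiMs can be written as regular expressions. The following Python sketch shows one way this could look for the PIP box consensus quoted there (QxxLxxFF, with x as any amino acid). The helper names and the example sequence are hypothetical, and a sequence match says nothing about whether the flanking context is favourable or repressive (panel e).

```python
import re


def motif_to_regex(motif):
    """Translate a consensus such as 'QxxLxxFF' into a regex.

    Lowercase 'x' is treated as a wildcard position; every other
    character must match exactly. A lookahead is used so that
    overlapping occurrences are all reported.
    """
    parts = ["." if ch == "x" else re.escape(ch) for ch in motif]
    return "(?=(" + "".join(parts) + "))"


def find_slim(sequence, motif):
    """Return (1-based start position, matched segment) for each hit."""
    pattern = re.compile(motif_to_regex(motif))
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(sequence)]


if __name__ == "__main__":
    toy_sequence = "MSAQKTLDEFFGG"  # hypothetical sequence, not a real protein
    print(find_slim(toy_sequence, "QxxLxxFF"))
```

Resources such as ELM describe motifs with richer expressions (for example, character classes at partially redundant positions); the strict consensus above is only the simplest case.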
Figure 5: IDRs can undergo phase separation and contribute to biomolecular condensate formation.
(a) Biomolecular condensates are non-stoichiometric assemblies that concentrate specific biomolecules while excluding others. In cells, many condensates can co-exist, as shown here where nucleoli, nuclear speckles, and synthetic condensates generated using the PopTag oligomerization domain coexist in the same U2OS cell nucleus. (b) Condensates formed in vitro and in vivo through phase separation are often stabilized by IDRs, with a variety of distinct chemical interactions tuning condensate formation, maintenance, and material state. (c) IDRs that drive phase transitions can be described in terms of stickers and spacers, where stickers reflect regions or residues that have an outsized role in driving attractive interactions, while spacers are regions that connect stickers. (d) For IDRs that drive homotypic phase separation, where many copies of the same IDR interact, favourable multivalent intra-molecular interactions drive chain compaction, whereas favourable multivalent inter-molecular interactions drive phase separation. (e) If intra-condensate IDR concentrations are high, the high concentration of sidechain chemistries presented by the many IDR molecules effectively provides a novel solvent environment that can destabilize, e.g., nucleic acid duplexes, but could also in principle catalyze chemical reactions. (f) The presence of IDRs adjacent to folded domains can prevent the formation of arrested (irreversibly formed) condensates through IDRs acting as local molecular ‘lubricants’. If the IDR engages in many weak interactions with the surface of the folded domain, those interactions can impede strong intermolecular interactions between folded domains that would otherwise lead to arrested condensates. In this way IDRs can act to ensure the condensates are dynamic and, upon a reduction in overall protein concentration, undergo disassembly. Here, folded domains are represented with discrete binding sites that mediate interactions with other folded domains.
If folded domains lack IDRs, they readily assemble via interactions between folded domains, but those condensates become trapped irreversibly on the timescale of the schematic. In contrast, if folded domains possess IDRs, the IDRs lubricate folded domain interactions, leading to dynamic and reversible condensate formation. Image in part a is courtesy of Steven Boeynaems, Baylor College of Medicine, Houston, TX.

Related links

    1. Metapredict disorder predictor: https://metapredict.net/
    2. CAID prediction portal: https://caid.idpcentral.org/submit
    3. Eukaryotic Linear Motif (ELM) resource: http://elm.eu.org/
    4. PLAAC webserver for identifying prion-like domains: http://plaac.wi.mit.edu/
    5. CIDER webserver for calculating sequence properties: http://pappulab.wustl.edu/CIDER/

References

    1. Dunker AK et al. Intrinsically disordered protein. J. Mol. Graph. Model 19, 26–59 (2001).

      This article (along with Wright & Dyson 1999, Uversky 2001, and Tompa 2002) makes the original arguments that IDRs can and do have important roles in cellular function.

    2. Wright PE & Dyson HJ. Intrinsically unstructured proteins: re-assessing the protein structure-function paradigm. J. Mol. Biol 293, 321–331 (1999).
    3. van der Lee R. et al. Classification of intrinsically disordered regions and proteins. Chem. Rev 114, 6589–6631 (2014).
    4. Uversky VN. Natively unfolded proteins: a point where biology waits for physics. Protein Sci. 11, 739–756 (2002).
    5. Tompa P. Intrinsically unstructured proteins. Trends Biochem. Sci 27, 527–533 (2002).
