Review
Proteomics. 2018 Nov;18(21-22):e1800054.
doi: 10.1002/pmic.201800054. Epub 2018 Oct 30.

Order in Disorder as Observed by the "Hydrophobic Cluster Analysis" of Protein Sequences

Tristan Bitard-Feildel et al. Proteomics. 2018 Nov.

Abstract

Hydrophobic cluster analysis (HCA) is an original approach for protein sequence analysis, which provides access to the foldable repertoire of the protein universe, including yet unannotated protein segments ("dark proteome"). Foldable segments correspond to ordered regions, as well as to intrinsically disordered regions (IDRs) undergoing disorder‐to‐order transitions. This review illustrates how HCA can give insight into this last category of foldable segments, using examples that match known 3D structures. After reviewing the HCA principles, examples of short foldable segments are given, which often contain short linear motifs, typically matching hydrophobic clusters. These segments become ordered upon contact with partners, with secondary structure preferences generally corresponding to those observed in the 3D structures within the complexes. Such small foldable segments are sometimes larger than the segments of known 3D structures, including flanking hydrophobic clusters that may be critical for interaction specificity or regulation, as well as intervening sequences allowing fuzziness. Cases of larger conditionally disordered domains are also presented; these either have a lower density in hydrophobic clusters than well‐folded globular domains or display exposed hydrophobic patches, which are stabilized by interaction with partners.

Keywords: HCA; dark proteome; disorder; foldability; secondary structure.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1
Principles of HCA. The amino acid sequence is written on a duplicated α‐helical net, in which the seven strong hydrophobic amino acids (V, I, L, M, F, Y, W) are contoured, forming HCs, which mainly correspond to regular secondary structures (RSSs). HCs are separated from each other by at least four non‐hydrophobic amino acids or a proline (amino acids depicted in red). The 2D net and neighborhood are detailed at left, together with the four symbols used for amino acids with particular structural behavior. At right are shown two examples of HC species (each species being defined by a unique binary pattern) with strong affinities for α‐helices (H) and β‐strands (E), respectively, and the corresponding binary codes, Quark (Q)‐codes and Peitsch (P)‐codes. Quarks correspond to the four basic units (v (vertical, 11), m (mosaic, 101), u (up, 1001), and d (down, 10001)), from which any HC can be built. The three axes corresponding to these quarks are shown at left on the 2D net. P‐codes correspond to the sums of powers of 2, indexed according to the position of each number of the binary code (the last position of the HC corresponding to 0).
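The cluster rule and the P‐code definition in this caption can be sketched in a few lines of Python. This is a minimal, simplified illustration and not the authors' implementation: it scans the sequence linearly instead of using the 2D helical net, it keeps single‐residue clusters, and it ignores the special handling of amino acids with particular structural behavior. The function names (`hydrophobic_clusters`, `binary_code`, `p_code`) and the `MIN_GAP` constant are hypothetical names chosen for this sketch.

```python
# Minimal 1D sketch of hydrophobic-cluster (HC) delineation and P-codes.
# Assumption: a linear scan stands in for the 2D alpha-helical net of real HCA.

HYDROPHOBIC = set("VILMFYW")  # the seven strong hydrophobic amino acids
MIN_GAP = 4                   # at least 4 non-hydrophobic residues split two HCs


def binary_code(seq, start, end):
    """Binary pattern of seq[start..end]: 1 = hydrophobic, 0 = anything else."""
    return "".join("1" if aa in HYDROPHOBIC else "0" for aa in seq[start:end + 1])


def hydrophobic_clusters(seq):
    """Return (start, end, binary_code) tuples, using 0-based inclusive positions."""
    seq = seq.upper()
    clusters = []
    start = last = None      # first and most recent hydrophobic residue of the open HC
    proline_seen = False     # a proline since the last hydrophobic residue breaks the HC
    for i, aa in enumerate(seq):
        if aa in HYDROPHOBIC:
            if start is None:
                start = i
            elif proline_seen or i - last - 1 >= MIN_GAP:
                clusters.append((start, last, binary_code(seq, start, last)))
                start = i
            last = i
            proline_seen = False
        elif aa == "P":
            proline_seen = True
    if start is not None:
        clusters.append((start, last, binary_code(seq, start, last)))
    return clusters


def p_code(code):
    """Sum of powers of 2 over the 1-positions; the last position has exponent 0."""
    n = len(code)
    return sum(2 ** (n - 1 - i) for i, bit in enumerate(code) if bit == "1")
```

Under this definition, the four quark binary codes given in the caption (11, 101, 1001, 10001) map to P‐codes 3, 5, 9, and 17, respectively.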
Figure 2
Amino acid coverage of the UniProt/SwissProt database by SEG‐HCA foldable regions. These predictions are compared to (A) consensus disorder predictions, as made by MobiDB‐lite [28], and (B) domain database annotations (Pfam v31.0) [37].
Figure 3
Delineation of order and disorder in the human enable/vasodilator‐stimulated phosphoprotein (Ena/VASP, UniProt P50552). The foldable regions, as predicted using SEG‐HCA, are boxed (black) on the HCA plot. Additional information is reported about the corresponding experimental data (observed 3D structures and corresponding RSSs; grey boxes, with PDB identifiers indicated) and order/disorder predictions (upper part). Colored bars: predictions of disorder reported by MobiDB‐lite (consensus) [20], as well as by IUPRED [48] and by ANCHOR (disorder‐to‐order transitions) [49, 57]. Peitsch (P‐)codes and HC affinities for RSS are indicated (E/e, strand, and H/h, helix, with upper/lower cases corresponding to strong and weak affinities, respectively), except for the four basic units (called “quarks”, see Figure 1), which display per se no clear secondary structure affinities. No statistics (nd, not determined) are available for clusters that are too long; these can, however, sometimes be split into more informative, shorter clusters (dotted red bar). RSS propensities focused on the HC limits (mean of the individual propensities of each amino acid for the different RSSs) generally provide relevant predictions about the expected structural behavior (highest propensities are shown in green).
Figure 4
Short foldable segments on the HCA plots. The positions of foldable segments delineated using SEG‐HCA are boxed, whereas those of the corresponding interacting peptide 3D structures found within small foldable segments are shaded in red. These interacting peptides are depicted in red on the ribbon representation of the 3D structure complexes, with the hydrophobic amino acids depicted in atomic details. The interacting partner is depicted in grey. Observed RSSs and predictions are indicated below and above the HCA plots, respectively. A and B) Long peptides. A) Intramolecular interaction. The N‐terminal region of the Est3 telomerase subunit, forming, together with the C‐terminal region, a cap covering a five‐stranded β‐barrel (UniProt Q03096, PDB 2M9V [62]). B) Intermolecular interaction. The N‐terminal arm of the methylmalonyl coA mutase α‐subunit, wrapping around the β‐subunit (UniProt P11653, PDB 3REQ [98]). C–F) Short linear motifs. C) The Replication Protein A (RPA)‐binding domain of Saccharomyces cerevisiae Ddc2 (UniProt Q6CUV9, ATRIP in human) in complex with the N‐terminal OB fold of the RPA's largest subunit (S. cerevisiae Rfa1, RPA70 in human) (PDB 5OMC) [99]. The N‐terminal region of Ddc2 serves as an RPA‐binding domain allowing the recruitment of the Mec1‐Ddc2 complex (ATR‐ATRIP in human), a key DNA‐damage‐sensing kinase, to DNA damage sites [99]. The additional HC, upstream of the interacting HC, may bind to the hydrophobic extension of the binding groove, depicted at right on the solvent accessible surface (yellow star). D) The LXXLL motif (NR box) of the rat nuclear receptor coactivator (NCoA‐5, UniProt Q9HCD5) in complex with estrogen receptor beta ERβ (PDB 2J7X). The α‐helicoidal LXXLL motif fits into a groove of the ERβ ligand‐activated hormone binding domain (AF‐2 pocket). Flanking sequences of LXXLL NR boxes have been shown to be involved in the modulation of the affinity and/or selectivity of interaction [100, 101]. It is also possible here that the HC downstream of the NR box plays a role in the selectivity of the interaction or its regulation. This is supported by the fact that another druggable pocket, BF‐3, conserved among nuclear receptors, has also been identified in the proximity of the AF‐2 pocket [102] and has been shown to be targeted by NR‐binding motifs [103]. E) The N‐terminal IAP‐binding motif of the Drosophila melanogaster cell death protein Grim (UniProt Q24570) in complex with the first BIR (baculoviral IAP repeat) domain of DIAP1, a member of the inhibitor of apoptosis family (PDB 1SE0) [104]. The pro‐death proteins Reaper, Hid, and Grim (RHG) induce apoptosis by antagonizing DIAP1 function, relieving the DIAP1‐mediated inhibition of the effector caspase DrICE. F) A peptide from the nuclear pore Nup159 (UniProt P40477), in complex with the core β‐sandwich of the nucleoporin Dyn2, forming a homodimer (PDB 4DS1) [105].
Figure 5
TRFH‐binding motif (TBM). The TBM of human SLX4 (UniProt Q8IY92) in complex with TRF2 (PDB 4M7C [106]), compared to the TBM of Apollo (UniProt Q9H816) and of TIN2 (UniProt Q9BSI4) in complex with TRF2 and TRF1, respectively (PDB 3BUA and 3BU8 [107]). The telomere restriction fragment homology (TRFH) domains of shelterin proteins TRF1 and TRF2 are the principal mediators that recruit several non‐shelterin accessory proteins to telomeres. Among these are the SLX4 and Apollo nucleases, which share a short peptide with a common signature sequence YxLxP (red and orange), folding as an α‐helix (sequence identities/similarities are shaded). The TRFH TIN2‐interaction site (blue) is adjacent to, but distinct from, the SLX4‐Apollo binding site, with TIN2 binding in an extended conformation. Of note, the first part of the TIN2 peptide superimposes perfectly with the end of the SLX4‐Apollo peptides (see the corresponding sequence identities/similarities), suggesting that the segment C‐terminal to the interacting peptide of SLX4 and/or Apollo might bind in an extended conformation in this adjacent site. This hypothesis is further supported by the fact that HCs with strand affinities are found downstream of the interacting peptide in the SLX4 and Apollo foldable segments delineated by SEG‐HCA (red and grey boxes, respectively). The TIN2 peptide (shaded blue) was not detected as a putative foldable segment.
Figure 6
Large, disordered foldable segments, with a low density in HCs. HCA plot of nucleoprotein of human SARS coronavirus (UniProt P59595) and crystal structure of the N‐terminal domain (NTD, PDB 2OFZ). SGRD, serine–glycine–arginine rich domain; SRD, serine rich domain.
Figure 7
Large, conditionally disordered foldable domains, with standard density in HCs. HCA plots of ET domains from the YEATS (top, human AF9) and BRDT (bottom, human BRD4) families, and their small interacting peptides in different protein partners (at right: human AF4 and human NSD3, as well as at bottom: a second peptide in human NSD3, human BRG1, MoMLV Pr180, and human JMJD6). Foldable regions, as predicted by SEG‐HCA, are boxed, and the limits of observed 3D structures are shaded in green (ET domain) and in orange/red (small interacting peptides). These sequences are placed within the context of the whole protein architectures, for which PROSITE domain annotations and MobiDB‐lite disorder annotations are also reported. Ribbon representations of the 3D structures are displayed, together with solvent accessible surface representations of the ET domain, illustrating the hydrophobic patch (blue) recognized by the interacting peptides. UniProt: Hs AF9: P42568, Hs BRD4: O60885, Hs NSD3: Q9BZ95, MoMLV (Moloney Murine Leukemia Virus) Pr180 (gag‐Pro‐Pol polyprotein): Q8UN00, Hs AF4: P51825, Hs BRG1: P51532, Hs JMJD6: Q6NYC1.
