Biomolecules. 2022 Aug 10;12(8):1098.
doi: 10.3390/biom12081098.

Low Complexity Induces Structure in Protein Regions Predicted as Intrinsically Disordered

Mariane Gonçalves-Kulik et al. Biomolecules.

Abstract

There is increasing evidence that many intrinsically disordered regions (IDRs) in proteins play key functional roles through interactions with other proteins or nucleic acids. These interactions often exhibit context-dependent structural behavior. We hypothesize that low complexity regions (LCRs), which are often found within IDRs, could help induce local structure in IDRs. To test this, we predicted IDRs in the human proteome and analyzed their structures, or those of homologous sequences, in the Protein Data Bank (PDB). We then identified two types of simple LCRs within IDRs: regions with only one type of amino acid (polyX or homorepeats) and regions with only two types of amino acids (polyXY). We were able to assign structural information from the PDB more often to these LCRs than to the surrounding IDRs (polyX 61.8% > polyXY 50.5% > IDRs 39.7%). The most frequently observed polyXs and polyXYs within IDRs contained E (Glu) or G (Gly). Structural analyses of these sequences and of their homologs indicate that polyEK regions induce helical conformations, while the other most frequent LCRs induce coil structures. Our work proposes bioinformatics methods to help study the structural behavior of IDRs and provides a solid basis suggesting a structuring role of LCRs within them.
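The two LCR classes described above (stretches built from only one or only two amino acid types) can be illustrated with a small sequence scanner. This is a hypothetical sketch, not the authors' pipeline: it assumes an LCR is a maximal, impurity-free stretch made of exactly k residue types (k = 1 for polyX, k = 2 for polyXY) and at least `min_len` residues long. The function name `find_lcrs` and the length cutoffs are illustrative assumptions; the published definitions may differ, for example by tolerating impurities or requiring a minimum count of each residue.

```python
def find_lcrs(seq: str, k: int, min_len: int) -> list[tuple[int, int, str]]:
    """Report maximal stretches of `seq` made of exactly k residue types.

    k=1 approximates polyX (homorepeats); k=2 approximates polyXY.
    Returns (start, end, residues) with 0-based, end-exclusive coordinates.
    Hypothetical definition: no impurities allowed, length >= min_len.
    """
    hits = []
    counts: dict[str, int] = {}
    left = 0
    n = len(seq)
    for right, aa in enumerate(seq):
        counts[aa] = counts.get(aa, 0) + 1
        # Shrink from the left until at most k residue types remain; `left`
        # is then the smallest start for which seq[left:right + 1] is valid.
        while len(counts) > k:
            old = seq[left]
            counts[old] -= 1
            if counts[old] == 0:
                del counts[old]
            left += 1
        # The window is maximal if the next residue would add a (k+1)-th type
        # (or the sequence ends here).
        can_extend = right + 1 < n and (seq[right + 1] in counts or len(counts) < k)
        if len(counts) == k and not can_extend and right + 1 - left >= min_len:
            hits.append((left, right + 1, "".join(sorted(counts))))
    return hits


print(find_lcrs("MSEEEEEEEKAG", k=1, min_len=5))           # [(2, 9, 'E')]
print(find_lcrs("GSGSGSGSAAAEKEKEKEKEK", k=2, min_len=6))  # [(0, 8, 'GS'), (11, 21, 'EK')]
```

On these toy inputs the scanner reports a polyE, a polyGS and a polyEK. Restricting the scan to predicted IDR coordinates would mirror the abstract's focus on LCRs located within IDRs.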

Keywords: homorepeats; intrinsically disordered regions; low complexity regions; protein structure.


Conflict of interest statement

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Figures

Figure 1
Description of the process used to create the target dataset. The first number in each box refers to items from the original set from which the second set of items was extracted (e.g., IDRs from sequences or polyXYs from IDRs); the exception is the PDB box, which counts PDB records and chains. See the Materials and Methods for details. The names in parentheses indicate: seq, sequences; poly, polyXs or polyXYs; and idr, IDRs. (A) Final set of IDRs overlapping PDB sequences that contain polyXs; (B) Final set of IDRs overlapping PDB sequences that contain polyXYs; (C) Number of IDRs that contain polyXs, independently of overlaps with PDB sequences; and (D) Number of IDRs that contain polyXYs, also independently of overlaps with PDB sequences.
Figure 2
Secondary structure in PDB homologs for simple LCRs by type. For each of the six most frequent polyXs and polyXYs (see legend for type), the plot shows the fraction of residues in aligned PDB sequences adopting each structure type across a 100-residue region centered on the polyX/polyXY. (A) polyX (1–Helix, 2–Sheet and 3–Others). (B) polyXY (4–Helix, 5–Sheet and 6–Others). See the Materials and Methods for details. The numeric annotations indicate absolute counts at the 2nd and 98th percentiles to highlight the lowest and highest values for each structure type. The blue vertical lines delimit the mean region where polyXs or polyXYs are located.
Figure 3
An α-helical polyK in PA2G4. Top: PDB:6SXO shows protein PA2G4 (red; UniProt: Q9UQ80) with an IDR containing a polyK in α-helical conformation. This conformation could be influenced by a folding-upon-binding interaction with the 28S ribosomal RNA. Bottom: alignment and structural annotations. The IDR and polyK are indicated in cyan and blue, respectively. Pipe symbols (|) at the end of the alignment indicate that the chain ends at this position.
Figure 4
A polyE in SNF2L2 aligns to a coil region in yeast RSC4. Structure of the yeast protein RSC4 (PDB:2R0S). Human protein SNF2L2 (UniProt: P51531), with a polyE (blue) inside a predicted IDR (cyan), aligns to RSC4, suggesting that the polyE adopts a coil structure. Pipe symbols (|) at the end of the alignment indicate that the chain ends at this position.
Figure 5
A polyS in CO7 is part of a β-sheet. Structure of protein CO7 (UniProt: P10643; PDB:7NYD chain C). The polyS (which, under our definitions, can be extended to a polyRS region) forms part of a long strand of an antiparallel β-sheet.
Figure 6
Two partially structured IDRs in the 26S proteasome non-ATPase regulatory subunit 1. Human PSMD1 (UniProt: Q99460) aligns to its rat ortholog (UniProt: O88761). The structure of the rat sequence (PDB:6EPF chain N) includes one IDR (IDR1) with two polyEKs and one IDR (IDR2) with a polyEP. Chains I, H and Z, with which IDR1 interacts, are highlighted. The inset in the upper-right corner shows a rotated top view of the structure, focusing on these interactions.
Figure 7
Structures of immunoglobulin light chains show β structure in predicted IDRs with polyGS. (A) Structure of a human monoclonal antibody (PDB:7CR5 chain L) with a sequence almost identical to protein LV469, Immunoglobulin lambda variable 4–69 (UniProt: A0A075B6H9), a V region of the variable domain of immunoglobulin light chains. The polyGS forms one of the strands of one of the four β-sheets in this immunoglobulin structure. (B) Structure of a human antibody (PDB:6Q0E chain L) identical to human protein LV319, Immunoglobulin lambda variable 3–19 (UniProt: P01714). Pipe symbols (|) at the end of the alignment indicate that the chain ends at this position.
Figure 8
Structural propensities for simple LCRs in IDRs. The structural propensities of LCRs and their surrounding regions were computed using LS2P (see the Materials and Methods for details). The vertical line indicates the central position of the LCR. (A) The six most common polyXs. (B) The six most common polyXYs. Structure types shown are helical (red), extended (blue) and others (green).
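Figure 2 summarizes, for each position in a 100-residue window centered on an LCR, the fraction of aligned PDB residues that fall into each structure class. The sketch below shows one way such a tally could be computed; it is an illustration under stated assumptions, not the authors' documented procedure. It assumes per-residue secondary-structure codes (DSSP-style letters) have already been mapped onto query positions through an alignment, and it uses a common 8-to-3-state reduction (H, G, I as helix; E, B as sheet; everything else as other). The function names, the mapping, and the choice of aligned residues as the denominator are all assumptions.

```python
from collections import Counter
from typing import Optional, Sequence

# Assumed DSSP-style 8-state to 3-state reduction (not taken from the paper).
HELIX_CODES = set("HGI")
SHEET_CODES = set("EB")
STATES = ("helix", "sheet", "other")


def to_three_state(code: str) -> str:
    """Collapse a single secondary-structure code into helix/sheet/other."""
    if code in HELIX_CODES:
        return "helix"
    if code in SHEET_CODES:
        return "sheet"
    return "other"


def window_profile(
    hits: Sequence[tuple[int, Sequence[Optional[str]]]], half_width: int = 50
) -> dict[int, dict[str, float]]:
    """Per-offset structure fractions around LCR centers (hypothetical helper).

    Each hit is (center, ss): `center` is the LCR's central query position and
    ss[i] is the code of the PDB residue aligned to query position i, or None
    if that position is unaligned. Offsets run from -half_width to
    half_width - 1, i.e. 100 positions with the default half_width.
    """
    profile = {}
    for offset in range(-half_width, half_width):
        counts = Counter()
        for center, ss in hits:
            pos = center + offset
            # Only aligned residues that exist in this query contribute.
            if 0 <= pos < len(ss) and ss[pos] is not None:
                counts[to_three_state(ss[pos])] += 1
        total = sum(counts.values())
        # Positions with no aligned residues get zero for every state.
        profile[offset] = {s: counts[s] / total if total else 0.0 for s in STATES}
    return profile


example = [(2, ["H", "H", "H", "E", None]), (1, ["-", "E", "B"])]
print(window_profile(example, half_width=3)[0])  # {'helix': 0.5, 'sheet': 0.5, 'other': 0.0}
```

Dividing by the number of aligned residues at each offset, rather than by the number of LCRs, keeps the three fractions summing to one wherever any structural data exist. This matches the "fraction of residues in aligned PDB sequences" wording in the Figure 2 caption, although the exact normalization used in the paper is not stated here.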
