Review

Brief Bioinform. 2020 Mar 23;21(2):458-472.

doi: 10.1093/bib/bbz007.

Disentangling the complexity of low complexity proteins

Pablo Mier¹, Lisanna Paladin², Stella Tamana³, Sophia Petrosian⁴, Borbála Hajdu-Soltész⁵, Annika Urbanek⁶, Aleksandra Gruca⁷, Dariusz Plewczynski⁸ ⁹, Marcin Grynberg¹⁰, Pau Bernadó⁶, Zoltán Gáspári¹¹, Christos A Ouzounis⁴, Vasilis J Promponas³, Andrey V Kajava¹² ¹³, John M Hancock¹⁴ ¹⁵, Silvio C E Tosatto² ¹⁶, Zsuzsanna Dosztanyi⁵, Miguel A Andrade-Navarro¹

Affiliations

¹ Institute of Organismic and Molecular Evolution, Johannes Gutenberg University of Mainz, Mainz, Germany.
² Department of Biomedical Science, University of Padova, Padova, Italy.
³ Bioinformatics Research Laboratory, Department of Biological Sciences, University of Cyprus, Nicosia, Cyprus.
⁴ Biological Computation and Process Laboratory, Chemical Process & Energy Resources Institute, Centre for Research & Technology Hellas, Thessalonica, Greece.
⁵ MTA-ELTE Lendület Bioinformatics Research Group, Department of Biochemistry, Eötvös Loránd University, Budapest, Hungary.
⁶ Centre de Biochimie Structurale, INSERM, CNRS, Université de Montpellier, Montpellier, France.
⁷ Institute of Informatics, Silesian University of Technology, Gliwice, Poland.
⁸ Center of New Technologies, University of Warsaw, Warsaw, Poland.
⁹ Faculty of Mathematics and Information Science, Warsaw University of Technology, Warsaw, Poland.
¹⁰ Institute of Biochemistry and Biophysics, Warsaw, Poland.
¹¹ Faculty of Information Technology and Bionics, Pázmány Péter Catholic University, Budapest, Hungary.
¹² Centre de Recherche en Biologie Cellulaire de Montpellier, CNRS-UMR, Institut de Biologie Computationnelle, Université de Montpellier, Montpellier, France.
¹³ Institute of Bioengineering, University ITMO, St. Petersburg, Russia.
¹⁴ Earlham Institute, Norwich, UK.
¹⁵ ELIXIR Hub, Wellcome Genome Campus, Hinxton, UK.
¹⁶ CNR Institute of Neuroscience, Padova, Italy.

PMID: 30698641
PMCID: PMC7299295
DOI: 10.1093/bib/bbz007

Pablo Mier et al. Brief Bioinform. 2020.

Abstract

There are multiple definitions for low complexity regions (LCRs) in protein sequences, with all of them broadly considering LCRs as regions with fewer amino acid types compared to an average composition. Following this view, LCRs can also be defined as regions showing composition bias. In this critical review, we focus on the definition of sequence complexity of LCRs and their connection with structure. We present statistics and methodological approaches that measure low complexity (LC) and related sequence properties. Composition bias is often associated with LC and disorder, but repeats, while compositionally biased, might also induce ordered structures. We illustrate this dichotomy, and more generally the overlaps between different properties related to LCRs, using examples. We argue that statistical measures alone cannot capture all structural aspects of LCRs and recommend the combined usage of a variety of predictive tools and measurements. While the methodologies available to study LCRs are already very advanced, we foresee that a more comprehensive annotation of sequences in the databases will enable the improvement of predictions and a better understanding of the evolution and the connection between structure and function of LCRs. This will require the use of standards for the generation and exchange of data describing all aspects of LCRs.

Short abstract: There are multiple definitions for low complexity regions (LCRs) in protein sequences. In this critical review, we focus on the definition of sequence complexity of LCRs and their connection with structure. We present statistics and methodological approaches that measure low complexity (LC) and related sequence properties. Composition bias is often associated with LC and disorder, but repeats, while compositionally biased, might also induce ordered structures. We illustrate this dichotomy, plus overlaps between different properties related to LCRs, using examples.

Keywords: composition bias; disorder; low complexity regions; structure.
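
As a rough illustration of the kind of compositional statistics the abstract refers to, the sketch below computes two simple per-fragment measures: the Shannon entropy of the amino acid composition (the quantity plotted in Figure 2) and the fraction of positions taken by the most frequent residue (one axis of Figure 5). The function names and the example sequences are hypothetical, and this is not the implementation used by SEG, CAST, SIMPLE or the authors; it only assumes the textbook definition of entropy over residue frequencies.

```python
from collections import Counter
from math import log2


def shannon_entropy(seq: str) -> float:
    """Shannon entropy, in bits per residue, of the residue composition of seq.

    Hypothetical helper: low values indicate a biased composition.
    """
    if not seq:
        raise ValueError("sequence must be non-empty")
    counts = Counter(seq.upper())
    n = len(seq)
    # H = -sum(p * log2(p)) over the residue types present in the fragment.
    return sum(-(c / n) * log2(c / n) for c in counts.values())


def top_residue_fraction(seq: str) -> float:
    """Fraction of positions occupied by the most frequent residue type."""
    if not seq:
        raise ValueError("sequence must be non-empty")
    counts = Counter(seq.upper())
    return counts.most_common(1)[0][1] / len(seq)


if __name__ == "__main__":
    # Illustrative fragments only: a homopolymeric run and a mixed fragment.
    for fragment in ("QQQQQQQQQQ", "MKTAYIAKQR"):
        print(fragment, shannon_entropy(fragment), top_residue_fraction(fragment))
```

A homopolymeric run gives the minimum entropy of zero, while a fragment using many residue types equally approaches the maximum of log2(20) bits. Neither number says anything about periodicity or structure, which is consistent with the abstract's point that statistical measures alone do not capture all aspects of LCRs.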

Figures

**Figure 1**
The LC diagram: sequence complexity composition versus periodicity. The diagram illustrates where several types of sequences would be placed in relation to two measures related to sequence complexity.

**Figure 2**
Shannon entropy value for each detected CBR against the CAST score normalized by the sequence length.

**Figure 3**
Motif graph based on SIMPLE analysis of CO1A1_HUMAN.

**Figure 4**
Comparison of positions detected to be of LC in the 21 proteins of our dataset. Methods SEG (in orange), CAST (in red), SIMPLE (in brown) and IUPred (in purple) were used. ANCHOR (in light blue), which includes structural aspects, is also compared.

**Figure 5**
LC diagram for various sequence datasets. The percentage of the top amino acid as a function of the percentage of mutations to perfect repeats calculated for a dataset of globular (GLOB), disordered (IUP) sequences as well as fragments of our protein dataset with LC character according to the SEG, CAST and SIMPLE methods.

**Figure 6**
Structural features of LC proteins. Venn diagram representing the FELLS prediction of dataset proteins, in four categories: secondary structure (SS), LCRs, disorder and aggregation. Each protein is assigned to a category if more than 30% of the residues in its sequence are predicted in that state.
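
The assignment rule in the Figure 6 caption (a protein belongs to a category if more than 30% of its residues are predicted in that state) can be sketched as follows. The input layout, a dictionary of per-residue boolean flags for each category, is an assumption made for illustration and is not the FELLS output format; the function name and category labels are likewise hypothetical.

```python
from typing import Dict, Sequence, Set


def assign_categories(
    states: Dict[str, Sequence[bool]], threshold: float = 0.30
) -> Set[str]:
    """Return the categories whose predicted residue fraction exceeds threshold.

    states maps a category name (e.g. "SS", "LCR", "disorder", "aggregation")
    to one boolean per residue: True if the residue is predicted in that state.
    """
    assigned = set()
    for category, flags in states.items():
        if not flags:
            continue  # no residues, nothing to assign
        fraction = sum(flags) / len(flags)
        # "More than 30%" is read as a strict inequality.
        if fraction > threshold:
            assigned.add(category)
    return assigned


if __name__ == "__main__":
    # Made-up predictions for a hypothetical 10-residue protein.
    example = {
        "SS": [True] * 2 + [False] * 8,
        "LCR": [True] * 5 + [False] * 5,
        "disorder": [True] * 4 + [False] * 6,
        "aggregation": [False] * 10,
    }
    print(sorted(assign_categories(example)))
```

Because a protein can exceed the threshold in several states at once, the function returns a set rather than a single label, which is what allows the overlaps shown in a Venn diagram.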
