Genome Biol Evol. 2017 May 1;9(5):1248-1265.

doi: 10.1093/gbe/evx073.

Evidence for a Strong Correlation Between Transcription Factor Protein Disorder and Organismic Complexity

Inmaculada Yruela¹,², Christopher J Oldfield³, Karl J Niklas⁴, A Keith Dunker³

Affiliations

¹ Estación Experimental de Aula Dei, Consejo Superior de Investigaciones Científicas (EEAD-CSIC), Zaragoza, Spain.
² Grupo de Bioquímica, Biofísica y Biología Computacional (BIFI, UNIZAR), Unidad Asociada al CSIC, Zaragoza, Spain.
³ Department of Biochemistry and Molecular Biology, Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, IN.
⁴ School of Integrative Plant Science, Cornell University, Ithaca, NY.

PMID: 28430951
PMCID: PMC5434936
DOI: 10.1093/gbe/evx073

Inmaculada Yruela et al. Genome Biol Evol. 2017.

Abstract

Studies of diverse phylogenetic lineages reveal that protein disorder increases in concert with organismic complexity but that differences nevertheless exist among lineages. To gain insight into this phenomenology, we analyzed all of the transcription factor (TF) families for which sequences are known for 17 species spanning bacteria, yeast, algae, land plants, and animals and for which the number of different cell types has been reported in the primary literature. Although the fraction of disordered residues in TF sequences is often moderately or poorly correlated with organismic complexity as gauged by cell-type number (r² < 0.5), an unbiased and phylogenetically broad analysis shows that organismic complexity is positively and strongly correlated with the total number of TFs, the number of their splice variants, and their total disordered-residue content (r² > 0.8). Furthermore, the correlation between the fraction of disordered residues and cell-type number becomes stronger when confined to the TF families participating in cell cycle, cell size, cell division, cell differentiation, cell proliferation, and other important developmental processes. The data also indicate that evolutionarily simpler organisms allow for the detection of subtle differences in the conserved intrinsically disordered regions (IDRs) of TFs as well as changes in variable IDRs, which can influence the DNA recognition and multifunctionality of TFs through direct or indirect mechanisms. Although strong correlations cannot be taken as evidence for cause-and-effect relationships, we interpret our data to indicate that increasing TF disorder likely was an important factor contributing to the evolution of organismic complexity and not merely a concurrent unrelated effect of increasing organismic complexity.
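The two quantities compared in the abstract, the fraction of disordered residues and the r² of a log-log relationship, can be illustrated with a minimal Python sketch. It is not the authors' pipeline. The function names, the 0.5 score cutoff (a common convention for per-residue disorder predictors), and the toy numbers are all assumptions made for illustration; none of the values come from the paper.

```python
import math


def disordered_fraction(scores, threshold=0.5):
    """Fraction of residues whose disorder score is >= threshold.

    `scores` is a list of per-residue disorder scores in [0, 1].
    The 0.5 cutoff is an assumed convention, not taken from the paper.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    return sum(s >= threshold for s in scores) / len(scores)


def r_squared(xs, ys):
    """Squared Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r^2 is undefined for a constant sequence")
    return (sxy * sxy) / (sxx * syy)


# Hypothetical toy values for five imaginary species (NOT data from the paper).
toy_total_disordered_residues = [5e3, 2e4, 1.5e5, 6e5, 1.2e6]
toy_cell_type_numbers = [1, 3, 20, 100, 200]

# Both axes are log10-transformed before correlating, as in a log-log plot.
log_x = [math.log10(v) for v in toy_total_disordered_residues]
log_y = [math.log10(v) for v in toy_cell_type_numbers]

print("toy r^2:", round(r_squared(log_x, log_y), 3))
print("toy disordered fraction:", disordered_fraction([0.1, 0.6, 0.9, 0.4]))
```

Squaring the Pearson coefficient gives the r² of a simple linear fit, so the same helper applies whether the x axis holds total disordered residues or the disordered fraction.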

Keywords: cell-type number; complexity; evolution; intrinsically disordered protein (IDP); transcription factors.

Figures

**Fig. 1.** Distribution of TF protein functions (A and B, left panel) and representative TF families associated with regulation of cell differentiation, cell proliferation, cell cycle, and cell size (A and B, right panel) in *A. thaliana* (A) and *H. sapiens* (B). The category “others” refers to poorly represented TF families, which have only one protein or occur only in some taxa.

**Fig. 2.** Log₁₀ scatter plots of total disordered residues in TF proteins (x axis) versus the number of different cell types (y axis) in plant (A) and non-plant (B) species. Disordered residue predictions were made by PONDR VSL2b.

**Fig. 3.** Scatter plot of the fraction of disordered residues/total residues in TF proteins (x axis) versus the Log₁₀ of the number of different cell types (y axis) in plant and non-plant species. Disordered residue predictions were made by PONDR VSL2b.

**Fig. 4.** Scatter plot of the fraction of disordered residues/total residues in TF families versus the Log₁₀ of the number of different cell types in MYB (n = 655 and 98) (A, B), bHLH (n = 862 and 518) (C, D), and bZIP (n = 570 and 249) (E, F) families from plant (A, C, E) and non-plant (B, D, F) species, respectively. Species codes are: Zm, *Z. mays*; Os, *O. sativa*; At, *A. thaliana*; Pp, *P. patens*; Sm, *S. moellendorffii*; Ce, *C. reinhardtii*; Chl, *Chlorella* sp. NC64A; Mp, *M. pusilla*; Ot, *O. tauri*; Hm, *H. sapiens*; Mm, *M. musculus*; Xt, *X. tropicalis*; Dr, *D. rerio*; Dm, *D. melanogaster*; Ce, *C. elegans*; Sc, *S. cerevisiae*; Ec, *E. coli*. The number of proteins in each species is given in parentheses. Disordered residue predictions were made by PONDR VSL2b.

**Fig. 5.** Comparison of orthologues from the bHLH and MYB TF families. Alignments of orthologous proteins encoded by (A) the *NHLH1* gene from *H. sapiens* (Q02575), *M. musculus* (Q02576), *D. rerio* (Q6P0A8), *D. melanogaster* (O77278), and *C. elegans* (Q18590) and (B) MYB13 orthologues from *Z. mays* (K7TYD9), *O. sativa* (Q6K1S6), *Glycine max* (Q0PJK2), *A. thaliana* (Q9LNC9), and *S. moellendorffii* (D8RNQ0). The number of cell types is given in parentheses on the left side of the sequence. The protein length is given on the right side of the sequence. Sequences are represented by color-coded bars showing predicted disorder: disordered residues (red), ordered residues (grey), and alignment gaps (white). (C) Maximum likelihood trees of the *NHLH1* (left panel) and MYB13 (right panel) sequences used in the alignments. The species name and the corresponding number of cell types are given in the tree.

**Fig. 6.** Comparison of orthologues from the bZIP TF family. (A) Alignment of HY5 orthologues in *Z. mays* (B6UEP1), *O. sativa* (Q0E2Y8), *G. max* (I1KXV2), *A. thaliana* (O24646), *S. moellendorffii* (D8RQ04), *C. reinhardtii* (A8IM85), and *O. lucimarinus* (A4RRH6). (B) Alignment of ZIP63 orthologues in *Z. mays* (B4FJ00), *S. bicolor* (C5WX70), *O. sativa* (Q7X9A8), *A. thaliana* (B9DGI8), and *P. patens* (A9TD07). The number of cell types is given in parentheses on the left side of the sequence. The protein length is given on the right side of the sequence. Sequences are represented by color-coded bars showing predicted disorder: disordered residues (red), ordered residues (grey), and alignment gaps (white). The typical bZIP domain (black box), C3HC4 zinc-finger type domain (black star), and COP1 interacting domain (black triangle) are shown. (C) Maximum likelihood trees of the HY5 (left panel) and ZIP63 (right panel) sequences used in the alignments. The species name and the corresponding number of cell types are given in the tree.

**Fig. 7.** Comparison of orthologues from zinc finger TF families. (A) Alignment of FZF orthologues in *Z. mays* (B6TYD5), *O. sativa* (Q5Z8K9), *A. thaliana* (Q9ZQ18), *P. patens* (A9RG33), *S. moellendorffii* (D8RTH9), *V. carteri* (D8UB30), and *C. reinhardtii* (A8JD88). (B) Alignment of GATA9 orthologues in *Z. mays* (K7VQ40), *M. truncatula* (G7LFY2), *O. sativa* (Q6F2Z7), *A. thaliana* (O82632), and *O. lucimarinus* (A4RXG3). The number of cell types is given in parentheses on the left side of the sequence. The protein length is given on the right side of the sequence. Sequences are represented by color-coded bars showing predicted disorder: disordered residues (red), ordered residues (grey), and alignment gaps (white). Typical C2H2 and zinc-finger GATA-type domains (black box) are shown. (C) Maximum likelihood trees of the ZFZ (left panel) and GATA9 (right panel) sequences used in the alignments. The species name and the corresponding number of cell types are given in the tree.

**Fig. 8.** Comparison of orthologues from the E2F/DP TF family. (A) Scatter plot of the fraction of disordered residues/total residues (x axis) versus the Log₁₀ of the number of different cell types (y axis) in plants: *Z. mays* (B4FB61), *O. sativa* (Q5QL93), *A. thaliana* (Q9FV71), *P. patens* (A9RQX0), *S. moellendorffii* (D8TCC4), *V. carteri* (Vocar.0001s0396.1), *M. pusilla* (C1MLR6), and *O. tauri* (A4RSR6), and in animals: *M. musculus* (P56931), *X. tropicalis* (F6VW96), *D. rerio* (A5WUE8), *D. melanogaster* (O77051), and *C. elegans* (G5EF11). (B) Alignment of E2F1 orthologues. The number of cell types is given in parentheses on the left side of the sequence. The protein length is given on the right side of the sequence. Sequences are represented by color-coded bars showing predicted disorder: disordered residues (red), ordered residues (grey), and alignment gaps (white). Typical DNA-binding and coiled-coil domains (black box), the cyclin A/CDK2 binding site (black star), and the retinoblastoma protein binding site (black triangle) are shown. (C) Maximum likelihood tree of the E2F1 sequences used in the alignments. The species name and the corresponding number of cell types are given in the tree.

**Fig. 9.** Bar plot of the fraction of identical IDRs (dark red), similar IDRs (pink), and variable IDRs (blue) in chlorophyte, bryophyte, and angiosperm plants, and invertebrate and vertebrate animals. The data represent the average of nine groups of orthologues in plants and three groups of orthologues in animals.

**Fig. 10.** Structure of human E2F1 and DP1 proteins in the monomer and heterodimer states. The X-ray crystallographic structure of a partial human E2F1-DP1 heterodimer (PDB 2AZE; Rubin et al. 2005) is shown as a cartoon. The E2F1 (residues 198–301, pink) and DP1 (residues 196–350, blue) proteins are shown. Residues predicted to be disordered in the monomers by PONDR VSL2b are shown as red lines. The three-dimensional cartoons were drawn using PyMOL 1.4.1 (Schrödinger LLC).
