Genome Biol Evol. 2017 May 1;9(5):1248-1265.
doi: 10.1093/gbe/evx073.

Evidence for a Strong Correlation Between Transcription Factor Protein Disorder and Organismic Complexity

Inmaculada Yruela et al. Genome Biol Evol.

Abstract

Studies of diverse phylogenetic lineages reveal that protein disorder increases in concert with organismic complexity but that differences nevertheless exist among lineages. To gain insight into this phenomenology, we analyzed all of the transcription factor (TF) families for which sequences are known for 17 species spanning bacteria, yeast, algae, land plants, and animals and for which the number of different cell types has been reported in the primary literature. Although the fraction of disordered residues in TF sequences is often moderately or poorly correlated with organismic complexity as gauged by cell-type number (r² < 0.5), an unbiased and phylogenetically broad analysis shows that organismic complexity is positively and strongly correlated with the total number of TFs, the number of their spliced variants and their total disordered residues content (r² > 0.8). Furthermore, the correlation between the fraction of disordered residues and cell-type number becomes stronger when confined to the TF families participating in cell cycle, cell size, cell division, cell differentiation, or cell proliferation, and other important developmental processes. The data also indicate that evolutionarily simpler organisms allow for the detection of subtle differences in the conserved IDRs of TFs as well as changes in variable IDRs, which can influence the DNA recognition and multifunctionality of TFs through direct or indirect mechanisms. Although strong correlations cannot be taken as evidence for cause-and-effect relationships, we interpret our data to indicate that increasing TF disorder likely was an important factor contributing to the evolution of organismic complexity and not merely a concurrent unrelated effect of increasing organismic complexity.
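
The r² values quoted in the abstract summarize how well cell-type number tracks a TF-derived quantity. As a rough illustration only, the sketch below computes r² for a straight-line fit on log10-transformed values, which is one plausible reading of the log10 scatter plots in the figures. The function names and the input values are hypothetical; the numbers are synthetic placeholders, not data from the paper, and the authors' actual statistical procedure is not described on this page.

```python
import math


def r_squared(xs, ys):
    """Squared Pearson correlation, i.e. r^2 of a simple least-squares line."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("r^2 is undefined when either variable is constant")
    return (sxy * sxy) / (sxx * syy)


def log10_r_squared(totals, cell_types):
    """r^2 after log10-transforming both variables (assumed, not confirmed)."""
    log_totals = [math.log10(t) for t in totals]
    log_cells = [math.log10(c) for c in cell_types]
    return r_squared(log_totals, log_cells)


# Synthetic placeholder values, NOT measurements from the study.
toy_disordered_totals = [1e3, 1e4, 1e5, 1e6]
toy_cell_types = [1, 10, 100, 1000]
toy_r2 = log10_r_squared(toy_disordered_totals, toy_cell_types)
```

Because the toy values lie on a straight line in log-log space, `toy_r2` comes out at essentially 1; real per-species data would scatter around the line and give a lower value.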

Keywords: cell-type number; complexity; evolution; intrinsically disordered protein (IDP); transcription factors.

Figures

Fig. 1. Distribution of TF protein functions (A and B, left panel) and representative TF families associated with regulation of cell differentiation, cell proliferation, cell cycle, and cell size (A and B, right panel) in A. thaliana (A) and H. sapiens (B). The category “others” refers to poorly represented TF families, which have only one protein or occur only in some taxa.
Fig. 2. Log10 scatter plots of total disordered residues in TF proteins (x axis) versus the number of different cell types (y axis) in plant (A) and non-plant (B) species. Disordered residue predictions were made with PONDR VSL2b.
Fig. 3. Scatter plot of the fraction of disordered residues/total residues in TF proteins (x axis) versus the Log10 of the number of different cell types (y axis) in plant and non-plant species. Disordered residue predictions were made with PONDR VSL2b.
Fig. 4. Scatter plots of the fraction of disordered residues/total residues in TF families versus the Log10 of the number of different cell types in the MYB (n = 655 and 98) (A, B), bHLH (n = 862 and 518) (C, D), and bZIP (n = 570 and 249) (E, F) families from plant (A, C, E) and non-plant (B, D, F) species, respectively. Species codes are: Zm, Z. mays; Os, O. sativa; At, A. thaliana; Pp, P. patens; Sm, S. moellendorffii; Ce, C. reinhardtii; Chl, Chlorella sp. NC64A; Mp, M. pusilla; Ot, O. tauri; Hm, H. sapiens; Mm, M. musculus; Xt, X. tropicalis; Dr, D. rerio; Dm, D. melanogaster; Ce, C. elegans; Sc, S. cerevisiae; Ec, E. coli. The number of proteins in each species is given in parentheses. Disordered residue predictions were made with PONDR VSL2b.
Fig. 5. Comparison of orthologues from the bHLH and MYB TF families. Alignments of orthologous proteins encoded by (A) the NHLH1 gene from H. sapiens (Q02575), M. musculus (Q02576), D. rerio (Q6P0A8), D. melanogaster (O77278) and C. elegans (Q18590) and (B) MYB13 orthologues from Z. mays (K7TYD9), O. sativa (Q6K1S6), Glycine max (Q0PJK2), A. thaliana (Q9LNC9) and S. moellendorffii (D8RNQ0). The number of cell types is given in parentheses on the left side of the sequence. The protein length is given on the right side of the sequence. Sequences are represented by color-coded bars showing predicted disorder: disordered residues (red), ordered residues (grey), and alignment gaps (white). (C) Maximum likelihood trees of the NHLH1 (left panel) and MYB13 (right panel) sequences used in the alignments. The species name and the corresponding number of cell types are given in the tree.
Fig. 6. Comparison of orthologues from the bZIP TF family. (A) Alignment of HY5 orthologues in Z. mays (B6UEP1), O. sativa (Q0E2Y8), G. max (I1KXV2), A. thaliana (O24646), S. moellendorffii (D8RQ04), C. reinhardtii (A8IM85), and O. lucimarinus (A4RRH6). (B) Alignment of ZIP63 orthologues in Z. mays (B4FJ00), S. bicolor (C5WX70), O. sativa (Q7X9A8), A. thaliana (B9DGI8) and P. patens (A9TD07). The number of cell types is given in parentheses on the left side of the sequence. The protein length is given on the right side of the sequence. Sequences are represented by color-coded bars showing predicted disorder: disordered residues (red), ordered residues (grey), and alignment gaps (white). The typical bZIP domain (black box), C3HC4 zinc-finger type domain (black star), and COP1 interacting domain (black triangle) are shown. (C) Maximum likelihood trees of the HY5 (left panel) and ZIP63 (right panel) sequences used in the alignments. The species name and the corresponding number of cell types are given in the tree.
Fig. 7. Comparison of orthologues from zinc finger TF families. (A) Alignment of FZF orthologues in Z. mays (B6TYD5), O. sativa (Q5Z8K9), A. thaliana (Q9ZQ18), P. patens (A9RG33), S. moellendorffii (D8RTH9), V. carteri (D8UB30) and C. reinhardtii (A8JD88). (B) Alignment of GATA9 orthologues in Z. mays (K7VQ40), M. truncatula (G7LFY2), O. sativa (Q6F2Z7), A. thaliana (O82632) and O. lucimarinus (A4RXG3). The number of cell types is given in parentheses on the left side of the sequence. The protein length is given on the right side of the sequence. Sequences are represented by color-coded bars showing predicted disorder: disordered residues (red), ordered residues (grey), and alignment gaps (white). Typical C2H2 and zinc-finger GATA-type domains (black box) are shown. (C) Maximum likelihood trees of the ZFZ (left panel) and GATA9 (right panel) sequences used in the alignments. The species name and the corresponding number of cell types are given in the tree.
Fig. 8. Comparison of orthologues from the E2F/DP TF family. (A) Scatter plot of the fraction of disordered residues/total residues (x axis) versus the Log10 of the number of different cell types (y axis) in plants: Z. mays (B4FB61), O. sativa (Q5QL93), A. thaliana (Q9FV71), P. patens (A9RQX0), S. moellendorffii (D8TCC4), V. carteri (Vocar.0001s0396.1), M. pusilla (C1MLR6) and O. tauri (A4RSR6) and animals: M. musculus (P56931), X. tropicalis (F6VW96), D. rerio (A5WUE8), D. melanogaster (O77051) and C. elegans (G5EF11). (B) Alignment of E2F1 orthologues. The number of cell types is given in parentheses on the left side of the sequence. The protein length is given on the right side of the sequence. Sequences are represented by color-coded bars showing predicted disorder: disordered residues (red), ordered residues (grey), and alignment gaps (white). Typical DNA-binding and coiled coil domains (black box), cyclin A/CDK2 binding (black star) and retinoblastoma protein binding (black triangle) are shown. (C) The maximum likelihood tree of the E2F1 sequences used in the alignments. The species name and the corresponding number of cell types are given in the tree.
Fig. 9. Bar plot of the fraction of identical IDRs (dark red), similar IDRs (pink) and variable IDRs (blue) in chlorophyte, bryophyte and angiosperm plants, and invertebrate and vertebrate animals. The data represent the average of nine groups of orthologues in plants and three groups of orthologues in animals.
Fig. 10. Structure of human E2F1 and DP1 proteins in the monomer and heterodimer states. The X-ray crystallographic structure of the partial human E2F1-DP1 heterodimer (PDB 2AZE; Rubin et al. 2005) is shown as a cartoon. The E2F1 (residues 198–301, in pink) and DP1 (residues 196–350, in blue) proteins are shown. Disordered residues predicted in the monomers by PONDR VSL2b are shown as a red line. The three-dimensional cartoons were drawn using PyMol 1.4.1 (Schrödinger LLC).
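
Several captions above plot either the total number of disordered residues or the fraction of disordered residues over total residues in a set of TF proteins. A minimal sketch of how both quantities could be derived from per-residue disorder scores follows. It assumes that a predictor returns one score per residue and that a residue counts as disordered at a score of 0.5 or higher, a common convention for disorder predictors that is not stated on this page. The function name, the dictionary layout, and the scores are hypothetical.

```python
def disorder_summary(scores_by_protein, threshold=0.5):
    """Return (total disordered residues, disordered fraction) for a protein set.

    scores_by_protein maps a protein identifier to a list of per-residue
    disorder scores. The 0.5 threshold is an assumed convention.
    """
    total_residues = 0
    disordered_residues = 0
    for scores in scores_by_protein.values():
        total_residues += len(scores)
        disordered_residues += sum(1 for s in scores if s >= threshold)
    if total_residues == 0:
        raise ValueError("no residues supplied")
    return disordered_residues, disordered_residues / total_residues


# Made-up scores for two hypothetical proteins, not predictor output.
toy_scores = {
    "tf_a": [0.9, 0.8, 0.2, 0.1],
    "tf_b": [0.6, 0.4],
}
toy_total, toy_fraction = disorder_summary(toy_scores)
```

Pooling residues across all proteins before dividing weights long proteins more heavily than averaging per-protein fractions would; which of the two the authors used is not specified here.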
