Development of an accurate classification system of proteins into structured and unstructured regions that uncovers novel structural domains: its application to human transcription factors
- PMID: 19402914
- PMCID: PMC2687452
- DOI: 10.1186/1472-6807-9-26
Abstract
Background: In addition to structural domains, most eukaryotic proteins possess intrinsically disordered (ID) regions. Although ID regions often play important functional roles, they are difficult to identify accurately. Because human transcription factors (TFs) are a typical group of proteins with long ID regions, we used them as a model for proteins in general and attempted to classify them accurately into structural domains and ID regions. Our previous investigation detected an extremely high fraction of ID regions in human TFs, alongside DNA-binding and other domains, but left 20% of the residues unassigned. In this report, we exploit the generally higher sequence divergence of ID regions relative to structural regions to divide proteins completely into structural domains and ID regions.
Results: The new dichotomic system first identifies domains of known structure, then assigns the remaining residues to structural domains or ID regions using a combination of pre-existing tools and a newly developed program based on sequence divergence that takes unaligned regions into consideration. The system was found to be highly accurate: when applied to a set of proteins with experimentally verified ID regions, it had an error rate as low as 2%. Application of the system to human TFs (401 proteins) showed that 38% of the residues were in structural domains, while 62% were in ID regions. This preponderance of ID regions contrasts sharply with the TFs of Escherichia coli (229 proteins), in which only 5% of the residues fell in ID regions. The method also revealed that 4.0% and 11.8% of the total length of human and E. coli TFs, respectively, consist of structural domains whose structures have not been determined.
Conclusion: The present system verifies that sequence divergence, including information from unaligned regions, is a good indicator of ID regions. For the first time, the system estimates the complete partition of human TFs into structured and unstructured regions, and it also reveals structural domains without homology to known structures. These predicted novel structural domains are good targets for structural genomics. When applied to other proteins, the system is expected to uncover more novel structural domains.
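The idea of using sequence divergence, with unaligned regions counted as divergent, can be illustrated with a minimal Python sketch. This is not the authors' program. The scoring rule (fraction of homologs that differ from the query at each position, with gaps counted as differences), the smoothing window, the 0.5 threshold, and all function names are assumptions chosen only for illustration.

```python
def divergence_profile(alignment, gap="-"):
    """Per-residue divergence of the query (first row) against its homologs.

    Assumption: a homolog that has a gap (is unaligned) at a position counts
    as a mismatch, so unaligned regions score as divergent.
    """
    query, homologs = alignment[0], alignment[1:]
    if not homologs:
        raise ValueError("at least one homolog is required")
    profile = []
    for i, q in enumerate(query):
        if q == gap:
            continue  # skip columns where the query itself has no residue
        mismatches = sum(1 for h in homologs if h[i] != q)
        profile.append(mismatches / len(homologs))
    return profile


def smooth(profile, window=3):
    """Moving average over a centered window (truncated at the ends)."""
    half = window // 2
    smoothed = []
    for i in range(len(profile)):
        segment = profile[max(0, i - half): i + half + 1]
        smoothed.append(sum(segment) / len(segment))
    return smoothed


def classify(alignment, threshold=0.5, window=3):
    """Label each query residue 'S' (structured) or 'D' (disordered)."""
    scores = smooth(divergence_profile(alignment), window)
    return "".join("D" if s > threshold else "S" for s in scores)


# Toy alignment (hypothetical sequences): a conserved N-terminal part
# followed by a divergent or unaligned C-terminal part.
toy_alignment = [
    "ACDEFGHIKL",  # query
    "ACDEFG----",  # homolog with an unaligned tail
    "ACDEFGQQQQ",  # homolog with a substituted tail
]
labels = classify(toy_alignment)
print(labels)  # SSSSSSDDDD
```

In this toy case the conserved first six residues are labeled structured and the last four, which are either unaligned or substituted in every homolog, are labeled disordered. A real system would, as the abstract describes, first assign domains of known structure and combine the divergence signal with other predictors.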