Automated search of natively folded protein fragments for high-throughput structure determination in structural genomics
- PMID: 11206052
- PMCID: PMC2144534
- DOI: 10.1110/ps.9.12.2313
Abstract
Structural genomics projects envision almost routine protein structure determinations, which are currently imaginable only for small proteins with molecular weights below 25,000 Da. For larger proteins, structural insight can be obtained by breaking them into small segments of amino acid sequences that can fold into native structures, even when isolated from the rest of the protein. Such segments are autonomously folding units (AFUs) and have sizes suitable for fast structural analyses. Here, we propose to expand an intuitive procedure often employed for identifying biologically important domains into an automatic method for detecting putative folded protein fragments. The procedure is based on the recognition that large proteins can be regarded as a combination of independent domains conserved among diverse organisms. We have thus developed a program that reorganizes the output of BLAST searches and detects regions with a large number of similar sequences. To automate the detection process, it is reduced to a simple geometrical problem of recognizing rectangular-shaped elevations in a graph that plots the number of similar sequences at each residue of a query sequence. We used our program to quantitatively corroborate the premise that segments with conserved sequences correspond to domains that fold into native structures. We applied our program to a test data set composed of 99 amino acid sequences containing 150 segments with structures listed in the Protein Data Bank, and thus known to fold into native structures. Overall, the fragments identified by our program have an almost 50% probability of forming a native structure, and comparable results are observed with sequences containing domain linkers classified in SCOP. Furthermore, we verified that our program identifies AFUs in libraries from various organisms, and we found a significant number of AFU candidates for structural analysis, covering an estimated 5 to 20% of the genomic databases. Altogether, these results argue that methods based on sequence similarity can be useful for dissecting large proteins into small autonomously folding domains, and such methods may provide efficient support for structural genomics projects.
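The geometrical idea in the abstract (plot the number of similar sequences at each residue, then look for rectangular elevations) can be illustrated with a minimal sketch. This is not the authors' program: it assumes BLAST hits have already been reduced to 1-based inclusive `(start, end)` ranges on the query, and it swaps in a plain threshold rule for the rectangle recognition. The function names and the parameters `min_hits` and `min_len` are hypothetical.

```python
from typing import List, Tuple


def coverage_profile(query_len: int, hits: List[Tuple[int, int]]) -> List[int]:
    """Count how many hits cover each query residue.

    Hits are assumed to be 1-based inclusive (start, end) ranges on the query.
    """
    # Difference array: +1 where a hit begins, -1 just past where it ends.
    diff = [0] * (query_len + 2)
    for start, end in hits:
        start = max(1, start)
        end = min(query_len, end)
        if start > end:
            continue  # hit lies entirely outside the query
        diff[start] += 1
        diff[end + 1] -= 1

    profile = []
    running = 0
    for pos in range(1, query_len + 1):
        running += diff[pos]
        profile.append(running)
    return profile


def find_plateaus(profile: List[int], min_hits: int, min_len: int) -> List[Tuple[int, int]]:
    """Return 1-based inclusive segments whose count stays >= min_hits
    for at least min_len consecutive residues (a simplified stand-in
    for detecting rectangular elevations)."""
    segments = []
    seg_start = None
    for pos, count in enumerate(profile, start=1):
        if count >= min_hits:
            if seg_start is None:
                seg_start = pos
        elif seg_start is not None:
            if pos - seg_start >= min_len:
                segments.append((seg_start, pos - 1))
            seg_start = None
    # Close a segment that runs to the end of the query.
    if seg_start is not None and len(profile) - seg_start + 1 >= min_len:
        segments.append((seg_start, len(profile)))
    return segments


# Toy input: three overlapping hits near the N-terminus, one isolated hit later.
hits = [(3, 10), (3, 10), (4, 11), (15, 18)]
profile = coverage_profile(20, hits)
candidates = find_plateaus(profile, min_hits=2, min_len=5)
```

In this toy example only residues 3 to 10 are covered by at least two hits over five or more residues, so that stretch is the single candidate fragment. A fixed threshold ignores how sharp the edges of an elevation are, which a true rectangle-recognition step would presumably take into account.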
Similar articles
- Identification of putative domain linkers by a neural network - application to a large sequence database. BMC Bioinformatics. 2006 Jun 27;7:323. doi: 10.1186/1471-2105-7-323. PMID: 16800897. Free PMC article.
- Fast identification of folded human protein domains expressed in E. coli suitable for structural analysis. BMC Struct Biol. 2004 Mar 8;4:4. doi: 10.1186/1472-6807-4-4. PMID: 15113422. Free PMC article.
- The automatic detection of known beta-propeller structural motifs from protein tertiary structure. Int J Biol Macromol. 2005 Aug;36(3):176-83. doi: 10.1016/j.ijbiomac.2005.05.007. PMID: 16039708.
- Engineering by homologous recombination: exploring sequence and function within a conserved fold. Curr Opin Struct Biol. 2007 Aug;17(4):454-9. doi: 10.1016/j.sbi.2007.08.005. Epub 2007 Sep 19. PMID: 17884462. Review.
- Structural genomics and its importance for gene function analysis. Nat Biotechnol. 2000 Mar;18(3):283-7. doi: 10.1038/73723. PMID: 10700142. Review.
Cited by
- Characterization and prediction of linker sequences of multi-domain proteins by a neural network. J Struct Funct Genomics. 2002;2(1):37-51. doi: 10.1023/a:1014418700858. PMID: 12836673.
- ThreaDomEx: a unified platform for predicting continuous and discontinuous protein domains by multiple-threading and segment assembly. Nucleic Acids Res. 2017 Jul 3;45(W1):W400-W407. doi: 10.1093/nar/gkx410. PMID: 28498994. Free PMC article.
- DoBo: Protein domain boundary prediction by integrating evolutionary signals and machine learning. BMC Bioinformatics. 2011 Feb 1;12:43. doi: 10.1186/1471-2105-12-43. PMID: 21284866. Free PMC article.
- Identifying foldable regions in protein sequence from the hydrophobic signal. Nucleic Acids Res. 2008 Feb;36(2):578-88. doi: 10.1093/nar/gkm1070. Epub 2007 Dec 1. PMID: 18056079. Free PMC article.
- Development of an accurate classification system of proteins into structured and unstructured regions that uncovers novel structural domains: its application to human transcription factors. BMC Struct Biol. 2009 Apr 30;9:26. doi: 10.1186/1472-6807-9-26. PMID: 19402914. Free PMC article.