Strategies for high-throughput comparative modeling: applications to leverage analysis in structural genomics and protein family organization
- PMID: 17154423
- DOI: 10.1002/prot.21191
Abstract
The technological breakthroughs in structural genomics were designed to facilitate the solution of a sufficient number of structures, so that as many protein sequences as possible can be structurally characterized with the aid of comparative modeling. The leverage of a solved structure is the number and quality of the models that can be produced using the structure as a template for modeling, and it may be viewed as the "currency" with which the success of a structural genomics endeavor can be measured. Moreover, the models obtained in this way should be valuable to all biologists. To this end, at the Northeast Structural Genomics Consortium (NESG), a modular computational pipeline for automated high-throughput leverage analysis was devised and used to assess the leverage of the 186 unique NESG structures solved during the first phase of the Protein Structure Initiative (January 2000 to July 2005). Here, the results of this analysis are presented. The number of sequences in the nonredundant protein sequence database covered by quality models produced by the pipeline is approximately 39,000, so that the average leverage is approximately 210 models per structure. Interestingly, only 7,900 of these models fulfill the stringent modeling criterion of being at least 30% sequence-identical to the corresponding NESG structures. This study shows how high-throughput modeling increases the efficiency of structure determination efforts by providing enhanced coverage of protein structure space. In addition, the approach is useful in refining the boundaries of structural domains within larger protein sequences, subclassifying sequence-diverse protein families, and defining structure-based strategies specific to a particular family.
(c) 2006 Wiley-Liss, Inc.
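The leverage arithmetic quoted in the abstract can be checked with a short sketch. The three counts are the approximate figures stated above; the function name `average_leverage`, the per-structure figure for the stringent subset, and the stringent fraction are illustrative derivations, not values reported by the authors or part of the NESG pipeline.

```python
def average_leverage(n_models: int, n_structures: int) -> float:
    """Average number of models per solved structure (hypothetical helper)."""
    if n_structures <= 0:
        raise ValueError("n_structures must be positive")
    return n_models / n_structures


# Approximate counts taken from the abstract.
n_structures = 186      # unique NESG structures, PSI phase 1
n_models = 39_000       # sequences covered by quality models
n_stringent = 7_900     # models with >= 30% identity to the template

avg = average_leverage(n_models, n_structures)
# Derived here for illustration only; not stated in the abstract.
stringent_avg = average_leverage(n_stringent, n_structures)
stringent_fraction = n_stringent / n_models

print(f"average leverage: {avg:.0f} models per structure")
print(f"stringent leverage: {stringent_avg:.0f} models per structure")
print(f"fraction meeting the 30% identity criterion: {stringent_fraction:.1%}")
```

Because the inputs are rounded, the results are approximate as well: the overall average comes out near 210 models per structure, matching the abstract, while roughly one model in five meets the 30% identity criterion.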