Comparative Study

BMC Struct Biol. 2008 Jul 16;8:31.

doi: 10.1186/1472-6807-8-31.

Systematic analysis of the effect of multiple templates on the accuracy of comparative models of protein structure

Suvobrata Chakravarty¹, Sucheta Godbole, Bing Zhang, Seth Berger, Roberto Sanchez

PMID: 18631402
PMCID: PMC2483983
DOI: 10.1186/1472-6807-8-31

Suvobrata Chakravarty et al. BMC Struct Biol. 2008.

Affiliation

¹ Department of Structural and Chemical Biology, Mount Sinai School of Medicine, 1425 Madison Avenue, New York, NY 10029, USA. suvobrata.chakravarty@mssm.edu

Abstract

Background: Although multiple templates are frequently used in comparative modeling, the effect of inclusion of additional template(s) on model accuracy (when compared to that of corresponding single-template based models) is not clear. To address this, we systematically analyze two-template models, the simplest case of multiple-template modeling. For an existing target-template pair (single-template modeling), a two-template based model of the target sequence is constructed by including an additional template without changing the original alignment to measure the effect of the second template on model accuracy.

Results: Even though in a large number of cases a two-template model showed higher accuracy than the corresponding one-template model, over the entire dataset only a marginal improvement was observed on average, as there were many cases where no change or the reverse change was observed. The increase in accuracy due to the structural complementarity of the templates increases at higher alignment accuracies. The combination of templates showing the highest potential for improvement is that where both templates share similar and low (less than 30%) sequence identity with the target, as well as low sequence identity with each other. The structural similarity between the templates also helps in identifying template combinations having a higher chance of resulting in an improved model.

Conclusion: Inclusion of additional template(s) does not necessarily improve model quality, but there are distinct combinations of the two templates, which can be selected a priori, that tend to show improvement in model quality over the single-template model. The benefit derived from structural complementarity depends on the accuracy of the modeling alignment. The study helps to explain the observation that careful selection of templates and an accurate target:template alignment are both necessary to benefit from using multiple templates in comparative modeling, and it provides guidelines to maximize that benefit. This enables formulation of simple template selection rules to rank targets of a protein family in the context of structural genomics.

Figures

**Figure 1**
**Structural complementarity of templates**. **(A)** The absence of structural information from Template1 for the segment spanning residues 4–9 of the Target is complemented by an equivalent segment in Template2. **(B)** A segment of the Target can be structurally closer to Template2 than to Template1. Template1 refers to the template with the higher sequence identity (see Methods).

**Figure 2**
**Strategy to deconvolute the effects of alignment accuracy and structural complementarity on two-template model accuracy**. **(A)** A pair of models is built on the same modeling alignment, one in the presence of a single template (bottom) and one in the presence of both templates (top). The Target segment corresponding to the box has no structural information in the absence of Template2. ALN stands for alignment type (SEQuence or STRucture). **(B)** The total improvement of multiple-template models over single-template models is a combination of decreased alignment errors and structural complementarity.

**Figure 3**
**Accuracy of multiple-template models**. **(A)** Comparison of overall accuracy between single and two-template SEQ models, SEQ.2.2 (large black filled circle) and SEQ.2.1 (small gray circle). **(B)** Comparison between single and two-template STR models, STR.2.2 (filled circle) and STR.2.1 (small gray circle). The lower sequence identity region is highlighted in the inset. Because of the large number of cases analyzed (> 10,000 models per curve), even the small differences shown here are statistically significant based on Student's t-test. For clarity, no error bars are shown. **(C)** Distribution of the difference in RMSD between one-template (STR.2.1) and two-template (STR.2.2) models built using structure-based alignments (RMSD_STR.2.1 - RMSD_STR.2.2). Only models with S1 ≤ 40% are shown here.

**Figure 4**
**Relationship between structural complementarity and alignment accuracy**. **(A)** The structural complementarity, ΔRMSD_ALN= (RMSD_ALN.2.1-RMSD_ALN.2.2), of SEQ models (black filled circles) and STR models (empty circles) is shown as a function of SEQ alignment accuracy. The STR curve represents the maximum achievable structural complementarity for each alignment accuracy bin. **(B)** Difference between the observed structural complementarity in SEQ models (ΔRMSD_SEQ) and maximum achievable structural complementarity (ΔRMSD_STR) as a function of SEQ alignment accuracy.
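The quantities in this caption can be sketched in a few lines of Python. This is only an illustration, not the authors' code: the record keys (`aln_acc`, `seq_1`, `seq_2`, `str_1`, `str_2`), the 10-point bin width, and the sign convention of the gap (ΔRMSD_STR minus ΔRMSD_SEQ) are assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean


def complementarity(rmsd_one_template, rmsd_two_template):
    """Delta RMSD_ALN = RMSD_ALN.2.1 - RMSD_ALN.2.2.

    A positive value means the two-template model is closer to the
    native structure than the one-template model.
    """
    return rmsd_one_template - rmsd_two_template


def mean_gap_by_accuracy(cases, bin_width=10.0):
    """Average (Delta RMSD_STR - Delta RMSD_SEQ) per alignment-accuracy bin.

    Each case is a dict with hypothetical keys:
      'aln_acc'          SEQ alignment accuracy in percent
      'seq_1', 'seq_2'   RMSD of the SEQ.2.1 and SEQ.2.2 models (in Å)
      'str_1', 'str_2'   RMSD of the STR.2.1 and STR.2.2 models (in Å)
    The sign of the gap is an assumption; the caption does not fix it.
    """
    bins = defaultdict(list)
    for case in cases:
        d_seq = complementarity(case["seq_1"], case["seq_2"])
        d_str = complementarity(case["str_1"], case["str_2"])
        lower_edge = int(case["aln_acc"] // bin_width) * bin_width
        bins[lower_edge].append(d_str - d_seq)
    # Map each bin's lower edge to the mean gap of the cases in that bin.
    return {edge: mean(gaps) for edge, gaps in sorted(bins.items())}
```

Under this convention, a gap close to zero in a bin would mean that SEQ models in that accuracy range already capture most of the complementarity that the structure-based alignment allows.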

**Figure 5**
**Effect of the relative Target:Template and Template1:Template2 sequence similarity on two-template model accuracy**. **(A)** Definition of sequence similarities between Target, Template1, and Template2. **(B)** Difference in RMSD for models built using structure-based alignments (RMSD_STR.2.1 - RMSD_STR.2.2) as a function of Target:Template1 sequence identity (S1) for different ranges of [S1–S2]. The colored circles (green, yellow, blue, red) are in increasing order of [S1–S2]. Data are absent at lower sequence identity (for yellow, blue and red) because a small S1 is not possible for large values of [S1–S2]. **(C)** Difference in RMSD for models built using structure-based alignments (RMSD_STR.2.1 - RMSD_STR.2.2) as a function of Target:Template1 sequence identity (S1) for different ranges of S3. The colored circles (green, blue, red) are in increasing order of S3. Only models with S1 similar to S2 are shown here. Data points are absent (green and blue) at higher sequence identity because certain combinations of S1, S2, and S3 are not possible.

**Figure 6**
**Proportion of Good/Bad models as a function of S3 and RMSD between the templates**. Accuracy is measured by ΔRMSD, defined as (RMSD_STR.2.1 - RMSD_STR.2.2). Models are classified as Good (ΔRMSD ≥ 1 Å), Bad (ΔRMSD ≤ -1 Å), or Neutral (-1 Å < ΔRMSD < 1 Å). In all plots, only models based on template combinations for which S1–S2 is less than 5% are included. **(A)** The ratio between the number of Good and Bad STR models as a function of S3, the sequence identity between the templates. **(B)** The Good/Bad ratio as a function of the RMSD between the two templates. **(C)** The Good/Bad ratio as a function of the RMSD between the two templates; in these plots the additional restriction of S3 < 30% is imposed on all selected models, with the aim of showing the complementarity between S3 and template RMSD selection.
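The Good/Bad/Neutral classification in this caption is simple enough to write down directly. The following Python sketch uses the 1 Å thresholds stated in the caption; the function names and the input format (pairs of one-template and two-template RMSD values) are hypothetical and chosen only for illustration.

```python
from collections import Counter


def classify(rmsd_one_template, rmsd_two_template, threshold=1.0):
    """Label a model pair by Delta RMSD = RMSD_STR.2.1 - RMSD_STR.2.2 (in Å)."""
    delta = rmsd_one_template - rmsd_two_template
    if delta >= threshold:
        return "good"     # two-template model better by at least 1 Å
    if delta <= -threshold:
        return "bad"      # two-template model worse by at least 1 Å
    return "neutral"      # change smaller than 1 Å in either direction


def good_bad_ratio(pairs):
    """Ratio of Good to Bad models, or None when there are no Bad models."""
    counts = Counter(classify(one, two) for one, two in pairs)
    if counts["bad"] == 0:
        return None
    return counts["good"] / counts["bad"]
```

Applied to the models in one S3 bin or one template-RMSD bin, `good_bad_ratio` would give a single point on a plot of the kind described in panels A to C.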

**Figure 7**
**Distribution of accuracy differences between one-template and two-template models for a selected subset**. **(A)** Difference in RMSD (ΔRMSD) for models built using structure-based alignments (RMSD_STR.2.1 - RMSD_STR.2.2). Only models with S1 – S2 less than 5%, S1 < 30%, S3 < 30% and template RMSD between 3.5 and 5.5 Å are shown here. The dark bars correspond to Good models (see Figure 6 legend), the empty bars to Bad models, and the light bars to Neutral models. **(B)** Fraction of Neutral (unchanged), Good and Bad models in the dataset before and after applying the template selection criteria described above.
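The selection criteria listed in this caption can be expressed as a single predicate. The Python sketch below is a hypothetical illustration: the `TemplatePair` type and its field names are invented for the example, the absolute value on S1 – S2 and the inclusive RMSD bounds are assumptions, and only the numeric cutoffs come from the caption.

```python
from dataclasses import dataclass


@dataclass
class TemplatePair:
    s1: float             # Target:Template1 sequence identity (%)
    s2: float             # Target:Template2 sequence identity (%)
    s3: float             # Template1:Template2 sequence identity (%)
    template_rmsd: float  # RMSD between the two templates (Å)


def passes_selection(pair):
    """Apply the cutoffs quoted in the Figure 7 caption to one template pair."""
    return (
        abs(pair.s1 - pair.s2) < 5.0             # templates similarly distant from target
        and pair.s1 < 30.0                       # low identity to the target
        and pair.s3 < 30.0                       # low identity between the templates
        and 3.5 <= pair.template_rmsd <= 5.5     # bounds assumed inclusive
    )
```

For example, `passes_selection(TemplatePair(25, 22, 20, 4.0))` is true, while raising the template RMSD to 6.0 Å or S1 to 35% makes the pair fail the filter.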

**Figure 8**
**Relationship between structural complementarity and alignment accuracy in the selected subset**. The selected models correspond to those described in Figure 7. **(A)** The structural complementarity, ΔRMSD_SEQ= (RMSD_SEQ.2.1-RMSD_SEQ.2.2), of selected SEQ models (empty circles) is shown as a function of SEQ alignment accuracy. The curve for all SEQ models from Figure 4A (black circles) is shown for comparison. **(B)** Difference between observed structural complementarity in SEQ models (ΔRMSD_SEQ) and maximum achievable structural complementarity (ΔRMSD_STR) as a function of SEQ alignment accuracy is shown for the selected models (empty circles) and for all models (black circles).
