Proteins. 2008 Aug;72(2):547-56.
doi: 10.1002/prot.21945.

MUSTER: Improving protein sequence profile-profile alignments by using multiple sources of structure information

Sitao Wu et al. Proteins. 2008 Aug.

Abstract

We develop a new threading algorithm, MUSTER, by extending the previous sequence profile-profile alignment method, PPA. MUSTER combines various sequence and structure information into single-body terms that can be conveniently used in a dynamic programming search: (1) sequence profiles; (2) secondary structures; (3) structure fragment profiles; (4) solvent accessibility; (5) dihedral torsion angles; and (6) a hydrophobic scoring matrix. The balance of the weighting parameters is optimized by a grading search based on the average TM-score of 111 training proteins, which performs better than conventional optimization methods based on the PROSUP database. The algorithm is tested on 500 nonhomologous proteins that are independent of the training set. After removing homologous templates with a sequence identity to the target >30%, the first template alignment has the correct topology (TM-score >0.5) in 224 cases. Even with a more stringent cutoff that removes templates with a sequence identity >20% or detectable by PSI-BLAST with an E-value <0.05, MUSTER identifies the correct fold in 137 cases, with a first model of TM-score >0.5. Depending on the homology cutoff, the average TM-score of the first threading alignments by MUSTER is 5.1-6.3% higher than that by PPA. This improvement is statistically significant by the Wilcoxon signed-rank test (P-value < 1.0 × 10^-13), which demonstrates the effect of additional structural information on protein fold recognition. The MUSTER server is freely available to the academic community at http://zhang.bioinformatics.ku.edu/MUSTER.
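Because each of the six terms is single-body, i.e. it scores one query position i against one template position j without looking at any other aligned pair, a weighted sum of the terms can be used directly as the match score of a standard dynamic-programming aligner. The sketch below illustrates this under stated assumptions: it uses a plain Needleman-Wunsch global alignment with a linear gap penalty, and the helpers `combined_score` and `align`, the toy terms, and the weights are all hypothetical. MUSTER's actual gap model, term definitions, and optimized weights are not given in this record.

```python
from typing import Callable, List, Sequence, Tuple

# A single-body term scores query position i against template position j only.
Term = Callable[[int, int], float]


def combined_score(terms: Sequence[Term], weights: Sequence[float]) -> Term:
    """Weighted sum of single-body terms (weights here are hypothetical)."""
    def score(i: int, j: int) -> float:
        return sum(w * t(i, j) for w, t in zip(weights, terms))
    return score


def align(n_query: int, n_templ: int, score: Term,
          gap: float = -1.0) -> Tuple[float, List[Tuple[int, int]]]:
    """Needleman-Wunsch global alignment with a linear gap penalty.

    Returns the best total score and the aligned (query, template) index pairs.
    """
    # dp[i][j] = best score for aligning query[:i] with template[:j]
    dp = [[0.0] * (n_templ + 1) for _ in range(n_query + 1)]
    for i in range(1, n_query + 1):
        dp[i][0] = dp[i - 1][0] + gap
    for j in range(1, n_templ + 1):
        dp[0][j] = dp[0][j - 1] + gap
    for i in range(1, n_query + 1):
        for j in range(1, n_templ + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + score(i - 1, j - 1),  # align the two residues
                dp[i - 1][j] + gap,                      # gap in the template
                dp[i][j - 1] + gap,                      # gap in the query
            )
    # Trace back from the bottom-right cell to recover the aligned pairs.
    pairs: List[Tuple[int, int]] = []
    i, j = n_query, n_templ
    while i > 0 and j > 0:
        if dp[i][j] == dp[i - 1][j - 1] + score(i - 1, j - 1):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif dp[i][j] == dp[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return dp[n_query][n_templ], pairs[::-1]


if __name__ == "__main__":
    # Toy inputs: two hypothetical terms standing in for profile and
    # secondary-structure scores; the weights are arbitrary.
    query, templ = "ACDEFG", "ACEFG"
    q_ss, t_ss = "CHHHCC", "CHHCC"

    def seq_term(i: int, j: int) -> float:
        return 1.0 if query[i] == templ[j] else -1.0

    def ss_term(i: int, j: int) -> float:
        return 1.0 if q_ss[i] == t_ss[j] else 0.0

    total = combined_score([seq_term, ss_term], [1.0, 0.5])
    print(align(len(query), len(templ), total))
```

The design point is that adding another single-body feature only adds one more function to the weighted sum; the alignment recursion itself is unchanged.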

Figures

Figure 1
Illustration of the full (Lfull) and partial (Lpartial) alignment lengths used to normalize the threading alignment score (Rscore). The symbols “-”, “.” and “:” indicate an unaligned gap, an aligned nonidentical residue pair, and an aligned identical residue pair, respectively. As an illustrative example, the query and template sequences are taken from 1hroA (first 53 residues) and 155c_ (first 61 residues), respectively.
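The per-column symbols described in the Figure 1 caption can be derived from a gapped pairwise alignment, as in the illustrative helper below. It is not MUSTER code and the function names are hypothetical; because the caption does not spell out how Lfull and Lpartial are computed, the sketch only counts aligned and identical residue pairs rather than reproducing those normalization lengths.

```python
from typing import Tuple


def match_line(query_aln: str, templ_aln: str) -> str:
    """Symbol per alignment column: ':' identical, '.' nonidentical, '-' gap."""
    if len(query_aln) != len(templ_aln):
        raise ValueError("gapped sequences must have the same length")
    symbols = []
    for q, t in zip(query_aln, templ_aln):
        if q == "-" or t == "-":
            symbols.append("-")  # unaligned: one side is a gap
        elif q == t:
            symbols.append(":")  # aligned identical residue pair
        else:
            symbols.append(".")  # aligned nonidentical residue pair
    return "".join(symbols)


def count_pairs(line: str) -> Tuple[int, int]:
    """Return (aligned residue pairs, identical residue pairs)."""
    aligned = sum(1 for c in line if c in ".:")
    return aligned, line.count(":")


if __name__ == "__main__":
    q, t = "ACGDE-K", "ASG-EWK"  # toy gapped alignment, not from 1hroA/155c_
    line = match_line(q, t)
    print(q, line, t, sep="\n")
    print(count_pairs(line))
```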
Figure 2
TM-scores of the top 50 threading alignments generated by MUSTER for each of the 111 training proteins, plotted against their Z-scores. The vertical line marks the Z-score cutoff (7.5) that separates “Easy” from “Hard” targets, and the horizontal line corresponds to TM-score = 0.5.
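The two reference lines in Figure 2 correspond to two simple tests. A minimal sketch, assuming the common threading convention that a template's Z-score is its raw alignment score standardized against the scores of all templates for the same target (the exact normalization MUSTER uses is not stated here), and assuming a target exactly at the cutoff counts as “Easy”. The function names are hypothetical; only the thresholds 7.5 and 0.5 come from the text.

```python
import statistics
from typing import List, Sequence

Z_CUTOFF = 7.5         # Easy/Hard cutoff from the Figure 2 caption
TM_CORRECT_FOLD = 0.5  # TM-score threshold for a correct topology


def z_scores(raw_scores: Sequence[float]) -> List[float]:
    """Standardize one target's raw alignment scores over all templates."""
    mu = statistics.mean(raw_scores)
    sd = statistics.pstdev(raw_scores)
    if sd == 0:
        # All templates scored the same, so no template stands out.
        return [0.0 for _ in raw_scores]
    return [(s - mu) / sd for s in raw_scores]


def classify_target(raw_scores: Sequence[float], cutoff: float = Z_CUTOFF) -> str:
    """Label a target 'Easy' if its best template's Z-score reaches the cutoff."""
    return "Easy" if max(z_scores(raw_scores)) >= cutoff else "Hard"


def is_correct_fold(tm_score: float) -> bool:
    """Correct topology as used in the abstract: TM-score > 0.5."""
    return tm_score > TM_CORRECT_FOLD


if __name__ == "__main__":
    # Toy scores: one template far above a flat background of 99 others.
    scores = [10.0] * 99 + [60.0]
    print(classify_target(scores), is_correct_fold(0.62))
```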
Figure 3
TM-score comparison between PPA and MUSTER for the first threading models of the 500 nonhomologous test proteins. Circles represent models for “Easy” targets and crosses those for “Hard” targets. (a) Homology Cutoff-1, excluding templates with sequence identity to the target >30%; (b) homology Cutoff-2, excluding templates with sequence identity >20% or detectable by PSI-BLAST with an E-value <0.05.
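The two homology cutoffs amount to filters on the template library before threading. In the sketch below, `TemplateHit` and its fields are hypothetical, sequence identity is stored as a fraction between 0 and 1, and a template that PSI-BLAST does not detect is represented by an E-value of `None`.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TemplateHit:
    """Hypothetical record for one template considered for a target."""
    name: str
    seq_identity: float               # identity to the target, as a fraction
    psiblast_evalue: Optional[float]  # None if PSI-BLAST did not detect it


def apply_cutoff1(hits: List[TemplateHit]) -> List[TemplateHit]:
    """Cutoff-1: drop templates with sequence identity > 30%."""
    return [h for h in hits if h.seq_identity <= 0.30]


def apply_cutoff2(hits: List[TemplateHit]) -> List[TemplateHit]:
    """Cutoff-2: drop templates with identity > 20% or PSI-BLAST E-value < 0.05."""
    def detectable(h: TemplateHit) -> bool:
        return h.psiblast_evalue is not None and h.psiblast_evalue < 0.05

    return [h for h in hits if h.seq_identity <= 0.20 and not detectable(h)]


if __name__ == "__main__":
    # Toy template library, not data from the paper.
    library = [
        TemplateHit("t1", 0.35, None),
        TemplateHit("t2", 0.25, None),
        TemplateHit("t3", 0.15, 1e-4),
        TemplateHit("t4", 0.15, 2.0),
    ]
    print([h.name for h in apply_cutoff1(library)])
    print([h.name for h in apply_cutoff2(library)])
```

Cutoff-2 is strictly more stringent: every template it keeps also passes Cutoff-1.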
Figure 4
Threading results for “1eq1A” by MUSTER and PPA. (a) The first threading model by MUSTER, built from the template “1aep_”; (b) the third threading model by PPA, built from the same template “1aep_”. The upper part of the figure shows the 3D models superposed on the native structure: the thin line denotes the Cα backbone of the native structure and the thick line that of the threading model, colored blue to red from the N- to the C-terminus. The 3D structures were plotted using PyMOL. The lower part of the figure shows the 1D alignment of the secondary structure elements by MUSTER and PPA, respectively: wavy lines indicate α-helix regions and straight lines coil regions; black marks continuous regions where residues are present, and green marks gap regions.
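Superpositions like those in Figure 4 are what the TM-score quantifies, and TM-score >0.5 is the correct-topology criterion used throughout the abstract. The sketch below evaluates the standard TM-score formula, TM = (1/L) Σ 1/(1 + (d_i/d0)^2) with d0 = 1.24 (L − 15)^(1/3) − 1.8, for one fixed superposition, given the Cα distance d_i of each aligned residue pair and the target length L. The full TM-score additionally searches for the superposition that maximizes this value, which is omitted here; clamping d0 to at least 0.5 Å for short chains follows common practice and is an assumption.

```python
from typing import Sequence


def d0(length: int) -> float:
    """Distance scale (angstroms) for a target of the given length."""
    # Clamp for short chains (assumption, mirrors common TM-score tools).
    value = 1.24 * max(length - 15, 0) ** (1.0 / 3.0) - 1.8
    return max(value, 0.5)


def tm_score(distances: Sequence[float], target_length: int) -> float:
    """TM-score of one fixed superposition.

    distances: Calpha-Calpha distances (angstroms) of aligned residue pairs
    after superposition; unaligned target residues contribute nothing.
    target_length: residues in the native (target) structure, used to normalize.
    """
    scale = d0(target_length)
    total = sum(1.0 / (1.0 + (d / scale) ** 2) for d in distances)
    return total / target_length


if __name__ == "__main__":
    # Toy numbers, not from the paper: 80 of 100 residues aligned at 2 A.
    score = tm_score([2.0] * 80, 100)
    print(f"TM-score = {score:.3f}, correct fold: {score > 0.5}")
```

Normalizing by the target length rather than the number of aligned pairs is what penalizes short, partial alignments.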
