Protein Sci. 2001 Dec;10(12):2460-9.

doi: 10.1110/ps.14401.

Motif-based fold assignment

L Salwinski¹, D Eisenberg

PMID: 11714913
PMCID: PMC2374048
DOI: 10.1110/ps.14401

L Salwinski et al. Protein Sci. 2001 Dec.

Affiliation

¹ Department of Chemistry, UCLA-DOE Laboratory of Structural Biology and Molecular Medicine, UCLA, Los Angeles, California 90095-1570, USA.

Abstract

Conventional fold recognition techniques rely mainly on the analysis of the entire sequence of a protein. We present a motif-based fold assignment (MBA) method to improve the performance of any conventional sequence-based fold assignment. The method uses sequence motifs, such as those defined in the Prosite database, and the SwissProt annotation of the fold library. When combined with a simple sequence-derived predictions (SDP) method, MBA achieves coverage comparable to that of PSI-BLAST. However, the set of MBA predictions differs significantly from that of PSI-BLAST, leading to a 40% increase in coverage for the combined MBA/PSI-BLAST method. The MBA approach can easily be adapted to include the results of sequence-independent function prediction methods and alternative motif and annotation databases. The method is available through a web server at http://www.doe-mbi.ucla.edu/mba.
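The first ingredient of the method is the set of sequence motifs found in the probe. As a minimal sketch, assuming motifs are written in PROSITE pattern syntax (`x` for any residue, `[...]` for allowed residues, `{...}` for forbidden ones, `(n)` or `(n,m)` for repeats, `<` and `>` for terminal anchors), the code below translates a pattern into a Python regular expression and scans a probe sequence. The helpers `prosite_to_regex` and `find_motif` are hypothetical illustrations, not part of the MBA server; the pattern is a zinc-finger-like example and the probe sequence is made up.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern such as 'C-x(2,4)-C' into a regex.

    Handles x, [..], {..}, (n) and (n,m) repeats, and the < / > terminal
    anchors. Anchors written inside brackets (e.g. '[G>]') are not handled.
    """
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):
            prefix, element = "^", element[1:]
        if element.endswith(">"):
            suffix, element = "$", element[:-1]
        # Separate the residue spec from an optional repeat count, e.g. x(2,4).
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, lo, hi = m.groups()
        if core == "x":
            core = "."                      # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"  # any residue except those listed
        # Single residues and [..] classes are already valid regex syntax.
        if lo is not None:
            core += "{" + lo + ("," + hi if hi else "") + "}"
        parts.append(prefix + core + suffix)
    return "".join(parts)


def find_motif(sequence: str, pattern: str) -> list[tuple[int, str]]:
    """Return (1-based start, matched text) for each non-overlapping hit."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start() + 1, m.group()) for m in regex.finditer(sequence)]


zinc_finger_like = "C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H."
print(prosite_to_regex(zinc_finger_like))
# C.{2,4}C.{3}[LIVMFYWC].{8}H.{3,5}H
print(find_motif("AACKKCSSSLAAAAAAAAHTTTHAA", zinc_finger_like))
# [(3, 'CKKCSSSLAAAAAAAAHTTTH')]
```

Because `re.finditer` reports non-overlapping matches only, a production scanner that needs overlapping hits would have to restart the search one residue after each match start.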

Figures

**Fig. 1.**
Flowchart of the MBA method. Conventional fold assignment methods (solid lines) compare the entire sequence (or multiple sequence alignment) of the probe to the sequences (or multiple sequence alignments) or structures of the folds in a fold library. MBA (dashed lines) extracts information from the motifs occurring in the probe sequence and from the annotation of the targets, and combines it with a conventional sequence/structure score.
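The dashed branch of the flowchart turns motif occurrences in the probe into a ranking of library folds. A minimal sketch, assuming a precomputed table of fold–motif scores *S_FM* and a cutoff *C_FM* below which pairs are ignored, might sum the retained scores for each fold. The summation rule, the function name `rank_folds`, and all identifiers in the example are hypothetical; this excerpt does not give the paper's actual scoring equations.

```python
from collections import defaultdict


def rank_folds(probe_motifs, s_fm, c_fm):
    """Rank library folds by the motifs detected in a probe sequence.

    probe_motifs: set of motif IDs found in the probe.
    s_fm: dict mapping (fold_id, motif_id) -> fold-motif score S_FM.
    c_fm: cutoff; pairs with S_FM below it are ignored.
    Summing the retained scores per fold is an assumption for illustration.
    """
    totals = defaultdict(float)
    for (fold, motif), score in s_fm.items():
        if motif in probe_motifs and score >= c_fm:
            totals[fold] += score
    # Highest-scoring fold first.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical fold and motif identifiers and scores.
s_fm = {
    ("fold_A", "M1"): 1.2,
    ("fold_A", "M2"): 0.1,  # below the cutoff, ignored
    ("fold_B", "M2"): 0.6,
    ("fold_B", "M3"): 2.0,  # M3 is not in the probe
}
print(rank_folds({"M1", "M2"}, s_fm, c_fm=0.25))
# [('fold_A', 1.2), ('fold_B', 0.6)]
```

In a full pipeline this motif-derived ranking would then be merged with the conventional sequence/structure score shown by the solid lines.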

**Fig. 2.**
(a) Frequency of the *S_FM* scores for all continuous protein domains defined in the CATH database (Orengo et al. 1999) that are also found in the SwissProt database. Annotation filtering removes a large number of uncorrelated domain–motif pairs with scores *S_FM* ≈ 0. Unfiltered motif–fold pairs are shown as solid circles; annotation-filtered pairs using *C_MK* = 0.25 and 1.0 are shown as open squares and open circles, respectively. (b) Frequency of the *S_MK* scores for all protein sequences in the SwissProt database (Bairoch and Apweiler 2000; release 39, 80,000 sequences). Besides the vast majority of uncorrelated motif–keyword pairs, there is a small subset of strongly correlated pairs for which *S_MK* ≫ 0; this subset constitutes ∼10%–15% of all sequence motif–keyword pairs.
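The caption states that uncorrelated motif–keyword pairs score *S_MK* ≈ 0 while strongly correlated pairs score *S_MK* ≫ 0, but the defining formula is not reproduced in this excerpt. A log-odds ratio of observed to expected co-occurrence has exactly those two properties, so the sketch below assumes that form (with a small pseudocount) purely for illustration; it is not the authors' definition. `s_mk`, `annotation_filter`, and the counts are hypothetical, and the filter simply keeps pairs at or above a cutoff *C_MK*.

```python
import math


def s_mk(n_total, n_motif, n_keyword, n_both, pseudo=0.5):
    """Assumed log-odds score for a motif-keyword pair.

    n_total: sequences in the database; n_motif: sequences containing the
    motif; n_keyword: sequences annotated with the keyword; n_both: both.
    The pseudocount keeps the logarithm finite when n_both is 0.
    Independent pairs score about 0; strongly correlated pairs score >> 0.
    """
    observed = n_both + pseudo
    expected = n_motif * n_keyword / n_total + pseudo
    return math.log2(observed / expected)


def annotation_filter(scores, c_mk):
    """Keep only (motif, keyword) pairs whose score reaches the cutoff c_mk."""
    return {pair: s for pair, s in scores.items() if s >= c_mk}


# Hypothetical counts over a database of 80,000 sequences.
scores = {
    ("M1", "kw_uncorrelated"): s_mk(80_000, 800, 1_000, 10),  # 0.0
    ("M1", "kw_correlated"): s_mk(80_000, 800, 1_000, 790),   # about 6.2
}
print(annotation_filter(scores, c_mk=1.0))
```

With 800 motif-bearing and 1,000 keyword-bearing sequences out of 80,000, independence predicts 10 co-occurrences, so observing 10 gives a score of exactly zero.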

**Fig. 3.**
Performance of the MBA method, showing both the accuracy of the assignment and the percentage coverage of the test set of CATH domains (see Materials and Methods), compared with the SDP method (Fischer and Eisenberg 1996). The annotation-filtered version of MBA performs at least as well as the SDP method. MBA performance is parametrized by *C_FM* (solid squares) and *C_MK* (open squares); the performance of the SDP method, parametrized by Z score (+), is shown for reference.
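Each curve in this figure is traced by sweeping a score cutoff and recording accuracy and coverage at every step. The exact definitions are in Materials and Methods, which this excerpt does not include, so the sketch below assumes common ones: accuracy is the fraction of top hits at or above the cutoff that are correct, and coverage is the percentage of all test domains correctly assigned. `accuracy_coverage` and the scores are hypothetical.

```python
def accuracy_coverage(predictions, thresholds):
    """Trace an accuracy-versus-coverage curve by sweeping a score cutoff.

    predictions: one (score, is_correct) tuple per test domain, for its top hit.
    Assumed definitions: accuracy = correct / assigned among hits at or above
    the cutoff; coverage = percentage of all test domains correctly assigned.
    """
    n = len(predictions)
    curve = []
    for t in thresholds:
        kept = [ok for score, ok in predictions if score >= t]
        correct = sum(kept)
        accuracy = correct / len(kept) if kept else None  # undefined with no hits
        coverage = 100.0 * correct / n
        curve.append((t, accuracy, coverage))
    return curve


# Hypothetical top-hit scores for four test domains.
preds = [(3.0, True), (2.5, True), (1.0, False), (0.5, True)]
for t, acc, cov in accuracy_coverage(preds, [5.0, 2.0, 0.0]):
    print(t, acc, cov)
```

Lowering the cutoff raises coverage while typically lowering accuracy, which is the trade-off each curve in the figure displays.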

**Fig. 4.**
The number of domains in the test set (see Materials and Methods) that are correctly assigned by the MBA method but cannot be identified by PSI-BLAST, as a function of *C_FM* (solid squares) and *C_MK* (open squares). For comparison, 726 domains can be identified in a PSI-BLAST search.
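The quantity plotted here, and the coverage gain of the combined MBA/PSI-BLAST method reported in the abstract, are set operations over the test domains each method assigns correctly. The sketch below uses made-up domain identifiers, not the paper's data; `complementarity` is a hypothetical helper, and measuring the gain relative to PSI-BLAST alone is an assumption.

```python
def complementarity(mba_correct, psiblast_correct):
    """Compare two methods by the sets of test domains each assigns correctly.

    Returns the domains found only by MBA, the union found by either method,
    and the relative coverage gain of the union over PSI-BLAST alone.
    """
    mba_only = mba_correct - psiblast_correct
    union = mba_correct | psiblast_correct
    gain = len(union) / len(psiblast_correct) - 1.0
    return mba_only, union, gain


# Hypothetical domain identifiers.
mba = {"d1", "d2", "d3"}
psiblast = {"d3", "d4", "d5", "d6"}
only, union, gain = complementarity(mba, psiblast)
print(sorted(only), len(union), gain)  # ['d1', 'd2'] 6 0.5
```

The gain depends only on how many domains MBA finds that PSI-BLAST misses, which is why a method with similar coverage but a different prediction set can still add substantially to the combination.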

**Fig. 5.**
(a) The performance of the MBA method using the combined motif and sequence scoring (equation 4). The accuracy versus coverage curve (solid circles; see Materials and Methods) is parametrized by 0 < α < 1 for *C_FM* = 0.25. An additional gain in accuracy can be obtained by also applying annotation filtering with 0 ≤ *C_MK* < 6 (open circles). (b) The cumulative performance of PSI-BLAST and MBA using the combined motif and sequence scoring (equation 4). The accuracy versus coverage curve (solid circles) is parametrized by α for *C_FM* = 0.25. An additional gain in accuracy can be obtained by also applying annotation filtering with 0 ≤ *C_MK* < 6 (open circles). Along the lines with closed symbols, α varies between 0 and 1; along the lines with open symbols, *C_MK* varies for a fixed value of α.
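Equation 4, which defines the combined motif and sequence score, is not reproduced in this excerpt. As a minimal sketch, assuming it is a linear interpolation in which α = 1 means pure motif scoring and α = 0 means pure sequence scoring, the code below shows how sweeping α can change which fold ranks first. The functions `combined_score` and `best_fold`, the fold names, and the scores (assumed already normalized to a common scale) are hypothetical.

```python
def combined_score(s_motif, s_seq, alpha):
    """Assumed combined score: linear mix of a motif score and a sequence score."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * s_motif + (1.0 - alpha) * s_seq


def best_fold(motif_scores, seq_scores, alpha):
    """Return the fold with the highest combined score (missing scores count as 0)."""
    folds = motif_scores.keys() | seq_scores.keys()
    return max(
        folds,
        key=lambda f: combined_score(motif_scores.get(f, 0.0), seq_scores.get(f, 0.0), alpha),
    )


# Hypothetical, already-normalized scores for two folds.
motif = {"fold_A": 2.0, "fold_B": 0.0}
seq = {"fold_A": 0.5, "fold_B": 1.5}
for alpha in (0.0, 0.5, 1.0):
    print(alpha, best_fold(motif, seq, alpha))
# 0.0 fold_B
# 0.5 fold_A
# 1.0 fold_A
```

Normalizing the two scores to a common scale first matters for any such mix; otherwise α mostly compensates for differing score ranges rather than weighting the two sources of evidence.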

Cited by

DescFold: a web server for protein fold recognition.
Yan RX, Si JN, Wang C, Zhang Z. BMC Bioinformatics. 2009 Dec 14;10:416. doi: 10.1186/1471-2105-10-416. PMID: 20003426.
Descriptor-based protein remote homology identification.
Zhang Z, Kochhar S, Grigorov MG. Protein Sci. 2005 Feb;14(2):431-44. doi: 10.1110/ps.041035505. Epub 2005 Jan 4. PMID: 15632283.
Identification of GATC- and CCGG-recognizing Type II REases and their putative specificity-determining positions using Scan2S--a novel motif scan algorithm with optional secondary structure constraints.
Niv MY, Skrabanek L, Roberts RJ, Scheraga HA, Weinstein H. Proteins. 2008 May 1;71(2):631-40. doi: 10.1002/prot.21777. PMID: 17972284.
The directional atomic solvation energy: an atom-based potential for the assignment of protein sequences to known folds.
Mallick P, Weiss R, Eisenberg D. Proc Natl Acad Sci U S A. 2002 Dec 10;99(25):16041-6. doi: 10.1073/pnas.252626399. Epub 2002 Dec 2. PMID: 12461172.
TIM-Finder: a new method for identifying TIM-barrel proteins.
Si JN, Yan RX, Wang C, Zhang Z, Su XD. BMC Struct Biol. 2009 Dec 14;9:73. doi: 10.1186/1472-6807-9-73. PMID: 20003393.

References

1. Altschul, S.F., Madden, T.L., Schaeffer, A.A., Zhang, J., Zhang, Z., Miller, W., and Lipman, D.J. 1997. Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Res. 25 3389–3402.
2. Andrade, M., Casari, G., de Daruvar, A., Sander, C., Schneider, R., Tamames, J., Valencia, A., and Ouzounis, C. 1997. Sequence analysis of the Methanococcus jannaschii genome and the prediction of protein function. Comput. Appl. Biosci. 13 481–483.
3. Andrade, M.A., Brown, N.P., Leroy, C., Hoersch, S., de Daruvar, A., Reich, C., Franchini, A., Tamames, J., Valencia, A., Ouzounis, C., and Sander, C. 1999. Automated genome sequence analysis and annotation. Bioinformatics 15 391–412.
4. Bairoch, A. and Apweiler, R. 2000. The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. Nucleic Acids Res. 28 45–48.
5. Baxevanis, A.D. 2000. The molecular biology database collection: An online compilation of relevant database resources. Nucleic Acids Res. 28 1–7.
