Protein Sci. 2001 Dec;10(12):2460-9.
doi: 10.1110/ps.14401.

Motif-based fold assignment

L Salwinski et al. Protein Sci. 2001 Dec.

Abstract

Conventional fold recognition techniques rely mainly on the analysis of the entire sequence of a protein. We present a motif-based fold assignment (MBA) method that improves the performance of any conventional sequence-based fold assignment. The method uses sequence motifs, such as those defined in the Prosite database, together with the SwissProt annotation of the fold library. When combined with a simple sequence-derived predictions (SDP) method, MBA achieves coverage comparable to that obtained with PSI-BLAST. However, the set of MBA predictions differs significantly from that of PSI-BLAST, leading to a 40% increase in coverage for the combined MBA/PSI-BLAST method. The MBA approach can easily be adapted to include the results of sequence-independent function prediction methods and alternative motif and annotation databases. The method is available through a web server at http://www.doe-mbi.ucla.edu/mba.
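Why a prediction set that differs from PSI-BLAST's raises the coverage of the combined method can be illustrated with a minimal sketch. It assumes, for illustration only, that coverage is the fraction of test-set domains assigned to the correct fold, and that a domain counts as covered by the combined method if either method assigns it correctly. The domain identifiers and counts are hypothetical, not data from this study.

```python
def coverage(assigned: set[str], n_test: int) -> float:
    """Fraction of the test set correctly assigned (assumed definition)."""
    return len(assigned) / n_test


def relative_gain(base: set[str], extra: set[str]) -> float:
    """Relative coverage increase when `extra` predictions are pooled with `base`."""
    combined = base | extra  # a domain counts if either method assigns it correctly
    return (len(combined) - len(base)) / len(base)


# Hypothetical test-domain IDs correctly assigned by each method (illustrative only).
psiblast_hits = {"d1", "d2", "d3", "d4"}
mba_hits = {"d4", "d5", "d6"}  # overlaps the PSI-BLAST hits on d4 only

print(coverage(psiblast_hits, 10))             # 0.4
print(coverage(psiblast_hits | mba_hits, 10))  # 0.6
print(relative_gain(psiblast_hits, mba_hits))  # 0.5
```

Because the two hypothetical sets share only one domain, most MBA hits add to the PSI-BLAST coverage; a method whose hits largely duplicated PSI-BLAST's would add little.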

Figures

Fig. 1.
Flowchart of the MBA method. Conventional fold assignment methods (solid lines) compare the entire sequence (or multiple sequence alignment) of the probe with the sequences (or multiple sequence alignments) or structures of the folds in a fold library. MBA (dashed lines) uses the occurrences of motifs in the probe sequence and the annotation of the target, and combines this information with a conventional sequence/structure score.
Fig. 2.
(a) Frequency of SFM scores for all continuous protein domains defined in the CATH database (Orengo et al. 1999) that are also found in the SwissProt database. Note that annotation filtering removes a large number of uncorrelated domain–motif pairs with SFM ≈ 0. Unfiltered motif–fold pairs are shown as solid circles; annotation-filtered pairs, using CMK = 0.25 and 1.0, are shown as open squares and open circles, respectively. (b) Frequency of SMK scores for all protein sequences in the SwissProt database (Bairoch and Apweiler 2000; release 39, 80,000 sequences). Note that, apart from the vast majority of uncorrelated motif–keyword pairs, there is also a small subset of strongly correlated pairs with SMK ≫ 0; this subset constitutes ∼10%–15% of all sequence motif–keyword pairs.
Fig. 3.
Performance of the MBA method, showing both the accuracy of the assignment and the percentage coverage of the test set of CATH domains (see Materials and Methods), compared with the SDP method (Fischer and Eisenberg 1996). Note that the annotation-filtered version of MBA performs at least as well as the SDP method. The performance of the MBA method is parametrized by CFM (solid squares) and CMK (open squares). The performance of the SDP method, parametrized by Z score (+), is shown as a reference.
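An accuracy versus coverage curve of this kind is traced by sweeping a score cutoff (a parameter such as CFM, CMK, or the SDP Z score). The exact definitions are given in Materials and Methods and are not reproduced here, so this sketch assumes that accuracy is the fraction of accepted predictions that are correct and coverage is the number of correct accepted predictions divided by the test-set size. The `Prediction` record, the function name, and the scores are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    """Hypothetical record of one fold assignment for a test-set domain."""
    domain: str    # test-set domain identifier
    score: float   # confidence score; higher means more confident
    correct: bool  # True if the assigned fold matches the reference fold


def accuracy_coverage_curve(preds: list[Prediction], n_test: int) -> list[tuple[float, float, float]]:
    """Return (threshold, accuracy, coverage) points, one per distinct score.

    Assumed definitions: accuracy = correct / accepted and
    coverage = correct / n_test, where a prediction is accepted
    when its score is at or above the threshold.
    """
    ranked = sorted(preds, key=lambda p: p.score, reverse=True)
    points = []
    n_correct = 0
    for i, p in enumerate(ranked):
        n_correct += p.correct  # a bool counts as 0 or 1
        # Emit a point only after the last prediction sharing this score,
        # because a threshold cannot separate tied predictions.
        if i + 1 == len(ranked) or ranked[i + 1].score != p.score:
            points.append((p.score, n_correct / (i + 1), n_correct / n_test))
    return points


# Hypothetical predictions for a test set of 5 domains (one domain is never assigned).
preds = [
    Prediction("d1", 5.1, True),
    Prediction("d2", 4.2, True),
    Prediction("d3", 3.0, False),
    Prediction("d4", 2.5, True),
]
for threshold, acc, cov in accuracy_coverage_curve(preds, n_test=5):
    print(f"threshold {threshold:.1f}: accuracy {acc:.2f}, coverage {cov:.2f}")
```

Lowering the threshold from 4.2 to 3.0 admits an incorrect prediction, so accuracy drops from 1.00 to 0.67 while coverage stays at 0.40; this is the trade-off such curves display.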
Fig. 4.
The number of test-set domains (see Materials and Methods) that are correctly assigned by the MBA method but cannot be identified by PSI-BLAST, as a function of CFM (solid squares) and CMK (open squares). Compare these with the 726 domains that can be identified in a PSI-BLAST search.
Fig. 5.
(a) Performance of the MBA method using combined motif and sequence scoring (equation 4). The accuracy versus coverage curve (solid circles; see Materials and Methods) is parametrized by 0 < α < 1 for CFM = 0.25. An additional gain in accuracy can be obtained by also applying annotation filtering with 0 < CMK < 6 (open circles). (b) Cumulative performance of the PSI-BLAST and MBA methods using combined motif and sequence scoring (equation 4). The accuracy versus coverage curve (solid circles) is parametrized by α (equation 4) for CFM = 0.25. An additional gain in accuracy can be obtained by also applying annotation filtering with 0 < CMK < 6 (open circles). Along the lines with closed symbols, α varies between 0 and 1; along the lines with open symbols, CMK varies for a fixed value of α.
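Fig. 5 parametrizes the combined motif and sequence scoring by a single weight α. Equation 4 is not reproduced in this section, so the sketch below does not implement it; it assumes, purely as an illustration, a convex combination α·S_motif + (1 − α)·S_seq of a motif score and a sequence score, and shows how changing α can change which fold is ranked first. The function names, fold identifiers, and scores are hypothetical.

```python
def combined_score(motif_score: float, seq_score: float, alpha: float) -> float:
    """Hypothetical convex combination of a motif score and a sequence score.

    ASSUMPTION: this is not the paper's equation 4; it only illustrates how a
    single weight alpha in [0, 1] trades one score off against the other.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * motif_score + (1.0 - alpha) * seq_score


def best_fold(candidates: dict[str, tuple[float, float]], alpha: float) -> str:
    """Return the fold with the highest combined score.

    `candidates` maps a hypothetical fold identifier to (motif_score, seq_score).
    """
    return max(candidates, key=lambda fold: combined_score(*candidates[fold], alpha))


# Hypothetical scores for one probe sequence against two library folds.
folds = {"fold_A": (0.9, 1.0), "fold_B": (0.1, 2.0)}
for alpha in (0.2, 0.5, 0.8):
    print(alpha, best_fold(folds, alpha))  # fold_B, fold_B, then fold_A
```

In practice the two scores would need to be brought to comparable scales before such a weighting is meaningful; the sketch leaves them unnormalized for brevity.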

References

Altschul, S.F., Madden, T.L., Schaeffer, A.A., Zhang, J., Zhang, Z., Miller, W., and Lipman, D.J. 1997. Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Res. 25: 3389–3402.
Andrade, M., Casari, G., de Daruvar, A., Sander, C., Schneider, R., Tamames, J., Valencia, A., and Ouzounis, C. 1997. Sequence analysis of the Methanococcus jannaschii genome and the prediction of protein function. Comput. Appl. Biosci. 13: 481–483.
Andrade, M.A., Brown, N.P., Leroy, C., Hoersch, S., de Daruvar, A., Reich, C., Franchini, A., Tamames, J., Valencia, A., Ouzounis, C., and Sander, C. 1999. Automated genome sequence analysis and annotation. Bioinformatics 15: 391–412.
Bairoch, A. and Apweiler, R. 2000. The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. Nucleic Acids Res. 28: 45–48.
Baxevanis, A.D. 2000. The molecular biology database collection: An online compilation of relevant database resources. Nucleic Acids Res. 28: 1–7.
