BMC Bioinformatics. 2009 Dec 14;10:416.
doi: 10.1186/1471-2105-10-416.

DescFold: a web server for protein fold recognition

Ren-Xiang Yan et al. BMC Bioinformatics. 2009.

Abstract

Background: Machine learning-based methods have proven powerful in developing new fold recognition tools. In our previous work [Zhang, Kochhar and Grigorov (2005) Protein Science, 14: 431-444], a machine learning-based method called DescFold was established by using Support Vector Machines (SVMs) to combine the following four descriptors: a profile-sequence-alignment-based descriptor using Psi-blast e-values and bit scores, a sequence-profile-alignment-based descriptor using Rps-blast e-values and bit scores, a descriptor based on secondary structure element alignment (SSEA), and a descriptor based on the occurrence of PROSITE functional motifs. In this work, we improve DescFold by incorporating more powerful descriptors and setting up a user-friendly web server.
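
As a rough illustration of how the four descriptors could be packed into one feature vector and scored, the sketch below uses a linear SVM decision function f(x) = w·x + b. It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not specify the kernel, the feature encoding, or any trained parameters, so the `PairDescriptors` container, the -log10 transform of e-values, and the weights and bias are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class PairDescriptors:
    """Hypothetical container for the descriptors of one query-template pair."""
    psiblast_evalue: float
    psiblast_bits: float
    rpsblast_evalue: float
    rpsblast_bits: float
    ssea_score: float     # secondary structure element alignment score
    prosite_shared: int   # number of PROSITE motifs found in both proteins


def evalue_feature(evalue, floor=1e-180):
    """Map an e-value to -log10(e-value); the floor avoids log(0)."""
    return -math.log10(max(evalue, floor))


def feature_vector(d):
    """Flatten the descriptors into a fixed-order list of floats."""
    return [
        evalue_feature(d.psiblast_evalue), d.psiblast_bits,
        evalue_feature(d.rpsblast_evalue), d.rpsblast_bits,
        d.ssea_score, float(d.prosite_shared),
    ]


def linear_svm_score(x, weights, bias):
    """Decision value w.x + b of a linear SVM; positive is read as 'same fold'."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


# Made-up parameters for illustration only; a real model would learn them.
ILLUSTRATIVE_WEIGHTS = [0.5, 0.01, 0.5, 0.01, 1.0, 0.2]
ILLUSTRATIVE_BIAS = -2.0

strong_pair = PairDescriptors(1e-10, 60.0, 1e-5, 45.0, 0.7, 1)
weak_pair = PairDescriptors(10.0, 5.0, 10.0, 5.0, 0.1, 0)
strong_score = linear_svm_score(
    feature_vector(strong_pair), ILLUSTRATIVE_WEIGHTS, ILLUSTRATIVE_BIAS)
weak_score = linear_svm_score(
    feature_vector(weak_pair), ILLUSTRATIVE_WEIGHTS, ILLUSTRATIVE_BIAS)
```

Taking the negative logarithm of the e-values keeps very small e-values from collapsing to zero next to the bit scores; whether DescFold encodes them this way is not stated in the abstract.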

Results: In seeking more powerful descriptors, we first considered the profile-profile alignment score generated by the COMPASS algorithm as a new descriptor (PPA). In a profile-profile alignment between two proteins for fold recognition, one protein is regarded as the template (i.e., its 3D structure is known). Instead of a sequence profile derived from a Psi-blast search, a structure-seeded profile for the template protein was generated by searching for its structural neighbors with the TM-align structural alignment algorithm. The COMPASS algorithm was then used again to derive a profile-structural-profile-alignment-based descriptor (PSPA). We trained and tested the new DescFold on a total of 1,835 highly diverse proteins extracted from SCOP version 1.73. Introducing the PPA and PSPA descriptors boosted fold recognition performance substantially. The DescFold web server was then built on the trained SVM models, using the SCOP_1.73_40% dataset as the fold library. To provide a large-scale test of the new DescFold, a stringent test set of 1,866 proteins was selected from SCOP version 1.75. With the false positive rate controlled below 5%, the new DescFold correctly recognizes structural homologs at the fold level for nearly 46% of the test proteins. We also benchmarked the DescFold method against several well-established fold recognition algorithms using the LiveBench targets and the Lindahl dataset.
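
The sketch below shows one way to compute a figure such as "recognition rate at a controlled false positive rate" from a list of scored predictions. It is a minimal illustration, not the evaluation protocol of the paper: the function name `tpr_at_fpr`, the toy scores, and the choice to treat the limit as inclusive (FPR <= 5%) are assumptions.

```python
def tpr_at_fpr(scored, max_fpr=0.05):
    """Return the highest true positive rate reachable while FPR <= max_fpr.

    scored: iterable of (score, is_positive) pairs; higher score = more confident.
    """
    items = sorted(scored, key=lambda p: p[0], reverse=True)
    n_pos = sum(1 for _, is_pos in items if is_pos)
    n_neg = len(items) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative")

    tp = fp = 0
    best_tpr = 0.0
    i = 0
    while i < len(items):
        # Lower the threshold to the next distinct score; ties move together.
        j = i
        while j < len(items) and items[j][0] == items[i][0]:
            if items[j][1]:
                tp += 1
            else:
                fp += 1
            j += 1
        if fp / n_neg > max_fpr:
            break  # this threshold admits too many false positives
        best_tpr = tp / n_pos
        i = j
    return best_tpr


# Toy data: 4 true matches and 20 false matches with made-up scores.
toy = [(0.9, True), (0.85, False), (0.8, True), (0.5, False),
       (0.4, True), (0.2, True)] + [(0.1, False)] * 18
rate = tpr_at_fpr(toy, max_fpr=0.05)
```

Sweeping the threshold from the highest score downward and recording the (false positive, true positive) counts at each step gives the kind of curve typically used to compare descriptors.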

Conclusions: In extensive benchmarks, the new DescFold method performed very competitively with some well-established fold recognition methods, suggesting that it can serve as a useful tool to assist in template-based protein structure prediction. The DescFold server is freely accessible at http://202.112.170.199/DescFold/index.html.

Figures

Figure 1
Performance of fold recognition using different descriptors. True positive instances versus false positive instances were used to examine the number of true positives out of 1,835 proteins identified by varying similarity scores.
Figure 2
Performance of remote homology identification using different descriptors. True positive rates versus false positive rates were used to examine the number of true positives out of 8,244 protein pairs identified by varying similarity scores.
Figure 3
Cartoon representation of two remote homologs (SCOP entries: d2a13a1 and d1hmsa_) successfully detected by DescFold. The structural alignment between d2a13a1 (red) and d1hmsa_ (green) was carried out by using CE [51]. The RMSD for 121 structurally aligned residues is 3.6 Å, and the CE Z-Score is 5.2.
Figure 4
Snapshot of the DescFold website. (A) The submission page of DescFold. (B) The result page of DescFold.
Figure 5
Performance of DescFold based on the SCOP_1.75_1866 test set. The performance was measured at the fold (A) and superfamily (B) levels, respectively.
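
For readers unfamiliar with the RMSD quoted in the Figure 3 caption, the standard definition over N superposed residue pairs is given below. The symbols x_i and y_i (coordinates of the i-th aligned residue in each structure after superposition) are notation introduced here, not taken from the paper; the last line only restates the caption's numbers.

```latex
\mathrm{RMSD} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} \lVert \mathbf{x}_i - \mathbf{y}_i \rVert^{2}},
\qquad N = 121,\quad \mathrm{RMSD} = 3.6\ \text{\AA}
\;\Rightarrow\; \frac{1}{N}\sum_{i=1}^{N} \lVert \mathbf{x}_i - \mathbf{y}_i \rVert^{2} = 12.96\ \text{\AA}^{2}
```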

References

    1. Petrey D, Honig B. Protein structure prediction: inroads to biology. Mol Cell. 2005;20(6):811–819. doi: 10.1016/j.molcel.2005.12.005.
    2. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. J Mol Biol. 1990;215(3):403–410.
    3. Pearson WR. Rapid and sensitive sequence comparison with FASTP and FASTA. Methods in Enzymology. 1990;183:63–98.
    4. Smith TF, Waterman MS. Identification of common molecular subsequences. J Mol Biol. 1981;147(1):195–197. doi: 10.1016/0022-2836(81)90087-5.
    5. Needleman SB, Wunsch CD. A general method applicable to the search for similarities in the amino acid sequence of two proteins. J Mol Biol. 1970;48(3):443–453. doi: 10.1016/0022-2836(70)90057-4.
