Comparative Study
BMC Bioinformatics. 2013 May 4;14:152.
doi: 10.1186/1471-2105-14-152.

Comparison study on statistical features of predicted secondary structures for protein structural class prediction: From content to position


Qi Dai et al. BMC Bioinformatics. 2013.

Abstract

Background: Many content-based statistical features of predicted secondary structural elements (CBF-PSSEs) have been proposed and have achieved promising results in protein structural class prediction, but the position distribution of successive occurrences of an element in predicted secondary structure sequences has not yet been used. It is therefore necessary to extract appropriate position-based features of the secondary structural elements for the prediction task.
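The position information referred to here can be made concrete with the interval distance used in Figures 1 and 2: the distance between two nearest occurrences of the same element in a predicted secondary structure sequence. The Python sketch below computes these distances and their relative-frequency distribution for a single sequence. It assumes a three-letter alphabet (H for helix, E for strand, C for coil) and that the interval distance is the difference between residue positions, so two adjacent residues give a distance of 1; the function names are hypothetical and this is not the authors' code.

```python
from __future__ import annotations

from collections import Counter


def interval_distances(ss: str, element: str) -> list[int]:
    """Distances between successive occurrences of `element` in `ss`.

    Assumption: the interval distance is the difference between residue
    positions, so two adjacent residues give a distance of 1.
    """
    positions = [i for i, symbol in enumerate(ss) if symbol == element]
    return [b - a for a, b in zip(positions, positions[1:])]


def interval_distribution(ss: str, element: str) -> dict[int, float]:
    """Relative frequency of each interval distance (a Dis(X)-style distribution)."""
    distances = interval_distances(ss, element)
    if not distances:
        return {}  # fewer than two occurrences: no intervals to count
    counts = Counter(distances)
    total = len(distances)
    return {d: c / total for d, c in sorted(counts.items())}


# Hypothetical predicted secondary structure (H = helix, E = strand, C = coil).
ss = "CCHHHCCEEHH"
print(interval_distances(ss, "H"))     # [1, 1, 5, 1]
print(interval_distribution(ss, "H"))  # {1: 0.75, 5: 0.25}
```

Restricting to Dis(H) > 1, as in Figure 2, would under this assumption amount to dropping the distance-1 entries, for example `[d for d in interval_distances(ss, "H") if d > 1]`.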

Results: We proposed a set of position-based features of predicted secondary structural elements (PBF-PSSEs) and assessed their intrinsic ability relative to the existing CBF-PSSEs. This offers a systematic, quantitative experimental assessment of these statistical features and naturally complements existing comparisons of the CBF-PSSEs. We also analyzed the performance of the CBF-PSSEs combined with the PBF-PSSE and constructed a new combined feature set, PBF11CBF-PSSE. Based on these experiments, we derived new guidelines for the use of PBF-PSSEs and CBF-PSSEs.
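To illustrate what combining content-based and position-based features might look like, the following Python sketch computes five content-based statistics per element. It assumes that contentSE, MaxSegSE, NMaxSegSE, AvgSegSE and NAvgSegSE (the abbreviations listed with Figure 3) denote an element's residue fraction, longest segment length, longest segment length divided by sequence length, average segment length, and average segment length divided by sequence length. It then builds a hypothetical position-based vector, the relative frequencies of interval distances 1 to k for each element, and concatenates the two. These definitions are assumptions for illustration; the sketch does not reproduce the paper's CF(δ), C5(δ) or PBF11CBF-PSSE features.

```python
from __future__ import annotations

from itertools import groupby

# Assumed three-state alphabet: H = helix, E = strand, C = coil.
ELEMENTS = "HEC"


def segment_lengths(ss: str, element: str) -> list[int]:
    """Lengths of the maximal runs (segments) of `element` in `ss`."""
    return [len(list(run)) for symbol, run in groupby(ss) if symbol == element]


def content_based_features(ss: str) -> list[float]:
    """Five assumed content-based statistics per element, 15 values in total.

    Per element: contentSE, MaxSegSE, NMaxSegSE, AvgSegSE, NAvgSegSE
    (definitions assumed for illustration, not taken from the paper).
    `ss` must be non-empty.
    """
    n = len(ss)
    features: list[float] = []
    for x in ELEMENTS:
        segments = segment_lengths(ss, x)
        content = ss.count(x) / n
        max_seg = max(segments, default=0)
        avg_seg = sum(segments) / len(segments) if segments else 0.0
        features += [content, max_seg, max_seg / n, avg_seg, avg_seg / n]
    return features


def interval_frequencies(ss: str, element: str, k: int) -> list[float]:
    """Hypothetical position-based feature: relative frequency of each
    interval distance d = 1..k between successive occurrences of `element`."""
    positions = [i for i, symbol in enumerate(ss) if symbol == element]
    distances = [b - a for a, b in zip(positions, positions[1:])]
    total = len(distances)
    return [distances.count(d) / total if total else 0.0 for d in range(1, k + 1)]


def combined_features(ss: str, k: int) -> list[float]:
    """Concatenate content-based and position-based features (15 + 3k values)."""
    position_based = [f for x in ELEMENTS for f in interval_frequencies(ss, x, k)]
    return content_based_features(ss) + position_based


ss = "CCHHHCCEEHH"  # hypothetical predicted secondary structure
print(len(combined_features(ss, k=5)))  # 30
```

Concatenation is only the simplest way to combine the two kinds of features; how the paper weights or selects features in its combined set is not stated in this abstract.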

Conclusions: PBF-PSSEs and CBF-PSSEs both have a strong impact on protein structural class prediction. When combined with the PBF-PSSE, most of the CBF-PSSEs achieve substantially higher prediction accuracies, so the PBF-PSSEs and the CBF-PSSEs should be used together, as they make significant and complementary contributions to protein structural class prediction. In addition, the performance of the proposed PBF-PSSE is extremely sensitive to the choice of the parameter k. In summary, our quantitative analysis shows that exploiting the position information of predicted secondary structural elements is a promising way to improve protein structural class prediction.
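The sensitivity to k can be related to the cumulative content of the interval distance shown in Figure 4, read here as the share of interval distances that do not exceed k. A minimal Python sketch, assuming the interval distance is the position difference between successive occurrences of an element and that distances are pooled over all sequences of a dataset (both assumptions, with hypothetical function names), is:

```python
from __future__ import annotations

from collections.abc import Iterable


def interval_distances(ss: str, element: str) -> list[int]:
    """Position differences between successive occurrences of `element`."""
    positions = [i for i, symbol in enumerate(ss) if symbol == element]
    return [b - a for a, b in zip(positions, positions[1:])]


def cumulative_content(
    sequences: Iterable[str],
    element: str,
    ks: tuple[int, ...] = (5, 10, 15, 20, 25, 30),
) -> dict[int, float]:
    """Share of all interval distances of `element`, pooled over a dataset,
    that are at most k, for each cutoff k in `ks` (assumed definition)."""
    distances = [d for ss in sequences for d in interval_distances(ss, element)]
    if not distances:
        return {k: 0.0 for k in ks}
    return {k: sum(d <= k for d in distances) / len(distances) for k in ks}


# Hypothetical toy "dataset" of predicted secondary structures.
dataset = ["CCHHHCCEEHH", "H" + "C" * 20 + "H"]
print(cumulative_content(dataset, "H"))
# {5: 0.8, 10: 0.8, 15: 0.8, 20: 0.8, 25: 1.0, 30: 1.0}
```

Under these assumptions, a cutoff k whose cumulative content is well below 1 leaves the longer intervals out of a feature truncated at k, which offers one possible intuition for why such a feature can depend strongly on the choice of k.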


Figures

Figure 1
Distribution of the interval distance Dis(H) for the 25PDB dataset. Distribution Dis(H) of the interval distance between two nearest occurrences of the structural element H in the 25PDB dataset; a, b, a+b and a/b denote the all-α, all-β, α+β and α/β classes.
Figure 2
Distribution of the interval distance Dis(H) (Dis(H) > 1) for the 25PDB dataset. Distribution Dis(H) of the interval distance between two nearest occurrences of the structural element H in the 25PDB dataset, restricted to Dis(H) > 1; a, b, a+b and a/b denote the all-α, all-β, α+β and α/β classes.
Figure 3
Performance of the CBF-PSSEs combined with the PBF-PSSE CF(δ) and the CBF-PSSE contentSE, where C, MS, NMS, AS, NAS and 3P denote contentSE, MaxSegSE, NMaxSegSE, AvgSegSE, NAvgSegSE and 3PATTERN, respectively.
Figure 4
The cumulative content of the interval distance for the datasets 25PDB, 640, FC699 and 1189. Here, we calculate the cumulative content of Dis(C), Dis(E) and Dis(H), accumulating interval distances up to k = 5, 10, 15, 20, 25 and 30.
Figure 5
Comparison of the position-based features CF(δ) and C5(δ) for the datasets 25PDB, 640, FC699 and 1189, where CF and C5 denote the PBF-PSSEs CF(δ) and C5(δ).
