BMC Bioinformatics. 2010 Jan 18;11 Suppl 1(Suppl 1):S9.

doi: 10.1186/1471-2105-11-S1-S9.

Prediction of protein structural classes for low-homology sequences based on predicted secondary structure

Jian-Yi Yang¹, Zhen-Ling Peng, Xin Chen

Affiliation

¹ Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University, 21 Nanyang Link, Singapore. yang0241@ntu.edu.sg

PMID: 20122246
PMCID: PMC3009544

Jian-Yi Yang et al. BMC Bioinformatics. 2010.


Abstract

Background: Predicting protein structural classes (alpha, beta, alpha + beta and alpha/beta) from amino acid sequences is of great importance, as it benefits the study of protein function, regulation and interactions. Many methods have been developed for high-homology protein sequences, and their prediction accuracies can reach up to 90%. However, for low-homology sequences, whose average pairwise sequence identity lies between 20% and 40%, these methods perform relatively poorly, with prediction accuracies often below 60%.

Results: We propose a new method that predicts protein structural classes from features extracted from the predicted secondary structures of proteins rather than directly from their amino acid sequences. It first uses PSIPRED to predict the secondary structure of each protein sequence. The chaos game representation is then employed to represent the predicted secondary structure as two time series, from which we generate a comprehensive set of 24 features using recurrence quantification analysis, K-string based information entropy and segment-based analysis. The resulting feature vectors are finally fed into a simple yet powerful Fisher's discriminant algorithm for the prediction of protein structural classes. We tested the proposed method on three low-homology benchmark datasets and achieved overall prediction accuracies of 82.9%, 83.1% and 81.3%, respectively. Comparisons with ten existing methods showed that our method consistently performs better on all the tested datasets, with overall accuracy improvements ranging from 2.3% to 27.5%. A web server that implements the proposed method is freely available at http://www1.spms.ntu.edu.sg/~chenxin/RKS_PPSC/.

Conclusion: The high prediction accuracy achieved by our proposed method is attributed to the design of a comprehensive feature set on the predicted secondary structure sequences, which is capable of characterizing the sequence order information, local interactions of the secondary structural elements, and spatial arrangements of alpha helices and beta strands. Thus, it is a valuable method for predicting protein structural classes, particularly for low-homology amino acid sequences.

Figures

**Figure 1**
**The CGRs of predicted secondary structure for proteins from four structural classes**. The blue edges represent the sides of equilateral triangles and the black points represent the CGR points. The order of the black points (corresponding to the order in the predicted secondary structure) is saved, but not shown in the figure. The PDB IDs of the four proteins are 1A6M (belonging to the α class), 1AJW (belonging to the β class), 1GQOV (belonging to the α/β class), and 1DEF (belonging to the α + β class).

**Figure 2**
**Eight time series that represent the four CGRs in Figure 1**. Each panel in Figure 1 gives rise to two time series (x- and y-coordinates, respectively). As a result, we obtain eight time series for four CGRs.
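
The step from a predicted secondary structure string to two time series can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the assignment of the states H, E and C to triangle vertices, the unit side length, the centroid starting point and the function name `cgr_time_series` are all hypothetical choices that the abstract does not specify. Each new point is the midpoint between the previous point and the vertex of the current state, which is the standard chaos game rule.

```python
import math

# Assumed layout (not given in the abstract): an equilateral triangle with
# unit side, one vertex per secondary-structure state.
VERTICES = {
    "H": (0.0, 0.0),                  # helix
    "E": (1.0, 0.0),                  # strand
    "C": (0.5, math.sqrt(3) / 2),     # coil
}


def cgr_time_series(ss, start=None):
    """Return (xs, ys): the CGR x- and y-coordinate series for the string ss."""
    if start is None:
        # Assumed starting point: the centroid of the triangle.
        start = (0.5, math.sqrt(3) / 6)
    x, y = start
    xs, ys = [], []
    for state in ss:
        vx, vy = VERTICES[state]
        # Chaos game rule: move halfway towards the vertex of the current state.
        x, y = (x + vx) / 2, (y + vy) / 2
        xs.append(x)
        ys.append(y)
    return xs, ys


if __name__ == "__main__":
    xs, ys = cgr_time_series("HHEC")
    print(xs)
    print(ys)
```

Because the points are appended in sequence order, the two lists keep the order information that the CGR picture alone does not show.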

**Figure 3**
**The corresponding RPs for the four x-time series in Figure 2**. The parameters used are m = 3, τ = 1, and ε = 20%. Note that there is a black line along the main diagonal in the plots since a point always recurs with itself. Moreover, the points in the RP are symmetric with respect to the main diagonal line.
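
A recurrence plot of this kind can be sketched as below. This is a hedged illustration rather than the authors' code: it assumes time-delay embedding with dimension m and delay τ, the Euclidean norm, and that ε = 20% means 20% of the largest pairwise distance between embedded vectors. The abstract does not state the norm or how the percentage is scaled, and the function names are hypothetical.

```python
import math


def embed(series, m=3, tau=1):
    """Time-delay embedding: vector i is (s[i], s[i+tau], ..., s[i+(m-1)*tau])."""
    n = len(series) - (m - 1) * tau
    if n < 1:
        raise ValueError("series too short for the chosen m and tau")
    return [tuple(series[i + j * tau] for j in range(m)) for i in range(n)]


def recurrence_matrix(series, m=3, tau=1, eps_fraction=0.20):
    """Binary recurrence matrix: R[i][j] = 1 if vectors i and j are within eps.

    Assumption: eps is eps_fraction times the maximum pairwise distance.
    """
    vecs = embed(series, m, tau)
    n = len(vecs)
    dist = [[math.dist(vecs[i], vecs[j]) for j in range(n)] for i in range(n)]
    eps = eps_fraction * max(max(row) for row in dist)
    return [[1 if dist[i][j] <= eps else 0 for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    R = recurrence_matrix([0.1, 0.5, 0.9, 0.1, 0.5, 0.9, 0.1, 0.5])
    for row in R:
        print("".join("#" if v else "." for v in row))
```

The main diagonal is always set because every vector is at distance zero from itself, and the matrix is symmetric because the distance is symmetric, which matches the two properties noted in the caption.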

**Figure 4**
**The corresponding RPs for the four y-time series in Figure 2**. The parameters used are m = 3, τ = 1, and ε = 20%. Some interesting patterns can be seen to emerge from the plots, but they are not easy to characterize by eye. In this study we chose recurrence quantification analysis (RQA) to characterize them.
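
To show how RQA turns a recurrence plot into numbers, the sketch below computes two commonly used measures, recurrence rate and determinism, from a binary recurrence matrix. It is illustrative only: the abstract does not list which RQA measures the authors used, and the function names, the minimum line length `l_min = 2` and the choice to include the main diagonal in the recurrence rate but exclude it from the line statistics are assumptions.

```python
def recurrence_rate(R):
    """Fraction of matrix entries that are recurrence points (main diagonal included)."""
    n = len(R)
    return sum(map(sum, R)) / (n * n)


def diagonal_line_lengths(R):
    """Lengths of runs of 1s along every diagonal except the main one."""
    n = len(R)
    lengths = []
    for k in range(-(n - 1), n):
        if k == 0:
            continue  # the main diagonal is trivially recurrent
        run = 0
        for i in range(max(0, -k), min(n, n - k)):
            if R[i][i + k]:
                run += 1
            else:
                if run:
                    lengths.append(run)
                run = 0
        if run:
            lengths.append(run)
    return lengths


def determinism(R, l_min=2):
    """Share of off-diagonal recurrence points lying on diagonal lines of length >= l_min."""
    lengths = diagonal_line_lengths(R)
    total = sum(lengths)
    if total == 0:
        return 0.0
    return sum(length for length in lengths if length >= l_min) / total


if __name__ == "__main__":
    R = [
        [1, 0, 1, 0],
        [0, 1, 0, 1],
        [1, 0, 1, 0],
        [0, 1, 0, 1],
    ]
    print(recurrence_rate(R), determinism(R))
```

Long diagonal lines indicate stretches where the series repeats an earlier trajectory, so measures of this kind summarize the visual patterns as scalar features.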

**Figure 5**
**The overall prediction accuracies on the 25PDB dataset with varying values of ε and K**. When K = 2, ε ranges from 1% to 50% (left panel). When ε = 39%, K ranges between 2 and 15 (right panel).
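
Assuming K here is the string length used by the K-string based information entropy features, one plausible form of such a feature is the Shannon entropy of the overlapping length-K substrings of the predicted secondary structure. The sketch below follows that assumption; the exact definition used by the authors is not given in the abstract, and the function name `k_string_entropy` is hypothetical.

```python
import math
from collections import Counter


def k_string_entropy(ss, k=2):
    """Shannon entropy (in bits) of the overlapping length-k substrings of ss."""
    if k < 1 or len(ss) < k:
        raise ValueError("k must be between 1 and len(ss)")
    counts = Counter(ss[i:i + k] for i in range(len(ss) - k + 1))
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


if __name__ == "__main__":
    for seq in ("HHHHHHHH", "HEHEHEHE", "HHEECCHE"):
        print(seq, k_string_entropy(seq, k=2))
```

A sequence made of one repeated state gives zero entropy, while a sequence whose K-strings are all distinct gives the maximum for its length, so the value reflects how varied the local arrangement of structural elements is.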

Cited by

Customised fragments libraries for protein structure prediction based on structural class annotations.
Abbass J, Nebel JC. BMC Bioinformatics. 2015 Apr 29;16(1):136. doi: 10.1186/s12859-015-0576-2. PMID: 25925397. Free PMC article.
Using Recursive Feature Selection with Random Forest to Improve Protein Structural Class Prediction for Low-Similarity Sequences.
Wang Y, Xu Y, Yang Z, Liu X, Dai Q. Comput Math Methods Med. 2021 May 7;2021:5529389. doi: 10.1155/2021/5529389. eCollection 2021. PMID: 34055035. Free PMC article.
The HER2 target for designing novel multi-peptide vaccine against breast cancer using immunoinformatics and molecular dynamic simulation.
Firuzpour F, Barancheshmeh M, Ziarani FF, Karami L, Aram C. Biochem Biophys Rep. 2025 Jul 4;43:102135. doi: 10.1016/j.bbrep.2025.102135. eCollection 2025 Sep. PMID: 40688509. Free PMC article.
Accurate prediction of protein structural class.
Xia XY, Ge M, Wang ZX, Pan XM. PLoS One. 2012;7(6):e37653. doi: 10.1371/journal.pone.0037653. Epub 2012 Jun 19. PMID: 22723837. Free PMC article.
Comparison study on statistical features of predicted secondary structures for protein structural class prediction: From content to position.
Dai Q, Li Y, Liu X, Yao Y, Cao Y, He P. BMC Bioinformatics. 2013 May 4;14:152. doi: 10.1186/1471-2105-14-152. PMID: 23641706. Free PMC article.

