BMC Bioinformatics. 2010 Jan 18;11 Suppl 1(Suppl 1):S9.
doi: 10.1186/1471-2105-11-S1-S9.

Prediction of protein structural classes for low-homology sequences based on predicted secondary structure


Jian-Yi Yang et al. BMC Bioinformatics. 2010.

Abstract

Background: Prediction of protein structural classes (alpha, beta, alpha + beta and alpha/beta) from amino acid sequences is of great importance, as it aids the study of protein function, regulation and interactions. Many methods have been developed for high-homology protein sequences, and their prediction accuracies can reach 90%. However, for low-homology sequences whose average pairwise sequence identity lies between 20% and 40%, these methods perform relatively poorly, with prediction accuracies often below 60%.

Results: We propose a new method that predicts protein structural classes from features extracted from the predicted secondary structures of proteins rather than directly from their amino acid sequences. The method first uses PSIPRED to predict the secondary structure of each protein sequence. The chaos game representation is then used to represent the predicted secondary structure as two time series, from which we generate a comprehensive set of 24 features using recurrence quantification analysis, K-string based information entropy and segment-based analysis. The resulting feature vectors are finally fed into a simple yet powerful Fisher's discriminant algorithm to predict the protein structural class. We tested the proposed method on three low-homology benchmark datasets and achieved overall prediction accuracies of 82.9%, 83.1% and 81.3%, respectively. Comparisons with ten existing methods showed that our method consistently performs better on all the tested datasets, with overall accuracy improvements ranging from 2.3% to 27.5%. A web server that implements the proposed method is freely available at http://www1.spms.ntu.edu.sg/~chenxin/RKS_PPSC/.

Conclusion: The high prediction accuracy achieved by our proposed method is attributed to the design of a comprehensive feature set on the predicted secondary structure sequences, which is capable of characterizing the sequence order information, local interactions of the secondary structural elements, and spatial arrangements of alpha helices and beta strands. The method is therefore valuable for predicting protein structural classes, particularly for low-homology amino acid sequences.


Figures

Figure 1
The CGRs of predicted secondary structure for proteins from four structural classes. The blue edges represent the sides of equilateral triangles and the black points represent the CGR points. The order of the black points (corresponding to the order in the predicted secondary structure) is saved, but not shown in the figure. The PDB IDs for the four proteins are 1A6M (belonging to the α class), 1AJW (belonging to the β class), 1GQOV (belonging to the α/β class), and 1DEF (belonging to the α + β class).
Figure 2
Eight time series that represent the four CGRs in Figure 1. Each panel in Figure 1 gives rise to two time series (x- and y-coordinates, respectively). As a result, we obtain eight time series for four CGRs.
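
As a rough illustration of how one predicted secondary structure string can give rise to an x- and a y-time series, the sketch below runs a chaos game on an equilateral triangle. It is a minimal sketch under stated assumptions, not the authors' implementation: the three-state alphabet (H, E, C), the assignment of states to corners, the starting point at the centroid, and the name `cgr_time_series` are all hypothetical choices made for the example.

```python
import math

# Hypothetical corner assignment for the three secondary-structure states.
# The text above does not say which state sits at which corner.
VERTICES = {
    "H": (0.0, 0.0),               # helix
    "E": (1.0, 0.0),               # strand
    "C": (0.5, math.sqrt(3) / 2),  # coil
}


def cgr_time_series(ss, start=None):
    """Return the x- and y-coordinate series of a chaos game walk over `ss`.

    Each point is the midpoint between the previous point and the corner
    assigned to the current state, so the order of the points follows the
    order of the secondary structure string.
    """
    if start is None:
        # Assumed starting point: the centroid of the triangle.
        start = (0.5, math.sqrt(3) / 6)
    x, y = start
    xs, ys = [], []
    for state in ss:
        vx, vy = VERTICES[state]
        x, y = (x + vx) / 2, (y + vy) / 2
        xs.append(x)
        ys.append(y)
    return xs, ys


# One string in, two time series out (one per coordinate).
xs, ys = cgr_time_series("CCHHHHEEC")
```

Applying this to four sequences would yield eight series, which matches the count in the caption.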
Figure 3
The corresponding RPs for the four x-time series in Figure 2. The parameters used are m = 3, τ = 1, and ε = 20%. Note that there is a black line along the main diagonal in the plots since a point always recurs with itself. Moreover, the points in the RP are symmetric with respect to the main diagonal line.
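
The recurrence plots described here can be sketched with a time-delay embedding followed by a distance threshold. The code below is an illustrative sketch only. It assumes that ε = 20% means 20% of the largest pairwise distance between embedded vectors, which is one common convention but is not stated in the caption, and the function names `embed` and `recurrence_matrix` are hypothetical.

```python
import math


def embed(series, m=3, tau=1):
    """Time-delay embedding: vectors (s[i], s[i+tau], ..., s[i+(m-1)*tau])."""
    n = len(series) - (m - 1) * tau
    if n <= 0:
        raise ValueError("series too short for this embedding")
    return [tuple(series[i + j * tau] for j in range(m)) for i in range(n)]


def recurrence_matrix(series, m=3, tau=1, eps_fraction=0.20):
    """Binary recurrence matrix R with R[i][j] = 1 when vectors i and j are close.

    Assumption: the threshold is eps_fraction times the maximum pairwise
    Euclidean distance between embedded vectors.
    """
    vectors = embed(series, m, tau)
    n = len(vectors)
    dist = [[math.dist(vectors[i], vectors[j]) for j in range(n)]
            for i in range(n)]
    eps = eps_fraction * max(max(row) for row in dist)
    return [[1 if dist[i][j] <= eps else 0 for j in range(n)]
            for i in range(n)]


# A strictly alternating series recurs every second step.
R = recurrence_matrix([0.0, 1.0, 0.0, 1.0, 0.0, 1.0])
```

Because the distance of a vector to itself is zero and Euclidean distance is symmetric, the main diagonal of `R` is all ones and `R` equals its transpose, which is the behaviour the caption points out.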
Figure 4
The corresponding RPs for the four y-time series in Figure 2. The parameters used are m = 3, τ = 1, and ε = 20%. Some interesting patterns emerge from the plots, but they are not easy to characterize. In this study we use recurrence quantification analysis (RQA) for this purpose.
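
To make the idea of quantifying a recurrence plot concrete, the sketch below computes two standard RQA measures, recurrence rate and determinism, from a binary recurrence matrix. These two measures are chosen only for illustration; the text here does not list which RQA measures the method uses, and the function names and the minimum line length `l_min = 2` are assumptions.

```python
def recurrence_rate(R):
    """Fraction of entries in the recurrence matrix that are 1.

    The main diagonal is counted here; some conventions exclude it.
    """
    n = len(R)
    return sum(map(sum, R)) / (n * n)


def diagonal_line_lengths(R):
    """Lengths of runs of 1s along each diagonal above the main diagonal."""
    n = len(R)
    lengths = []
    for k in range(1, n):          # k-th diagonal above the main one
        run = 0
        for i in range(n - k):
            if R[i][i + k]:
                run += 1
            else:
                if run:
                    lengths.append(run)
                run = 0
        if run:
            lengths.append(run)
    return lengths


def determinism(R, l_min=2):
    """Share of off-diagonal recurrence points lying on lines of length >= l_min."""
    lengths = diagonal_line_lengths(R)
    total = sum(lengths)
    if total == 0:
        return 0.0
    return sum(length for length in lengths if length >= l_min) / total


# Periodic pattern: every recurrence point sits on a diagonal line.
R = [[1, 0, 1, 0],
     [0, 1, 0, 1],
     [1, 0, 1, 0],
     [0, 1, 0, 1]]
rr, det = recurrence_rate(R), determinism(R)
```

Only the upper triangle is scanned for diagonal lines because the matrix is symmetric, so the lower triangle would contribute the same lines again.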
Figure 5
The overall prediction accuracies on the 25PDB dataset for varying values of ε and K. When K = 2, ε ranges from 1% to 50% (left panel). When ε = 39%, K ranges from 2 to 15 (right panel).
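
For readers unfamiliar with K-string statistics, the sketch below shows one simple way an information entropy can be computed over the K-strings of a secondary structure sequence. It is an assumption-laden sketch: it uses the plain Shannon entropy of overlapping K-string frequencies, which may differ from the exact definition used by the method, and `k_string_entropy` is a hypothetical name.

```python
import math
from collections import Counter


def k_string_entropy(ss, k=2):
    """Shannon entropy (in bits) of the overlapping K-string distribution of `ss`."""
    if k < 1 or len(ss) < k:
        raise ValueError("k must satisfy 1 <= k <= len(ss)")
    # Count every overlapping substring of length k.
    counts = Counter(ss[i:i + k] for i in range(len(ss) - k + 1))
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Entropy for several string lengths on a toy sequence.
entropies = {k: k_string_entropy("CCHHHHHHCCEEEECC", k) for k in (2, 3, 4)}
```

A sequence made of a single repeated state has zero entropy for every K, while sequences that mix many different K-strings score higher.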


