Brief Bioinform. 2018 May 1;19(3):482-494.
doi: 10.1093/bib/bbw129.

Sixty-five years of the long march in protein secondary structure prediction: the final stretch?

Yuedong Yang et al. Brief Bioinform. 2018.

Abstract

Protein secondary structure prediction began in 1951 when Pauling and Corey predicted helical and sheet conformations for the protein polypeptide backbone, even before the first protein structure was determined. Sixty-five years later, powerful new methods are breathing new life into this field. The highest three-state accuracy without relying on structure templates is now 82-84%, a number unthinkable just a few years ago. These improvements came from increasingly large databases of protein sequences and structures for training, the use of template secondary structure information and more powerful deep learning techniques. As we approach the theoretical limit of three-state prediction (88-90%), alternatives to secondary structure prediction (prediction of backbone torsion angles and of Cα-atom-based angles and torsion angles) not only have more room for further improvement but also allow direct prediction of three-dimensional fragment structures with constantly improving accuracy. About 20% of all 40-residue fragments in a database of 1199 non-redundant proteins, as predicted by SPIDER2, have <6 Å root-mean-squared distance from the native conformations. More powerful deep learning methods with improved capability of capturing long-range interactions are beginning to emerge as the next generation of techniques for secondary structure prediction. The time has come to finish off the final stretch of the long march towards protein secondary structure prediction.

Figures

Figure 1
New methods for secondary structure prediction continue to be developed. The number of publications on protein secondary structure prediction per year and the cumulative total.
Figure 2
Conservation of secondary structure in homologous sequences. The average consistency of secondary structure between homologous sequences, either at a given pairwise sequence identity or over all compared pairs above a given sequence identity (cumulative from high sequence identity).
Figure 3
The dependence of accuracies and misclassifications on solvent accessibility. (A) The accuracy of predicting helices (QH), sheets (QE) and coils (QC) and the overall accuracy (Q3), and (B) the misclassifications of helices as coils or sheets, sheets as coils or helices, and coils as helices or sheets, as a function of solvent accessibility for TS115 by SPIDER2.
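The accuracy measures named in this caption can be illustrated with a minimal sketch. It assumes the usual definitions: Q3 is the fraction of all residues whose three-state label (H, E or C) is predicted correctly, and QH, QE and QC are the same fraction restricted to residues whose true state is helix, sheet or coil. The function name `three_state_accuracy` and the toy sequences are hypothetical and are not taken from the paper.

```python
from collections import Counter


def three_state_accuracy(true_ss, pred_ss):
    """Return (Q3, per-class accuracies) for two equal-length H/E/C strings.

    Assumed definitions: Q3 = correct residues / all residues;
    Q_X = correctly predicted residues of true state X / residues of true state X.
    """
    if len(true_ss) != len(pred_ss) or not true_ss:
        raise ValueError("sequences must be non-empty and of equal length")

    total = Counter(true_ss)  # residues per true state
    correct = Counter(t for t, p in zip(true_ss, pred_ss) if t == p)

    # Skip states that never occur in the true assignment.
    per_class = {s: correct[s] / total[s] for s in "HEC" if total[s]}
    q3 = sum(correct.values()) / len(true_ss)
    return q3, per_class


# Toy example (hypothetical data): 10 residues, 2 mispredicted.
true_ss = "HHHHCCEEEC"
pred_ss = "HHHCCCEECC"
q3, per_class = three_state_accuracy(true_ss, pred_ss)
print(q3, per_class)
```

Binning residues by solvent accessibility before calling such a function would give curves of the kind the caption describes; the binning scheme itself is not specified here.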
Figure 4
The dependence of accuracy on non-local contacts. The secondary structure accuracy as a function of the number of non-local contacts (|i − j| > 19) for the independent test set (TS1199) by SPINE X and SPIDER2.
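As a rough illustration of the quantity on the horizontal axis, the sketch below counts non-local contacts per residue using the sequence-separation criterion |i − j| > 19 from the caption. The contact definition is an assumption: two residues are treated as in contact when their Cα atoms lie within 8 Å, a cutoff the caption does not state. The function name `nonlocal_contact_counts` and the toy coordinates are hypothetical.

```python
import math


def nonlocal_contact_counts(ca_coords, min_separation=20, cutoff=8.0):
    """Count, for each residue, its contacts with residues at |i - j| >= min_separation.

    min_separation=20 corresponds to |i - j| > 19 in the caption.
    cutoff=8.0 (angstroms, C-alpha to C-alpha) is an ASSUMED contact definition.
    """
    n = len(ca_coords)
    counts = [0] * n
    for i in range(n):
        for j in range(i + min_separation, n):
            if math.dist(ca_coords[i], ca_coords[j]) < cutoff:
                counts[i] += 1
                counts[j] += 1
    return counts


# Toy example (hypothetical data): a straight 25-residue chain with 3.8 A spacing,
# then the last residue is moved back next to the start of the chain.
chain = [(3.8 * i, 0.0, 0.0) for i in range(25)]
folded = chain[:-1] + [(1.0, 5.0, 0.0)]

print(nonlocal_contact_counts(chain))
print(nonlocal_contact_counts(folded))
```

The double loop is quadratic in chain length, which is adequate for single proteins; a spatial grid or k-d tree would be a natural substitute for larger systems.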
Figure 5
Direct prediction of three-dimensional structure from predicted angles. Structure (dark colour) constructed directly from φ/ψ angles compared with native structure (light colour) for residues 24–63 from PDB 5fdy chain A.
