Multifaceted analysis of training and testing convolutional neural networks for protein secondary structure prediction
- PMID: 32374785
- PMCID: PMC7202669
- DOI: 10.1371/journal.pone.0232528
Abstract
Protein secondary structure prediction remains a vital topic with broad applications. Because there is no widely accepted standard for evaluating secondary structure predictors, a fair comparison of predictors is challenging. A detailed examination of the factors that contribute to higher accuracy is also lacking. In this paper, we present: (1) new test sets, Test2018, Test2019, and Test2018-2019, consisting of proteins from structures released in 2018 and 2019 with less than 25% identity to any protein published before 2018; (2) a 4-layer convolutional neural network, SecNet, with an input window of ±14 amino acids, which was trained on proteins ≤25% identical to proteins in Test2018 and the commonly used CB513 test set; (3) an additional test set that shares no homologous domains with the training set proteins, according to the Evolutionary Classification of Proteins (ECOD) database; (4) a detailed ablation study in which we reverse one algorithmic choice at a time in SecNet and evaluate the effect on prediction accuracy; (5) new 4- and 5-label prediction alphabets that may be more practical for tertiary structure prediction methods. The 3-label accuracy (helix, sheet, coil) of the leading predictors on both Test2018 and CB513 is 81-82%, while SecNet's accuracy is 84% on both sets. Accuracy on the non-homologous ECOD set (83.9%) is only 0.6 percentage points lower than on the Test2018-2019 set (84.5%). The ablation study of features, neural network architecture, and training hyper-parameters suggests that the best accuracy is achieved with good choices for each of them, while the neural network architecture is not as critical as long as it is not too simple. Protocols for generating and using unbiased test, validation, and training sets are provided. Our data sets, including input features and assigned labels, and the SecNet software, including third-party dependencies and databases, are downloadable from dunbrack.fccc.edu/ss and github.com/sh-maxim/ss.
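To make two quantities in the abstract concrete, the following is a minimal sketch, not the SecNet implementation. It assumes a ±14 window means 29 residues centered on the position being predicted, and that 3-label accuracy is the percentage of residues whose predicted label matches the assigned one. The function names, the `-` padding symbol, and the `H`/`E`/`C` letters for helix, sheet, and coil are illustrative assumptions; the real input features are described in the paper and are not reproduced here.

```python
from typing import List

PAD = "-"  # hypothetical padding symbol for positions beyond the chain ends


def windows(sequence: str, half_width: int = 14) -> List[str]:
    """Return one window of 2*half_width + 1 residues centered on each position."""
    padded = PAD * half_width + sequence + PAD * half_width
    size = 2 * half_width + 1
    return [padded[i:i + size] for i in range(len(sequence))]


def q3_accuracy(predicted: str, observed: str) -> float:
    """Percentage of residues whose predicted 3-label state matches the observed one."""
    if len(predicted) != len(observed):
        raise ValueError("label strings must have equal length")
    if not observed:
        raise ValueError("label strings must be non-empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


if __name__ == "__main__":
    # Toy sequence and labels, invented purely for illustration.
    toy_sequence = "MKTAYIAKQR"
    print(len(windows(toy_sequence)[0]))           # window length for half_width=14
    print(q3_accuracy("HHHHCCEEEC", "HHHCCCEEEC"))  # one mismatch out of ten
```

With the default half-width of 14, every window has 29 positions, and residues near the chain ends are padded so that each position still gets a full-length window.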
Conflict of interest statement
The authors have declared that no competing interests exist.
Similar articles
- MUFOLD-SS: New deep inception-inside-inception networks for protein secondary structure prediction. Proteins. 2018 May;86(5):592-598. doi: 10.1002/prot.25487. Epub 2018 Mar 12. PMID: 29492997 Free PMC article.
- MABAL: a Novel Deep-Learning Architecture for Machine-Assisted Bone Age Labeling. J Digit Imaging. 2018 Aug;31(4):513-519. doi: 10.1007/s10278-018-0053-3. PMID: 29404850 Free PMC article.
- ILMCNet: A Deep Neural Network Model That Uses PLM to Process Features and Employs CRF to Predict Protein Secondary Structure. Genes (Basel). 2024 Oct 21;15(10):1350. doi: 10.3390/genes15101350. PMID: 39457474 Free PMC article.
- Recent Advances in Computational Prediction of Secondary and Supersecondary Structures from Protein Sequences. Methods Mol Biol. 2025;2870:1-19. doi: 10.1007/978-1-0716-4213-9_1. PMID: 39543027 Review.
- The trRosetta server for fast and accurate protein structure prediction. Nat Protoc. 2021 Dec;16(12):5634-5651. doi: 10.1038/s41596-021-00628-9. Epub 2021 Nov 10. PMID: 34759384 Review.
Cited by
- Ensemble of Template-Free and Template-Based Classifiers for Protein Secondary Structure Prediction. Int J Mol Sci. 2021 Oct 23;22(21):11449. doi: 10.3390/ijms222111449. PMID: 34768880 Free PMC article.
- The whole is greater than its parts: ensembling improves protein contact prediction. Sci Rep. 2021 Apr 13;11(1):8039. doi: 10.1038/s41598-021-87524-0. PMID: 33850214 Free PMC article.
- Deep geometric representations for modeling effects of mutations on protein-protein binding affinity. PLoS Comput Biol. 2021 Aug 4;17(8):e1009284. doi: 10.1371/journal.pcbi.1009284. eCollection 2021 Aug. PMID: 34347784 Free PMC article.
- Deep learning for protein secondary structure prediction: Pre and post-AlphaFold. Comput Struct Biotechnol J. 2022 Nov 11;20:6271-6286. doi: 10.1016/j.csbj.2022.11.012. eCollection 2022. PMID: 36420164 Free PMC article. Review.
- Post-processing enhances protein secondary structure prediction with second order deep learning and embeddings. Comput Struct Biotechnol J. 2025 Jan 2;27:243-251. doi: 10.1016/j.csbj.2024.12.022. eCollection 2025. PMID: 39866664 Free PMC article.