Prediction of Protein Structural Class Based on Gapped-Dipeptides and a Recursive Feature Selection Approach
- PMID: 26712737
- PMCID: PMC4730262
- DOI: 10.3390/ijms17010015
Abstract
Prior knowledge of a protein's structural class can offer useful clues to its function and its tertiary structure. Although considerable effort has gone into finding a fast and effective computational approach to this problem, it remains a challenging topic in bioinformatics. The position-specific score matrix (PSSM) profile has been shown to be a useful source of information for improving structural class prediction, but this information has not been adequately explored. In this study, we present a feature extraction technique based on gapped-dipeptide composition computed directly from the PSSM. Features are then selected by support vector machine-recursive feature elimination (SVM-RFE), and the selected features are used to construct the final predictor. Jackknife tests on four working datasets show that our method, which extracts features solely from the PSSM, obtains satisfactory prediction accuracies and could serve as a promising tool for predicting protein structural class.
Keywords: feature selection; gapped-dipeptide; position-specific score matrix; protein structural class; recursive feature elimination; support vector machine.
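The abstract does not give the feature formula, so the following is only a minimal sketch of one common way to compute a gapped-dipeptide composition from a PSSM. It assumes the PSSM is an L x 20 matrix of already-normalized scores (one row per residue position), and that the feature for a column pair (a, b) at gap g is the average of the products of column a at position t and column b at position t + g + 1. The function name `gapped_dipeptide_features`, the `max_gap` parameter, and the normalization by the number of position pairs are illustrative assumptions, not details taken from the paper.

```python
from typing import List


def gapped_dipeptide_features(pssm: List[List[float]], max_gap: int) -> List[float]:
    """Return gapped-dipeptide features for gaps 0..max_gap (hypothetical helper).

    pssm    : L rows (sequence positions) by C columns (20 for a real PSSM).
    max_gap : largest number of residues skipped between the two positions.

    The result has C * C * (max_gap + 1) values, ordered by gap, then by
    first column, then by second column.
    """
    length = len(pssm)
    if length == 0:
        raise ValueError("PSSM must contain at least one row")
    n_cols = len(pssm[0])

    features: List[float] = []
    for gap in range(max_gap + 1):
        offset = gap + 1              # distance between the paired positions
        n_pairs = length - offset     # number of position pairs at this gap
        for a in range(n_cols):
            for b in range(n_cols):
                if n_pairs <= 0:
                    # Sequence too short for this gap: assumed to contribute 0.
                    features.append(0.0)
                    continue
                total = sum(
                    pssm[t][a] * pssm[t + offset][b] for t in range(n_pairs)
                )
                features.append(total / n_pairs)
    return features


# Toy 3 x 2 matrix standing in for a real L x 20 PSSM.
toy_pssm = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
toy_features = gapped_dipeptide_features(toy_pssm, max_gap=1)
```

With a real 20-column PSSM this sketch yields 400 values per gap, which is the kind of high-dimensional vector that a selection step such as SVM-RFE would then prune.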
Similar articles
- Prediction of protein structural class using tri-gram probabilities of position-specific scoring matrix and recursive feature elimination. Amino Acids. 2015 Mar;47(3):461-8. doi: 10.1007/s00726-014-1878-9. Epub 2015 Jan 13. PMID: 25583603
- A highly accurate protein structural class prediction approach using auto cross covariance transformation and recursive feature elimination. Comput Biol Chem. 2015 Dec;59 Pt A:95-100. doi: 10.1016/j.compbiolchem.2015.08.012. Epub 2015 Sep 2. PMID: 26460680
- PSSP-RFE: accurate prediction of protein structural class by recursive feature extraction from PSI-BLAST profile, physical-chemical property and functional annotations. PLoS One. 2014 Mar 27;9(3):e92863. doi: 10.1371/journal.pone.0092863. eCollection 2014. PMID: 24675610 Free PMC article.
- Prediction of oxidoreductase subfamily classes based on RFE-SND-CC-PSSM and machine learning methods. J Bioinform Comput Biol. 2019 Aug;17(4):1950029. doi: 10.1142/S021972001950029X. PMID: 31617464
- Prediction of nuclear proteins using nuclear translocation signals proposed by probabilistic latent semantic indexing. BMC Bioinformatics. 2012;13 Suppl 17(Suppl 17):S13. doi: 10.1186/1471-2105-13-S17-S13. Epub 2012 Dec 13. PMID: 23282098 Free PMC article.
Cited by
- PCVMZM: Using the Probabilistic Classification Vector Machines Model Combined with a Zernike Moments Descriptor to Predict Protein-Protein Interactions from Protein Sequences. Int J Mol Sci. 2017 May 11;18(5):1029. doi: 10.3390/ijms18051029. PMID: 28492483 Free PMC article.
- iT3SE-PX: Identification of Bacterial Type III Secreted Effectors Using PSSM Profiles and XGBoost Feature Selection. Comput Math Methods Med. 2021 Jan 6;2021:6690299. doi: 10.1155/2021/6690299. eCollection 2021. PMID: 33505516 Free PMC article.
- iAPSL-IF: Identification of Apoptosis Protein Subcellular Location Using Integrative Features Captured from Amino Acid Sequences. Int J Mol Sci. 2018 Apr 13;19(4):1190. doi: 10.3390/ijms19041190. PMID: 29652843 Free PMC article.
- Prediction of protein structural classes by different feature expressions based on 2-D wavelet denoising and fusion. BMC Bioinformatics. 2019 Dec 24;20(Suppl 25):701. doi: 10.1186/s12859-019-3276-5. PMID: 31874617 Free PMC article.
- RVMAB: Using the Relevance Vector Machine Model Combined with Average Blocks to Predict the Interactions of Proteins from Protein Sequences. Int J Mol Sci. 2016 May 18;17(5):757. doi: 10.3390/ijms17050757. PMID: 27213337 Free PMC article.