Int J Mol Sci. 2015 Dec 24;17(1):15.
doi: 10.3390/ijms17010015.

Prediction of Protein Structural Class Based on Gapped-Dipeptides and a Recursive Feature Selection Approach

Taigang Liu et al. Int J Mol Sci.

Abstract

Prior knowledge of a protein's structural class can offer useful clues to its function and its tertiary structure. Although considerable effort has gone into finding a fast and effective computational approach to this problem, it remains a challenging topic in bioinformatics. The position-specific score matrix (PSSM) profile has been shown to be a useful source of information for improving structural class prediction, but this information has not been adequately explored. In this study, we present a feature extraction technique based on gapped-dipeptide composition computed directly from the PSSM. Feature selection is then performed using support vector machine-recursive feature elimination (SVM-RFE), and the selected optimal features are used to construct the final predictor. Jackknife tests on four working datasets show that our method achieves satisfactory prediction accuracies using features extracted solely from the PSSM, and that it could serve as a promising tool for predicting protein structural class.

Keywords: feature selection; gapped-dipeptide; position-specific score matrix; protein structural class; recursive feature elimination; support vector machine.
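
The abstract does not state the exact formula for the gapped-dipeptide composition, so the following is only a minimal sketch under stated assumptions: PSSM scores are squashed to (0, 1) with a sigmoid, and the feature for an amino acid pair (i, j) at gap g is the average, over all positions k, of the product of the scaled score for i at position k and the scaled score for j at position k + g + 1. The function names and the meaning of `gap` (number of residues skipped between the pair) are illustrative choices, not taken from the paper.

```python
import math


def sigmoid(x):
    """Squash a raw PSSM score into (0, 1). Assumed normalization."""
    return 1.0 / (1.0 + math.exp(-x))


def gapped_dipeptide_features(pssm, gap):
    """Return an n*n feature vector for one gap value.

    pssm: list of L rows, one per residue, each with n scores
          (n = 20 for the standard amino acids).
    gap:  number of residues skipped between the paired positions
          (assumption: gap = 0 means adjacent residues).
    """
    length = len(pssm)
    n = len(pssm[0])
    span = gap + 1  # distance between the two paired positions
    if length <= span:
        raise ValueError("sequence is too short for this gap")

    scaled = [[sigmoid(v) for v in row] for row in pssm]
    pairs = length - span  # number of position pairs (k, k + span)
    feats = [0.0] * (n * n)
    for k in range(pairs):
        first, second = scaled[k], scaled[k + span]
        for i in range(n):
            for j in range(n):
                feats[i * n + j] += first[i] * second[j]
    # Average so that sequences of different lengths are comparable.
    return [f / pairs for f in feats]
```

Calling this for several gap values and concatenating the results would give one fixed-length vector per protein, which is the kind of input a feature selection step can then prune.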

Figures

Figure 1
This graph shows how the choice of the top K features affects the overall accuracies.
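
Choosing the top K features presupposes a ranking, which in SVM-RFE comes from repeatedly training a linear SVM and discarding the feature with the smallest squared weight. The sketch below illustrates that loop under several assumptions that the abstract does not confirm: labels are binary (+1/-1) rather than the four structural classes, the SVM is a small hand-rolled linear model trained by batch subgradient descent rather than a standard SVM package, and one feature is removed per round. All names and hyperparameters are hypothetical.

```python
def train_linear_svm(X, y, lam=0.01, epochs=200, lr=0.1):
    """Fit w, b by batch subgradient descent on the L2-regularized hinge loss.

    X: list of feature rows, y: list of labels in {+1, -1}.
    """
    n_samples, n_feat = len(X), len(X[0])
    w, b = [0.0] * n_feat, 0.0
    for _ in range(epochs):
        grad_w = [lam * wi for wi in w]  # gradient of the regularizer
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # sample violates the margin
                for j in range(n_feat):
                    grad_w[j] -= yi * xi[j] / n_samples
                grad_b -= yi / n_samples
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def svm_rfe_ranking(X, y):
    """Return feature indices ordered from most to least important."""
    remaining = list(range(len(X[0])))
    eliminated = []
    while remaining:
        sub = [[row[j] for j in remaining] for row in X]
        w, _ = train_linear_svm(sub, y)
        # Ranking criterion: drop the feature with the smallest w_j^2.
        worst = min(range(len(remaining)), key=lambda i: w[i] ** 2)
        eliminated.append(remaining.pop(worst))
    return eliminated[::-1]  # last eliminated = most important


def select_top_k(X, ranking, k):
    """Keep only the k best-ranked feature columns of X."""
    keep = ranking[:k]
    return [[row[j] for j in keep] for row in X]
```

Sweeping k over a range of values, retraining on `select_top_k(X, ranking, k)` each time, and recording the accuracy would produce a curve of the kind the figure describes.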
