Comput Math Methods Med. 2020 Sep 23;2020:8894478.
doi: 10.1155/2020/8894478. eCollection 2020.

Identifying Heat Shock Protein Families from Imbalanced Data by Using Combined Features

Xiao-Yang Jing et al. Comput Math Methods Med.

Abstract

Heat shock proteins (HSPs) are ubiquitous in living organisms. HSPs are essential for cell growth and survival; their main function is to control the folding and unfolding of proteins. According to molecular function and mass, HSPs are categorized into six families: HSP20 (small HSPs), HSP40 (J-proteins), HSP60, HSP70, HSP90, and HSP100. In this paper, improved methods for HSP prediction are proposed: the split amino acid composition (SAAC), the dipeptide composition (DC), the conjoint triad feature (CTF), and the pseudoaverage chemical shift (PseACS) were selected as features to predict HSPs with a support vector machine (SVM). To overcome the imbalanced-data classification problem, the synthetic minority oversampling technique (SMOTE) was used to balance the dataset. In the jackknife test, the overall accuracy on the balanced dataset with the optimized combined feature SAAC+DC+CTF+PseACS was 99.72%, which was 4.81% higher than on the imbalanced dataset with the same combined feature. The sensitivity (Sn), specificity (Sp), accuracy (Acc), and Matthews correlation coefficient (MCC) for the HSP families in our predictive model were higher than those of existing methods. This improved method may be helpful for protein function prediction.
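The abstract names the feature encodings and the balancing step without giving their formulas. As an illustrative sketch only (not the authors' code), the Python below computes two of the four encodings, the 400-dimensional dipeptide composition and a 343-dimensional conjoint triad feature, concatenates them into a combined DC+CTF vector, and then applies SMOTE-style interpolation to create synthetic minority-class samples. Assumptions: the seven-class amino acid grouping used for CTF is a commonly used one and may differ from the paper's; both encodings are normalized to plain frequencies; SAAC, PseACS, and the SVM are omitted; all helper names are hypothetical.

```python
import itertools
import math
import random

# The 20 standard amino acids in a fixed order, so every sequence maps to
# feature vectors with the same layout.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Dipeptide composition (DC): one slot per ordered residue pair, 20 * 20 = 400.
DIPEPTIDES = ["".join(p) for p in itertools.product(AMINO_ACIDS, repeat=2)]
DIPEPTIDE_INDEX = {dp: i for i, dp in enumerate(DIPEPTIDES)}

# Conjoint triad feature (CTF): residues are mapped to 7 classes, giving
# 7 ** 3 = 343 triad types. ASSUMPTION: a commonly used grouping; the
# paper's exact grouping may differ.
CTF_GROUPS = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
AA_TO_GROUP = {aa: g for g, members in enumerate(CTF_GROUPS) for aa in members}


def dipeptide_composition(seq):
    """Return the 400-dim frequency vector of adjacent residue pairs."""
    counts = [0.0] * len(DIPEPTIDES)
    total = 0
    for i in range(len(seq) - 1):
        idx = DIPEPTIDE_INDEX.get(seq[i:i + 2])
        if idx is not None:  # skip pairs containing non-standard residues
            counts[idx] += 1
            total += 1
    return [x / total for x in counts] if total else counts


def conjoint_triad(seq):
    """Return the 343-dim frequency vector of consecutive class triads."""
    counts = [0.0] * 7 ** 3
    total = 0
    for i in range(len(seq) - 2):
        triad = seq[i:i + 3]
        if all(aa in AA_TO_GROUP for aa in triad):
            a, b, c = (AA_TO_GROUP[aa] for aa in triad)
            counts[a * 49 + b * 7 + c] += 1
            total += 1
    return [x / total for x in counts] if total else counts


def combined_features(seq):
    """Concatenate DC and CTF into one 743-dim vector (DC+CTF)."""
    return dipeptide_composition(seq) + conjoint_triad(seq)


def smote(minority, n_new, k=5, rng=None):
    """SMOTE-style oversampling: each synthetic point lies on the segment
    between a random minority sample and one of its k nearest minority
    neighbours (Euclidean distance). Needs at least two minority samples."""
    if rng is None:
        rng = random.Random(0)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [s for j, s in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda s: math.dist(base, s))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


if __name__ == "__main__":
    # Toy sequences for illustration only; these are not HSP data.
    minority_seqs = ["MKTAYIAKQR", "MKTLLILAVV", "MSDKIIHLTD"]
    X_min = [combined_features(s) for s in minority_seqs]
    new_samples = smote(X_min, n_new=4, k=2)
    print(len(X_min[0]), len(new_samples))  # 743 4
```

In a setting like the one described, each minority HSP family would be oversampled until its size approaches that of the largest family before the classifier is trained; in practice a maintained implementation such as imbalanced-learn's `SMOTE` class would normally replace the hand-written loop.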

Conflict of interest statement

The authors declare that there is no conflict of interest.

Figures

Figure 1
Flowchart of the proposed method. SAAC: split amino acid composition; DC: dipeptide composition; CTF: conjoint triad feature; PseACS: pseudoaverage chemical shift; SMOTE: synthetic minority oversampling technique.
Figure 2
Prediction results of different combined features. Numbers denote features: 1 for DC, 2 for CTF, 3 for PseACS, and 4 for SAAC.
Figure 3
Predictive sensitivity, specificity, MCC, and accuracy for the HSP families using four algorithms.
Figure 4
Overall predictive accuracy for HSPs using four algorithms.
Figure 5
A comparison of the proposed method for independent datasets.
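The abstract and Figure 3 report per-family Sn, Sp, Acc, and MCC under the jackknife test. A minimal sketch, assuming the usual one-versus-rest definitions (each HSP family in turn treated as the positive class) and leave-one-out cross-validation as the jackknife test, is shown below. The 1-nearest-neighbour classifier is a hypothetical, dependency-free stand-in for the paper's SVM, and all function names and toy labels are illustrative.

```python
import math


def one_vs_rest_metrics(y_true, y_pred, label):
    """Sn, Sp, Acc and MCC with `label` as the positive class:
    Sn = TP/(TP+FN), Sp = TN/(TN+FP), Acc = (TP+TN)/N,
    MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    tn = sum(t != label and p != label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    sn = tp / (tp + fn) if tp + fn else 0.0
    sp = tn / (tn + fp) if tn + fp else 0.0
    acc = (tp + tn) / len(pairs)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sn": sn, "Sp": sp, "Acc": acc, "MCC": mcc}


def overall_accuracy(y_true, y_pred):
    """Fraction of samples whose predicted family matches the true family."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def jackknife_predictions(X, y, fit_predict):
    """Leave-one-out (jackknife) test: each sample is predicted by a model
    trained on all remaining samples."""
    preds = []
    for i in range(len(X)):
        preds.append(fit_predict(X[:i] + X[i + 1:], y[:i] + y[i + 1:], X[i]))
    return preds


def nearest_neighbour(train_X, train_y, x):
    """Hypothetical stand-in classifier (1-NN); the paper uses an SVM."""
    best = min(range(len(train_X)), key=lambda j: math.dist(train_X[j], x))
    return train_y[best]


if __name__ == "__main__":
    # Toy labels for illustration only; these are not the paper's results.
    y_true = ["HSP20", "HSP20", "HSP70", "HSP70", "HSP90", "HSP90"]
    y_pred = ["HSP20", "HSP70", "HSP70", "HSP70", "HSP90", "HSP20"]
    print(one_vs_rest_metrics(y_true, y_pred, "HSP20"))
    # {'Sn': 0.5, 'Sp': 0.75, 'Acc': 0.6666666666666666, 'MCC': 0.25}
```

Swapping `nearest_neighbour` for an SVM (for example scikit-learn's `SVC`) would bring the sketch closer to the described setup; note that the jackknife test refits the model once per sample, so its cost grows linearly with dataset size.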