2021 Nov 5;22(6):bbab278.
doi: 10.1093/bib/bbab278.

PScL-HDeep: image-based prediction of protein subcellular location in human tissue using ensemble learning of handcrafted and deep learned features with two-layer feature selection

Matee Ullah et al. Brief Bioinform.

Abstract

Protein subcellular localization plays a crucial role in characterizing protein function and understanding various cellular processes, so accurately identifying protein subcellular location is an important yet challenging task. Numerous computational methods have been proposed to predict the subcellular location of proteins. However, most existing methods are limited in overall accuracy, time consumption and generalization power. To address these problems, we developed a novel computational approach based on Human Protein Atlas (HPA) data, referred to as PScL-HDeep, for accurate and efficient image-based prediction of protein subcellular location in human tissues. We extracted handcrafted features and deep learned features (the latter using a pretrained deep learning model) from different viewpoints of the image. The stepwise discriminant analysis (SDA) algorithm was applied to each original raw feature set to generate an optimal feature set. To obtain a more informative feature subset, the support vector machine-based recursive feature elimination with correlation bias reduction (SVM-RFE + CBR) feature selection algorithm was then applied to the integrated feature set. Finally, two classification models, a support vector machine with a radial basis function kernel (SVM-RBF) and a support vector machine with a linear kernel (SVM-LNR), were trained on the final selected feature set. To evaluate the proposed method, a new gold-standard benchmark training dataset was constructed from the HPA databank. PScL-HDeep achieved the best performance in a 10-fold cross-validation test on this dataset and outperformed existing predictors. We also demonstrated the generalization ability of the proposed method with a stringent independent validation test.

Keywords: bioimage analysis; deep learned features; feature selection; handcrafted features; protein subcellular location.
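
The selection-then-classification stage described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes scikit-learn, uses synthetic data from `make_classification` in place of HPA image features, substitutes plain SVM-RFE for SVM-RFE + CBR (the correlation bias reduction step is omitted), and skips the SDA layer entirely because scikit-learn does not provide it. The helper names and the values 20 (features kept) and 5 (elimination step) are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def build_model(kernel, n_features_to_keep):
    """Scale features, rank them with linear-SVM RFE, then fit the final SVM.

    Assumption: plain RFE stands in for SVM-RFE + CBR; no SDA step is applied.
    """
    selector = RFE(
        estimator=SVC(kernel="linear"),  # linear SVM weights drive the ranking
        n_features_to_select=n_features_to_keep,
        step=5,  # hypothetical: drop 5 features per elimination round
    )
    return make_pipeline(StandardScaler(), selector, SVC(kernel=kernel))


def evaluate(X, y, kernel, n_features_to_keep=20, n_splits=10, seed=0):
    """Mean accuracy over stratified k-fold cross-validation (10 folds by default)."""
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    model = build_model(kernel, n_features_to_keep)
    return float(cross_val_score(model, X, y, cv=cv).mean())


# Synthetic stand-in for an integrated feature matrix (not HPA data).
X, y = make_classification(
    n_samples=300,
    n_features=60,
    n_informative=12,
    n_redundant=10,
    n_classes=3,
    n_clusters_per_class=1,
    random_state=0,
)

# Compare the two kernels named in the abstract: RBF and linear.
results = {kernel: evaluate(X, y, kernel) for kernel in ("rbf", "linear")}
for kernel, accuracy in results.items():
    print(f"SVM ({kernel}) mean 10-fold accuracy: {accuracy:.3f}")
```

Placing the selector inside the pipeline means feature ranking is refit on each training fold, so the held-out fold never influences which features are kept.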

Figures

Figure 1. Deep learned feature extraction strategy using the VGG-19 deep learning architecture.
Figure 2. Schematic workflow of the developed PScL-HDeep.
Figure 3. Performance comparison of eight types of pure individual features on three different classifiers.
Figure 4. Performance comparison of individual features before and after the SDA feature selection.
Figure 5. Performance comparison between the Har pure feature set and the Har optimal feature set under the SVM-RBF model: (A) ROC curves of the Har pure feature set; (B) ROC curves after applying the SDA feature selection method; (C) Distribution of AUC values of the Har pure feature set and (D) Distribution of AUC values after applying the SDA feature selection technique.
Figure 6. Variation curves of F1-ScoreM and MCC values against different number of selected features based on the ranked features.
Figure 7. ROC curves and AUC distribution of the Sup-400 feature set: (A) shows the ROC curves under the SVM-LNR classifier; (B) shows the ROC curves under the SVM-RBF classifier; (C) shows the AUC distribution under the SVM-LNR classifier and (D) shows the AUC distribution under the SVM-RBF classifier.
Figure 8. Performance comparison of the proposed PScL-HDeep with other existing methods on the independent test dataset.
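
The two metrics named in the Figure 6 caption can be computed as in the sketch below. It assumes that "F1-ScoreM" denotes the macro-averaged F1 score and that MCC is the standard multiclass generalization of the Matthews correlation coefficient; the page itself does not define either, and the function names and toy labels are hypothetical.

```python
import math
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (assumed meaning of F1-ScoreM)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def multiclass_mcc(y_true, y_pred):
    """Multiclass Matthews correlation coefficient from label counts."""
    total = len(y_true)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Counter returns 0 for classes that were never predicted.
    numerator = correct * total - sum(
        true_counts[k] * pred_counts[k] for k in true_counts
    )
    denominator = math.sqrt(
        (total**2 - sum(v * v for v in pred_counts.values()))
        * (total**2 - sum(v * v for v in true_counts.values()))
    )
    return numerator / denominator if denominator else 0.0


# Toy labels for three classes (illustrative only, not study data).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 2, 2, 2]
print(f"macro F1 = {macro_f1(y_true, y_pred):.4f}")
print(f"MCC      = {multiclass_mcc(y_true, y_pred):.4f}")
```

Evaluating both functions on predictions from models trained with the top k ranked features, for increasing k, would trace curves of the kind the caption describes.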
