2021 Mar 30;11(4):293.
doi: 10.3390/life11040293.

Ensemble of Multiple Classifiers for Multilabel Classification of Plant Protein Subcellular Localization

Warin Wattanapornprom et al. Life (Basel).

Abstract

Accurate prediction of protein localization is a critical step in any functional genome annotation process. This paper proposes an improved strategy for predicting protein subcellular localization in plants, based on multiple classifiers, that aims to improve both the accuracy and the reliability of the predictions. Predicting plant protein subcellular localization is challenging because the underlying problem is not only multiclass but also multilabel. Plant proteins can generally be found in 10 to 14 locations/compartments, and the number of proteins in some compartments (nucleus, cytoplasm, and mitochondria) is generally much greater than in others (vacuole, peroxisome, Golgi, and cell wall), so the data are usually imbalanced. To address this, we propose an ensemble machine learning method based on average voting among heterogeneous classifiers. We first extracted various types of features suitable for each type of protein localization to form a total of 479 feature spaces. Feature selection methods were then used to reduce the dimensionality to smaller, informative feature subsets. The reduced feature subset was used to train three different individual models. The outputs of these three classifiers were combined by average voting to give the final probability prediction. Based on the voting probability, the method can predict both single-label and multilabel subcellular localizations. Experimental results indicated that the proposed ensemble method achieved an overall accuracy of 84.58% for 11 compartments on the testing dataset.
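The average voting step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the 0.5 decision threshold, the fallback to the single top-scoring compartment, and the example probabilities are all assumptions made for illustration. The sketch only shows how per-compartment probabilities from three classifiers could be averaged and then turned into a single- or multilabel prediction.

```python
from typing import Dict, List

# Maps a compartment name to the probability a classifier assigns to it.
Probabilities = Dict[str, float]


def average_vote(model_outputs: List[Probabilities]) -> Probabilities:
    """Average the per-compartment probabilities across all classifiers."""
    if not model_outputs:
        raise ValueError("need at least one classifier output")
    labels = model_outputs[0].keys()
    n = len(model_outputs)
    return {label: sum(out[label] for out in model_outputs) / n for label in labels}


def predict_locations(avg: Probabilities, threshold: float = 0.5) -> List[str]:
    """Return every compartment whose averaged probability reaches the threshold.

    The threshold value and the fallback to the single best compartment are
    illustrative assumptions, not details taken from the paper.
    """
    chosen = sorted(label for label, p in avg.items() if p >= threshold)
    if chosen:
        return chosen
    return [max(avg, key=avg.get)]


# Hypothetical probability outputs of three heterogeneous classifiers
# for one protein (values invented for this example).
model_a = {"nucleus": 0.80, "cytoplasm": 0.70, "vacuole": 0.10}
model_b = {"nucleus": 0.70, "cytoplasm": 0.60, "vacuole": 0.20}
model_c = {"nucleus": 0.90, "cytoplasm": 0.50, "vacuole": 0.00}

averaged = average_vote([model_a, model_b, model_c])
predicted = predict_locations(averaged)
print(averaged)
print(predicted)
```

With these invented inputs, two compartments pass the threshold, so the protein receives a multilabel prediction; a protein with only one compartment above the threshold would receive a single label.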

Keywords: average voting; consensus voting; ensemble machine learning; feature extraction; feature selection; go term; plant protein; subcellular localization prediction.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1. Workflow of the program.
Figure 2. Top 20 features that are highly correlated with each localization target.
