BMC Bioinformatics. 2007 Oct 29;8:420.
doi: 10.1186/1471-2105-8-420.

'Unite and conquer': enhanced prediction of protein subcellular localization by integrating multiple specialized tools

Yao Qing Shen et al. BMC Bioinformatics.

Abstract

Background: Knowing the subcellular location of proteins provides clues to their function as well as to the interconnectivity of biological processes. Dozens of tools are available for predicting protein location in the eukaryotic cell. Each tool performs well on certain data sets, but their predictions often disagree for a given protein. Since the individual tools each have particular strengths, we set out to integrate them in a way that optimally exploits their potential. The method we present here is applicable to various subcellular locations but is tailored to predicting whether or not a protein is localized in mitochondria. Knowledge of the mitochondrial proteome is relevant to understanding the role of this organelle in global cellular processes.

Results: To develop a method for enhanced prediction of subcellular localization, we integrated the outputs of available localization prediction tools using several strategies and tested the performance of each strategy on known mitochondrial proteins. The accuracy obtained (up to 92%) far surpasses that of the individual tools. The method of integration proved crucial to performance. For predicting mitochondrion-located proteins, integration via a two-layer decision tree clearly outperforms simpler methods, because it can emphasize biologically relevant features such as the mitochondrial targeting peptide and transmembrane domains.
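
One of the simpler integration strategies, majority-win voting (the "voting groups" of Figure 1), can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes each tool's output has already been reduced to "mit" or "non", that a tool making no prediction is recorded as "N" and abstains, and that ties go to "non". The abstract does not say how ties or abstentions were handled, and the function names and toy data are hypothetical.

```python
from collections import Counter


def majority_vote(calls, tie_label="non"):
    """Combine one protein's per-tool calls by majority-win voting.

    Each call is "mit", "non", or "N" (no prediction). "N" abstains, and ties
    go to `tie_label`; both rules are assumptions made for this sketch.
    """
    counts = Counter(calls)  # missing keys count as 0
    if counts["mit"] > counts["non"]:
        return "mit"
    if counts["non"] > counts["mit"]:
        return "non"
    return tie_label


def accuracy(predicted, truth):
    """Fraction of proteins whose integrated call matches the known location."""
    return sum(p == t for p, t in zip(predicted, truth)) / len(truth)


# Hypothetical calls from three tools for three proteins, plus known labels.
per_protein_calls = [["mit", "mit", "non"], ["non", "non", "mit"], ["mit", "non", "N"]]
known = ["mit", "non", "mit"]
voted = [majority_vote(c) for c in per_protein_calls]  # ["mit", "non", "non"]
print(accuracy(voted, known))
```

Every tool gets one equal vote here, so the scheme cannot give extra weight to tools that capture features such as targeting peptides; that equal weighting is what the decision-tree integration relaxes.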

Conclusion: We developed an approach that improves the prediction accuracy for mitochondrial proteins by uniting the strengths of specialized tools. Combining machine-learning-based integration with biological expert knowledge leads to improved performance. The approach also alleviates the conundrum of how to choose between conflicting predictions. It is easy to implement and applicable to predicting subcellular locations other than mitochondria, as well as other biological features. To try the approach, we provide a web service for mitochondrial protein prediction (named YimLOC), accessible through the AnaBench suite at http://anabench.bcm.umontreal.ca/anabench/. The source code is provided in Additional File 2.

Figures

Figure 1
Prediction performance of individual and integrated tools on yeast mitochondrial proteins. Filled symbols: individual LOC-tools; dots: voting groups (tools integrated by majority-win voting); open symbols: decision trees. The desired results lie in the top left of the plot area, representing a high true positive rate and a low false positive rate. a, Full scale. b, Zoom-in on the region with false positive rate 0 to 0.25 and true positive rate 0.3 to 0.95.
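
Each point in Figure 1 is one predictor's (false positive rate, true positive rate) pair, with TPR = TP/(TP + FN) and FPR = FP/(FP + TN), taking mitochondrial proteins as the positive class. A minimal sketch of that computation follows; the function name and toy labels are hypothetical, and counting an "N" (no prediction) call as a negative call is an assumption.

```python
def roc_point(predicted, truth, positive="mit"):
    """Return (false positive rate, true positive rate) for one predictor.

    Any call other than `positive` (including "N") is treated as a negative
    call; this is an assumption of the sketch.
    """
    tp = fn = fp = tn = 0
    for p, t in zip(predicted, truth):
        if t == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        elif p == positive:
            fp += 1
        else:
            tn += 1
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return fpr, tpr


# Hypothetical labels: 3 mitochondrial and 2 non-mitochondrial proteins.
truth = ["mit", "mit", "mit", "non", "non"]
calls = ["mit", "mit", "non", "mit", "non"]
fpr, tpr = roc_point(calls, truth)  # fpr = 0.5, tpr = 2/3
```

A point closer to the top left (higher TPR, lower FPR) is better, which is how the caption reads the plot.
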
Figure 2
Integration of heterogeneous prediction tools by decision trees. a, The LOC-DT was built with outputs from nine LOC-tools. b, The MTP-DT was built with outputs from four tools whose prediction is based on the mitochondrial targeting peptide. The output of MTP-DT, together with the outputs of five other LOC-tools, was used to construct the STACK-DT.
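
Panel b describes a two-layer (stacked) integration: a first tree (MTP-DT) combines targeting-peptide predictors, and its output becomes one more input to a second tree (STACK-DT). The sketch below illustrates that structure under stated assumptions. The paper builds its trees with C4.5; here a simplified stand-in grows unpruned trees on categorical tool calls and picks splits by gain ratio, the criterion C4.5 uses, but omits C4.5's pruning, continuous-attribute and missing-value handling, and other refinements. Tool names (mtp1, mtp2, loc1), function names, and toy data are hypothetical. For simplicity the second layer is trained on first-layer predictions for the same training proteins; a common precaution is to use cross-validated first-layer outputs instead, and this excerpt does not say which the authors did.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(rows, labels, feature):
    """Information gain of splitting on `feature`, divided by the split information."""
    n = len(labels)
    parts = {}
    for row, label in zip(rows, labels):
        parts.setdefault(row[feature], []).append(label)
    remainder = sum(len(p) / n * entropy(p) for p in parts.values())
    split_info = -sum(len(p) / n * math.log2(len(p) / n) for p in parts.values())
    gain = entropy(labels) - remainder
    return gain / split_info if split_info > 0 else 0.0


def build_tree(rows, labels, features):
    """Grow an unpruned tree on categorical tool calls (simplified, not full C4.5)."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not features:
        return majority
    best = max(features, key=lambda f: gain_ratio(rows, labels, f))
    if gain_ratio(rows, labels, best) <= 0:
        return majority
    remaining = [f for f in features if f != best]
    branches = {}
    for value in {row[best] for row in rows}:
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        branches[value] = build_tree([rows[i] for i in idx], [labels[i] for i in idx], remaining)
    return {"feature": best, "branches": branches, "default": majority}


def classify(tree, row):
    """Follow branches to a leaf; an unseen value falls back to the node's majority label."""
    while isinstance(tree, dict):
        tree = tree["branches"].get(row[tree["feature"]], tree["default"])
    return tree


def build_stack(rows, labels, mtp_tools, other_tools):
    """Layer 1: tree over targeting-peptide tools. Layer 2: its output plus other tools."""
    mtp_tree = build_tree(rows, labels, mtp_tools)
    stacked = [dict(row, MTP_DT=classify(mtp_tree, row)) for row in rows]
    stack_tree = build_tree(stacked, labels, other_tools + ["MTP_DT"])
    return mtp_tree, stack_tree


def predict_stack(mtp_tree, stack_tree, row):
    """Run the first layer, attach its call as a feature, then run the second layer."""
    return classify(stack_tree, dict(row, MTP_DT=classify(mtp_tree, row)))


# Hypothetical training data: per-protein tool calls and known locations.
rows = [
    {"mtp1": "mit", "mtp2": "mit", "loc1": "non"},
    {"mtp1": "mit", "mtp2": "non", "loc1": "mit"},
    {"mtp1": "non", "mtp2": "mit", "loc1": "mit"},
    {"mtp1": "non", "mtp2": "non", "loc1": "non"},
]
labels = ["mit", "mit", "non", "non"]
mtp_tree, stack_tree = build_stack(rows, labels, ["mtp1", "mtp2"], ["loc1"])
```

In this toy data mtp1 separates the classes perfectly, so the first tree splits on it and the second tree then splits on the first tree's output, which is the mechanism by which stacking lets targeting-peptide evidence dominate.
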
Figure 3
Prediction performance of individual and integrated tools on yeast mitochondrial membrane and matrix proteins. LOC-tools recognize mitochondrial membrane proteins less efficiently than matrix proteins. PASUB is effective here because it exploits annotations, and a higher proportion of mitochondrial membrane proteins than of matrix proteins is annotated.
Figure 4
Decision tree topology for the prediction of mitochondrial proteins. a, STACK-mem-DT; b, MTP-DT. The trees were built by C4.5 (see Methods). Each oval represents a prediction tool. Filled ovals represent transmembrane domain predictors. Rectangles represent decisions: "mit" for mitochondrial proteins and "non" for proteins of other subcellular locations. If a tool predicts the query protein as a mitochondrial protein, the branch (edge) is labeled "mit"; otherwise "non". If PASUB makes no prediction, the branch is labeled "N". Several decision-making paths are highlighted, as follows: Dotted line: for non-mitochondrial protein YDR378C. Grey line: for mitochondrial protein YOR297C. Blue arrow: the common path for three differently localized proteins: mitochondrial (YIL065C), plasma membrane (YBR069C) and nuclear (YLL022C). Orange arrow: for mitochondrial protein YIL065C. Red arrow: for non-mitochondrial protein YBR069C. Green arrow: for non-mitochondrial protein YLL022C.
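
Reading Figure 4 amounts to walking a tree from the root: at each oval, look up that tool's call for the query protein and follow the matching edge until a rectangle is reached. The sketch below performs that walk and records the path, analogous to the highlighted arrows. The tree is a hypothetical toy with made-up tool names (tool_A, tool_B) and is not the published topology; only the three-way "mit"/"non"/"N" edge at PASUB mirrors the caption.

```python
# Hypothetical toy tree for illustration only; it is NOT the published STACK-mem-DT.
# Internal nodes name a tool; edges are that tool's call ("mit", "non", or "N" for
# no prediction, which only PASUB emits here); leaves are the final decision.
TOY_TREE = {
    "tool": "PASUB",
    "branches": {
        "mit": "mit",
        "non": "non",
        "N": {
            "tool": "tool_A",  # hypothetical tool name
            "branches": {
                "mit": "mit",
                "non": {
                    "tool": "tool_B",  # hypothetical tool name
                    "branches": {"mit": "mit", "non": "non"},
                },
            },
        },
    },
}


def trace(tree, outputs):
    """Walk the tree using each tool's call; return the decision and the path taken."""
    path = []
    node = tree
    while isinstance(node, dict):
        call = outputs[node["tool"]]
        path.append(f'{node["tool"]}={call}')
        node = node["branches"][call]
    return node, path


decision, path = trace(TOY_TREE, {"PASUB": "N", "tool_A": "non", "tool_B": "mit"})
# decision == "mit"; path == ["PASUB=N", "tool_A=non", "tool_B=mit"]
```

Proteins for which PASUB returns "N" share the first step of the path and can then diverge, much as the blue arrow in Figure 4 is shared by three proteins before the orange, red, and green arrows split off.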
