BMC Bioinformatics. 2010 Nov 15;11:563.
doi: 10.1186/1471-2105-11-563.

TESTLoc: protein subcellular localization prediction from EST data


Yao-Qing Shen et al. BMC Bioinformatics.

Abstract

Background: The eukaryotic cell has an intricate architecture with compartments and substructures dedicated to particular biological processes. Knowing the subcellular location of proteins not only indicates how bio-processes are organized in different cellular compartments, but also contributes to unravelling the function of individual proteins. Computational localization prediction is possible based on sequence information alone, and has been successfully applied to proteins from virtually all subcellular compartments and all domains of life. However, we realized that current prediction tools do not perform well on partial protein sequences such as those inferred from Expressed Sequence Tag (EST) data, limiting the exploitation of the large and taxonomically most comprehensive body of sequence information from eukaryotes.

Results: We developed a new predictor, TESTLoc, suited for subcellular localization prediction of proteins based on their partial sequences conceptually translated from ESTs (EST-peptides). A Support Vector Machine (SVM) is used as the computational method, and EST-peptides are represented by different features such as amino acid composition and physicochemical properties. When TESTLoc was applied to the most challenging test case (plant data), it yielded high accuracy (~85%).
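As a rough illustration of the amino acid composition features mentioned above, the Python sketch below computes normalized k-mer frequencies over the 20 standard residues (k = 1 gives plain amino acid composition, k = 4 the tetrapeptide frequencies discussed with Figure 3). The function name `kmer_composition`, the skipping of k-mers that contain non-standard symbols (such as `X` or a translated stop `*`), and normalization by the number of counted k-mers are assumptions for illustration, not TESTLoc's documented encoding.

```python
from functools import lru_cache
from itertools import product

# The 20 standard amino acids, in a fixed (alphabetical one-letter) order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


@lru_cache(maxsize=None)
def _kmer_index(k: int) -> dict[str, int]:
    """Map every possible k-mer to its position in the feature vector (cached per k)."""
    return {"".join(p): i for i, p in enumerate(product(AMINO_ACIDS, repeat=k))}


def kmer_composition(peptide: str, k: int = 1) -> list[float]:
    """Hypothetical k-th order composition: frequencies of all 20**k k-mers.

    Assumption: overlapping k-mers are counted, k-mers containing non-standard
    residues are skipped, and counts are divided by the number of k-mers kept.
    """
    index = _kmer_index(k)
    counts = [0] * len(index)
    total = 0
    seq = peptide.upper()
    for start in range(len(seq) - k + 1):
        position = index.get(seq[start:start + k])
        if position is not None:  # skip k-mers with X, *, etc.
            counts[position] += 1
            total += 1
    if total == 0:
        return [0.0] * len(index)
    return [c / total for c in counts]
```

For example, `kmer_composition("AAC", 1)` assigns 2/3 to alanine and 1/3 to cysteine. The vector length grows as 20**k (160,000 entries for k = 4), which is why caching the index is worthwhile.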

Conclusions: TESTLoc is a localization prediction tool tailored for EST data. It provides a variety of models for the users to choose from, and is available for download at http://megasun.bch.umontreal.ca/~shenyq/TESTLoc/TESTLoc.html.


Figures

Figure 1
Fragmentation procedure of plant protein sequences in order to expand the EST-peptide dataset. Open bars, full-length proteins; filled bars, fragmented protein sequences. Proteins shorter than 200 residues remained unchanged. Proteins ranging from 200 to 400 residues were fragmented into two pieces. Proteins longer than 400 residues were fragmented into three pieces. See text for details.
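The length rules in the Figure 1 caption can be sketched as follows. This is a minimal sketch assuming the pieces are contiguous and as close to equal length as possible, and that a protein of exactly 400 residues falls into the two-piece class; the caption defers the exact cut points to the main text, so the function `fragment` is hypothetical.

```python
def fragment(sequence: str) -> list[str]:
    """Split a protein into 1, 2 or 3 contiguous pieces by length (hypothetical rule).

    < 200 residues     -> unchanged (one piece)
    200..400 residues  -> two pieces (400 assumed inclusive)
    > 400 residues     -> three pieces
    Pieces are assumed to be near-equal; leftover residues go to the first pieces.
    """
    n = len(sequence)
    if n < 200:
        pieces = 1
    elif n <= 400:
        pieces = 2
    else:
        pieces = 3
    size, extra = divmod(n, pieces)
    fragments = []
    start = 0
    for i in range(pieces):
        end = start + size + (1 if i < extra else 0)
        fragments.append(sequence[start:end])
        start = end
    return fragments
```

With this rule a 401-residue protein yields pieces of 134, 134 and 133 residues, and joining the pieces always restores the original sequence.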
Figure 2
Training and evaluation of SVM predictors. The circle and pies indicate the dataset and portions thereof. The procedure in each dashed box was repeated ten times. The whole dataset was randomly divided into ten parts, with nine parts combined to construct the SVM model and the remaining part used to evaluate the model. The combined data for model construction were further divided randomly into ten subsets, of which nine were combined to serve as training data and the tenth served as test data. See text for details.
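The two-level splitting described in the Figure 2 caption resembles nested ten-fold cross-validation. The sketch below only generates the data partitions (no SVM is trained); it assumes random round-robin fold assignment after shuffling and that the inner split is rotated through all ten subsets. The function names and the seeding scheme are illustrative assumptions.

```python
import random
from collections.abc import Iterator, Sequence


def split_into_folds(items: Sequence, n_folds: int = 10, seed: int = 0) -> list[list]:
    """Shuffle items reproducibly and deal them into n_folds near-equal folds."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::n_folds] for i in range(n_folds)]


def nested_cv_splits(items: Sequence, n_folds: int = 10,
                     seed: int = 0) -> Iterator[tuple[list, list, list]]:
    """Yield (train, test, evaluation) partitions for nested cross-validation.

    Outer loop: one fold is held out for evaluation, the rest build the model.
    Inner loop: the model-building data are split again; one subset is the
    test set (e.g. for parameter tuning), the remaining subsets are training data.
    """
    outer = split_into_folds(items, n_folds, seed)
    for i, evaluation in enumerate(outer):
        model_data = [x for j, fold in enumerate(outer) if j != i for x in fold]
        inner = split_into_folds(model_data, n_folds, seed + i + 1)
        for k, test in enumerate(inner):
            train = [x for j, fold in enumerate(inner) if j != k for x in fold]
            yield train, test, evaluation
```

For 100 sequences this yields 100 partitions, each with 81 training, 9 test and 10 evaluation sequences, and the three sets never overlap.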
Figure 3
Independent evaluation of SVM predictors based on different representations of amino acid composition. Performance was assessed by the Matthews Correlation Coefficient (MCC). For most classes, the best MCC was obtained with fourth-order amino acid composition (the frequency of tetrapeptides). Amino acid group-C and group-D composition yielded similar results (see Additional file 5).
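Because Figure 3 reports MCC per localization class, one common way to compute it is to treat each class as "positive" and all other classes as "negative". The sketch below applies the standard MCC formula, (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)), in that one-versus-rest fashion; returning 0.0 when the denominator is zero is a common convention assumed here, and the class labels in any usage are placeholders.

```python
import math
from collections.abc import Hashable, Sequence


def mcc_for_class(y_true: Sequence[Hashable], y_pred: Sequence[Hashable],
                  positive: Hashable) -> float:
    """One-versus-rest Matthews Correlation Coefficient for a single class."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention (assumed): MCC is 0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

A perfect prediction gives 1.0, a fully inverted binary prediction gives -1.0, and a degenerate case such as a class that is never absent gives 0.0.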
Figure 4
Integration of predictions from SVM models based on individual features. Each of the 41 SVM models built with a single sequence feature forms the first-layer SVM and emits the probabilities for the query sequence to belong to the various classes. These probabilities are used as input for the second-layer SVM.
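The data flow in Figure 4 is a form of stacking: first-layer outputs become the feature vector of a second-layer classifier. The sketch below shows only how such an input vector could be assembled, by concatenating each model's per-class probabilities in a fixed class order. The first-layer models are represented by hypothetical callables returning a class-to-probability mapping; neither the function names nor the missing-class default of 0.0 come from TESTLoc itself.

```python
from collections.abc import Callable, Mapping, Sequence

# Hypothetical first-layer model: takes an EST-peptide, returns class probabilities.
FirstLayerModel = Callable[[str], Mapping[str, float]]


def second_layer_features(query: str, models: Sequence[FirstLayerModel],
                          classes: Sequence[str]) -> list[float]:
    """Concatenate per-class probabilities from every first-layer model.

    With 41 models and C classes the result has 41 * C entries, always ordered
    model by model and, within a model, in the order given by `classes`.
    """
    features: list[float] = []
    for model in models:
        probabilities = model(query)
        features.extend(float(probabilities.get(c, 0.0)) for c in classes)
    return features
```

Fixing the class order matters: the second-layer SVM can only learn from these probabilities if each position in the vector always means the same (model, class) pair.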
