BMC Genomics. 2014 Jan 20:15:46.
doi: 10.1186/1471-2164-15-46.

Plus ça change - evolutionary sequence divergence predicts protein subcellular localization signals

Yoshinori Fukasawa et al. BMC Genomics. 2014.

Abstract

Background: Protein subcellular localization is a central problem in understanding cell biology and has been the focus of intense research. To predict localization from amino acid sequence, a myriad of features have been tried, including amino acid composition, sequence similarity, the presence of certain motifs or domains, and many others. Surprisingly, sequence conservation of sorting motifs has not yet been employed, despite its extensive use for tasks such as the prediction of transcription factor binding sites.

Results: Here, we flip the problem around and present a proof of concept for the idea that the lack of sequence conservation can be a novel feature for localization prediction. We show that for yeast, mammal and plant datasets, evolutionary sequence divergence alone has significant power to identify sequences with N-terminal sorting sequences. Moreover, sequence divergence is nearly as effective when computed on automatically defined ortholog sets as on hand-curated ones. Unfortunately, sequence divergence did not necessarily increase classification performance when combined with some traditional sequence features such as amino acid composition. However, a post hoc analysis of the proteins for which sequence divergence changes the prediction yielded some proteins with atypical (i.e., not MPP-cleaved) matrix targeting signals as well as a few misannotations.

Conclusion: We report the results of the first quantitative study of the effectiveness of evolutionary sequence divergence as a feature for protein subcellular localization prediction. We show that divergence is indeed useful for prediction, but that it is not trivial to improve overall accuracy simply by adding this feature to classical sequence features. Nevertheless, we argue that sequence divergence is a promising feature and show anecdotal examples in which it succeeds where other features fail.
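
The abstract does not spell out how divergence is scored, although the Figure 1 caption refers to a "column entropy score". As a rough illustration only, the sketch below computes Shannon entropy per alignment column and averages it over an N-terminal window. The function names, the gap handling, the 100-column default window and the toy alignment are all assumptions made for this example and are not taken from the paper.

```python
import math
from collections import Counter


def column_entropy(column, ignore_gaps=True):
    """Shannon entropy (in bits) of the residues in one alignment column."""
    # Assumption: gap characters are skipped rather than treated as a 21st symbol.
    residues = [c for c in column if not (ignore_gaps and c == "-")]
    if not residues:
        return 0.0
    total = len(residues)
    counts = Counter(residues)
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


def divergence_profile(msa, n_positions=100):
    """Per-column entropy for the first n_positions columns of an MSA."""
    width = min(n_positions, min(len(seq) for seq in msa))
    return [column_entropy([seq[i] for seq in msa]) for i in range(width)]


def mean_divergence(msa, n_positions=100):
    """Average column entropy over the N-terminal window."""
    profile = divergence_profile(msa, n_positions)
    return sum(profile) / len(profile) if profile else 0.0


# Hypothetical toy alignment: a variable N-terminus followed by a conserved block.
toy_msa = [
    "MLSARKLPTA",
    "MISTRRLPTA",
    "MFAVKQLPTA",
    "MLRSNTLPTA",
]

profile = divergence_profile(toy_msa)
n_terminal_mean = sum(profile[:6]) / 6
conserved_mean = sum(profile[6:]) / 4
```

In this toy case the variable N-terminal columns score higher than the conserved block, which is the kind of contrast the abstract proposes as a signal for N-terminal sorting sequences.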

Figures

Figure 1
Relationship between mean divergence score and the number of sequences in MSAs. A box plot illustrating the mean, quartiles and range of the column entropy score for MSAs in the yeast autoOrthoMSA dataset, partitioned by the number of sequences in the MSA.
Figure 2
An example of an MTS-containing protein. A multiple sequence alignment of the protein mtHSP70 (UniProt accession P0CS90) and its orthologs from five species of yeast. The red box indicates the cleaved MTS in S. cerevisiae. Conserved positions are colored by Jalview.
Figure 3
Local divergence score over the N-terminal region. Average local divergence scores are shown for the 100-residue N-terminal region of MTS-containing, SP-containing, and N-signal-free proteins. The top left panel is calculated from orthologs of the curated yeast dataset, and the others from automatically collected orthologs. For the plant dataset, CTP-containing proteins are also shown. The error bars denote standard error. For clarity, error bars are only shown for every fifth position.
Figure 4
Importance of each feature. The importance of each attribute, as estimated by information gain, is shown for the YGOB ortholog set. At left, the divergence-related scores are shown as light blue lines. For local divergence features LD(i), only the residue number i is listed. Dark blue lines denote standard features of the N-terminal 40 residues, such as physico-chemical properties or amino acid composition. The suffix “f” denotes amino acid composition from the full length of the protein.
Figure 5
Correlation between divergence and physico-chemical properties. Scatter plots of LD(13) (on the vertical axis) vs. physico-chemical properties: (A) average hydrophobicity, (B) number of negatively charged residues, and (C) arginine composition, for the YGOB ortholog set. MTS proteins are shown in red, SP in blue, and N-signal-free proteins in green.
Figure 6
MSA of FMP52 and its orthologs in 11 yeast species. Multiple sequence alignment of FMP52 in S. cerevisiae and its orthologs in 10 other yeast species. The red boxed region shows the annotated MTS of FMP52. The conserved positions are colored by Jalview.
Figure 7
MSA of MrpL32 and its orthologs in 11 yeast species. Multiple sequence alignment of MrpL32 in S. cerevisiae and its orthologs in 10 other yeast species. The red boxed region shows the MTS of MrpL32. The conserved positions are colored by Jalview.
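
The Figure 4 caption ranks features by information gain. As a hedged illustration of that quantity, the sketch below computes the information gain of one numeric feature against class labels after splitting at a single threshold. The page does not say how continuous features were discretized, so the threshold split, the function names and the toy values are assumptions made for this example.

```python
import math
from collections import Counter


def label_entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return sum(-(n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(values, labels, threshold):
    """Reduction in label entropy from splitting a numeric feature at threshold."""
    # Assumption: a single binary split; other discretization schemes are possible.
    low = [y for x, y in zip(values, labels) if x <= threshold]
    high = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    conditional = (len(low) / n) * label_entropy(low) + (len(high) / n) * label_entropy(high)
    return label_entropy(labels) - conditional


# Hypothetical toy data: a divergence-like score for four proteins.
labels = ["N-signal-free", "N-signal-free", "MTS", "MTS"]
informative = [0.1, 0.2, 1.8, 1.9]    # separates the two classes cleanly
uninformative = [0.1, 1.9, 0.2, 1.8]  # mixes the two classes evenly

gain_informative = information_gain(informative, labels, threshold=1.0)
gain_uninformative = information_gain(uninformative, labels, threshold=1.0)
```

A feature that separates the classes perfectly recovers the full label entropy, while one that leaves both halves evenly mixed gains nothing.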
