Predicting sulfotyrosine sites using the random forest algorithm with significantly improved prediction accuracy
- PMID: 19874585
- PMCID: PMC2777180
- DOI: 10.1186/1471-2105-10-361
Abstract
Background: Tyrosine sulfation is one of the most important posttranslational modifications. Due to its relevance to various disease developments, tyrosine sulfation has become the target for drug design. In order to facilitate efficient drug design, accurate prediction of sulfotyrosine sites is desirable. A predictor published seven years ago has been very successful with claimed prediction accuracy of 98%. However, it has a particularly low sensitivity when predicting sulfotyrosine sites in some newly sequenced proteins.
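A high overall accuracy can coexist with low sensitivity when sulfated tyrosines are rare relative to unsulfated ones. The sketch below illustrates this using the standard confusion-matrix definitions of the measures named in this abstract. The function name and the counts are hypothetical, chosen only for illustration; they are not data from the paper.

```python
def confusion_metrics(tp, fn, tn, fp):
    """Standard performance measures from confusion-matrix counts.

    tp/fn: sulfated sites predicted correctly / missed
    tn/fp: unsulfated sites predicted correctly / wrongly flagged
    """
    total = tp + fn + tn + fp
    return {
        "sensitivity": tp / (tp + fn),   # fraction of true sites recovered
        "specificity": tn / (tn + fp),   # fraction of non-sites rejected
        "accuracy": (tp + tn) / total,   # fraction of all calls that are right
        "ppv": tp / (tp + fp),           # positive predictive power
        "npv": tn / (tn + fn),           # negative predictive power
    }


# Hypothetical, heavily imbalanced test set: 10 sulfated vs 990 unsulfated sites.
metrics = confusion_metrics(tp=5, fn=5, tn=980, fp=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these illustrative counts the accuracy is dominated by the large negative class, while half of the true sites are missed, which is why sensitivity is reported separately.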
Results: A new approach has been developed for predicting sulfotyrosine sites using the random forest algorithm after a careful evaluation of seven machine learning algorithms. Peptides are formed by consecutive residues symmetrically flanking tyrosine sites. They are then encoded using an amino acid hydrophobicity scale. This new approach has increased the sensitivity by 22%, the specificity by 3%, and the total prediction accuracy by 10% compared with the previous predictor using the same blind data. Meanwhile, both negative and positive predictive powers have been increased by 9%. In addition, the random forest model has an excellent feature for ranking the residues flanking tyrosine sites, hence providing more information for further investigating the tyrosine sulfation mechanism. A web tool has been implemented at http://ecsb.ex.ac.uk/sulfotyrosine for public use.
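The peptide construction and encoding described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not state which hydrophobicity scale or window size was used, so the Kyte-Doolittle scale, the flank length, the padding character, and the function names are all assumptions made for the example.

```python
# Kyte-Doolittle hydropathy values (assumed scale; the abstract does not name one).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def tyrosine_windows(sequence, flank=4, pad="X"):
    """Return (position, peptide) for each tyrosine in the sequence.

    Each peptide holds `flank` residues on either side of the tyrosine.
    Windows that run past a terminus are filled with `pad` (an assumption).
    """
    padded = pad * flank + sequence + pad * flank
    windows = []
    for i, residue in enumerate(sequence):
        if residue == "Y":
            # Position i in the sequence sits at i + flank in the padded string,
            # so the symmetric window starts at index i.
            windows.append((i, padded[i:i + 2 * flank + 1]))
    return windows


def encode(peptide, scale=KYTE_DOOLITTLE, unknown=0.0):
    """Map each residue to its hydrophobicity value; padding maps to `unknown`."""
    return [scale.get(residue, unknown) for residue in peptide]


# Hypothetical sequence fragment used only to show the shapes involved.
for position, peptide in tyrosine_windows("ADEYDGE", flank=2):
    print(position, peptide, encode(peptide))
```

Each encoded peptide is a fixed-length numeric vector, one value per window position, which is the form a random forest classifier takes as input and what allows it to rank individual flanking positions by importance.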
Conclusion: The random forest algorithm is able to deliver a better model compared with the Hidden Markov Model, the support vector machine, artificial neural networks, and others for predicting sulfotyrosine sites. The success shows that the random forest algorithm together with an amino acid hydrophobicity scale encoding can be a good candidate for peptide classification.
