PLoS One. 2013 Jun 28;8(6):e67008.
doi: 10.1371/journal.pone.0067008. Print 2013.

In silico platform for prediction of N-, O- and C-glycosites in eukaryotic protein sequences

Jagat Singh Chauhan et al. PLoS One. 2013.

Abstract

Glycosylation is one of the most abundant and important post-translational modifications of proteins. Glycosylated proteins (glycoproteins) are involved in various cellular functions such as protein folding, cell-cell interactions, cell recognition and host-pathogen interactions. A large number of eukaryotic glycoproteins also have therapeutic and potential technological applications. Characterization and analysis of glycosites (glycosylated residues) in these proteins is therefore of great interest to biologists. To meet this need, a number of in silico tools have been developed over the years; however, better prediction tools are still needed. In this study we developed a new web server, GlycoEP, for more accurate prediction of N-linked, O-linked and C-linked glycosites in eukaryotic glycoproteins using two larger datasets, namely the standard and advanced datasets. In the standard datasets, no two glycosylated proteins share more than 40% similarity; the advanced datasets are highly non-redundant, with no two glycosite patterns (as defined in Methods) sharing more than 60% similarity. Based on our results with several algorithms developed using different machine-learning techniques, we found the Support Vector Machine (SVM) to be the optimal tool for developing glycosite prediction models. Accordingly, using our more stringent and non-redundant advanced datasets, the SVM-based models developed in this study achieved prediction accuracies of 84.26%, 86.87% and 91.43%, with corresponding Matthews correlation coefficient (MCC) values of 0.54, 0.20 and 0.78, for N-, O- and C-linked glycosites, respectively. The best-performing models trained on the advanced datasets were then implemented as a user-friendly web server, GlycoEP (http://www.imtech.res.in/raghava/glycoep/). Additionally, this server provides prediction models developed on the standard datasets and allows users to scan sequons in input protein sequences.
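
The abstract reports both accuracy and MCC, and the two can diverge sharply (for example, 86.87% accuracy with an MCC of only 0.20 for O-linked sites). A plausible reason is class imbalance, since non-glycosylated residues usually far outnumber glycosylated ones, although the abstract does not say so. The sketch below computes both metrics from a confusion matrix using the standard definitions. The counts in the usage lines are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient for a binary classifier.

    Returns 0.0 when any marginal total is zero (a common convention).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical, imbalanced counts (NOT from the paper): 20 true sites, 180 non-sites.
tp, fn = 5, 15    # most true glycosites are missed
tn, fp = 175, 5   # most non-sites are correctly rejected

print(f"accuracy = {accuracy(tp, tn, fp, fn):.2%}")
print(f"MCC      = {mcc(tp, tn, fp, fn):.2f}")
```

With counts like these, accuracy stays high because the majority class dominates, while MCC falls because it weighs all four cells of the confusion matrix.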

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1. Flowchart showing the process for creating the various datasets used for developing GlycoEP models.
Figure 2. The process of creating overlapping patterns in a glycoprotein and assigning glycosylated and non-glycosylated patterns.
Figure 3. Performance of various models on the standard datasets in terms of ROC for N-, O- and C-linked glycosites (Panels A, B and C, respectively) in eukaryotic proteins.
