PLoS One. 2017 Aug 10;12(8):e0181966.
doi: 10.1371/journal.pone.0181966. eCollection 2017.

Prediction of N-linked glycosylation sites using position relative features and statistical moments

Muhammad Aizaz Akmal et al. PLoS One. 2017.

Abstract

Glycosylation is one of the most complex post-translational modifications in eukaryotic cells. Almost 50% of the human proteome is glycosylated, as glycosylation plays a vital role in biological functions such as antigen recognition, cell-cell communication, gene expression and protein folding. Identifying glycosylation sites in protein sequences is a significant challenge because experimental methods are time-consuming and expensive, so a reliable computational method is desirable. In this study, a comprehensive machine learning technique for the identification of N-linked glycosylation sites is proposed. The proposed predictor was trained on an up-to-date dataset using the back-propagation algorithm for a multilayer neural network. The results of ten-fold cross-validation and other performance measures such as accuracy, sensitivity, specificity and Matthews correlation coefficient indicate that the accuracy of the proposed tool is far better than that of existing systems such as GlycoMine, GlycoEP, Ensemble SVM and GPP.

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Fig 1. The process of glycosylation.
Ribosomes attached to the cytoplasmic side of the ER synthesize proteins. As the protein moves, special enzymes attach oligosaccharides to it via N-linkage.
Fig 2. The proposed model workflow.
The workflow of the proposed model comprises four phases: Data Collection, Data Filtration, Feature Extraction and TNN.
Fig 3. Sequence logo for (–ve) N-linked glycosylation sites.
The logo depicts the residues occurring at specific positions. All sites were aligned with the non-glycosylated N residue at position 0.
Fig 4. Sequence logo for (+ve) N-linked glycosylation sites.
The logo illustrates the residues occurring at specific positions. All sites were aligned with the glycosylated N residue at position 0.
Fig 5. Process of the neural network.
Input values and initial weights are assigned to the network, and the network starts learning from these values.
Fig 6. Multiple layer back propagation neural network.
An artificial neural network with multiple layers is used for the prediction of N-linked glycosylation sites.
Fig 7. A confusion matrix of the prediction model.
The values of TP, FN, FP and TN are 11461, 12, 0 and 11988, respectively. The overall accuracy is 99.9%, as shown.
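The performance measures named in the abstract can be derived from the four counts in this caption. The following is a minimal sketch using the standard textbook definitions; the function name `binary_metrics` and the dictionary layout are illustrative assumptions, not code from the paper.

```python
import math


def binary_metrics(tp, fn, fp, tn):
    """Return accuracy, sensitivity, specificity and MCC for a 2x2 confusion matrix.

    Hypothetical helper: standard definitions, not the authors' implementation.
    """
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is undefined when a row or column of the matrix is empty; report 0.0 then.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }


# Counts taken from the Fig 7 caption.
metrics = binary_metrics(tp=11461, fn=12, fp=0, tn=11988)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With these counts, 23449 of 23461 samples are classified correctly, which is consistent with the 99.9% accuracy stated in the caption, and the absence of false positives gives a specificity of 1.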
Fig 8. ROC comparison graph.
ROC comparison between the proposed predictor and other predictors: Ensemble SVM, GlycoMine, GlycoEP and GPP.
Fig 9. Regression metric.
The regression metric of the proposed N-linked predictor is shown. The regression value is 0.99, which indicates a negligible error rate.
Fig 10. The validation of prediction model.
The validation process is illustrated. The accuracy of the trained model is verified by testing it on partitioned test data.
Fig 11. K-fold cross-validation.
The process of K-fold cross-validation is shown. Red circles show the test data and green circles show the training data.
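The partitioning that this figure illustrates can be sketched as follows. This is a minimal illustration of generic K-fold splitting, assuming contiguous, unshuffled folds; the function name `k_fold_indices` and the sample count are hypothetical, and the paper's actual splitting procedure is not given here.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs so each sample is tested exactly once.

    Hypothetical helper: folds are contiguous and not shuffled.
    """
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]                # the "red circles"
        train_idx = indices[:start] + indices[start + size:]  # the "green circles"
        yield train_idx, test_idx
        start += size


# Ten folds over a toy dataset of 20 samples.
folds = list(k_fold_indices(n_samples=20, k=10))
for train_idx, test_idx in folds:
    print("test:", test_idx, "train size:", len(train_idx))
```

In practice the sample order is usually shuffled before splitting so that each fold has a similar class balance; that step is omitted here for brevity.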
Fig 12. The process of Jackknife validation.
Jackknife validation is shown; yellow circles show the test data and green circles show the training data.
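Jackknife validation holds out one sample at a time and trains on all the others. The sketch below assumes a toy one-dimensional nearest-neighbour classifier as a stand-in for the paper's neural network; `jackknife_accuracy`, `fit_1nn`, `predict_1nn` and the data are all hypothetical and serve only to show the leave-one-out loop.

```python
def jackknife_accuracy(samples, labels, fit, predict):
    """Leave-one-out accuracy: each sample is the test set exactly once."""
    correct = 0
    for i in range(len(samples)):
        # Train on everything except sample i.
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        model = fit(train_x, train_y)
        if predict(model, samples[i]) == labels[i]:
            correct += 1
    return correct / len(samples)


def fit_1nn(xs, ys):
    """Toy stand-in model: store the training pairs."""
    return list(zip(xs, ys))


def predict_1nn(model, x):
    """Return the label of the closest stored training point."""
    return min(model, key=lambda pair: abs(pair[0] - x))[1]


# Made-up, well-separated toy data: not from the paper.
xs = [0.1, 0.2, 0.3, 0.9, 1.0, 1.1]
ys = [0, 0, 0, 1, 1, 1]
acc = jackknife_accuracy(xs, ys, fit_1nn, predict_1nn)
print(f"jackknife accuracy: {acc:.2f}")
```

Because the model is refit once per sample, the cost grows linearly with the dataset size, which is why K-fold cross-validation is often preferred for large datasets.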
