Front Genet. 2018 Oct 8;9:458.
doi: 10.3389/fgene.2018.00458. eCollection 2018.

Accurate Prediction of ncRNA-Protein Interactions From the Integration of Sequence and Evolutionary Information

Zhao-Hui Zhan et al. Front Genet. 2018.

Abstract

Non-coding RNA (ncRNA) plays a crucial role in numerous biological processes, including gene expression and post-transcriptional gene regulation. The biological function of ncRNA is mostly realized by binding to related proteins. Therefore, an accurate understanding of the interactions between ncRNA and protein has a significant impact on current biological research. The major challenge at this stage is that traditional interaction prediction methods spend a great deal of time and resources on classification. Fortunately, LightGBM, an efficient classifier, can reduce this time cost. In this study, we employed LightGBM as the integrated classifier and proposed a novel computational model for predicting ncRNA-protein interactions. More specifically, pseudo-Zernike moments and the singular value decomposition algorithm are employed to extract discriminative features from protein and ncRNA sequences. On four widely used datasets, RPI369, RPI488, RPI1807, and RPI2241, we evaluated the performance of the model (LGBM) and obtained superior performance, with AUCs of 0.799, 0.914, 0.989, and 0.762, respectively. The results of 10-fold cross-validation show that the proposed method performs much better than existing methods in predicting ncRNA-protein interaction patterns and could serve as a useful tool in proteomics research.
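
The AUC figures quoted in the abstract come from 10-fold cross-validation. As a rough illustration of what such an evaluation measures, the sketch below computes AUC as the probability that a randomly chosen interacting pair is scored above a randomly chosen non-interacting pair, and builds plain k-fold index splits. It is a minimal sketch, not the authors' pipeline: the function names `roc_auc` and `k_fold_indices` are hypothetical, the folds are unshuffled and unstratified (the abstract does not say how folds were drawn), and no classifier is trained here.

```python
from typing import List, Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative score counts as half a win.
    This pairwise form is O(P * N); it is meant for illustration only.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def k_fold_indices(n_samples: int, k: int = 10) -> List[Tuple[List[int], List[int]]]:
    """Return k (train, test) index pairs covering range(n_samples).

    Assumption: samples are assigned to folds round-robin, without shuffling
    or stratification.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    splits = []
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        splits.append((train, test))
    return splits
```

In a full evaluation, a model would be fitted on each `train` split, `roc_auc` would be computed on the matching `test` split, and the per-fold values would be averaged.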

Keywords: LightGBM; PSSM; Pseudo-Zernike moments; k-mers; ncRNA-protein interactions.
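
The keywords list k-mers as one ingredient of the sequence representation. As a hedged illustration only, the sketch below turns an RNA sequence into a vector of normalized k-mer frequencies over the alphabet A, C, G, U. The function name `kmer_frequencies`, the default `k = 3`, the conversion of T to U, and the choice to skip windows containing other symbols are all assumptions made for this example; the abstract does not state the value of k or the normalization the authors used.

```python
from itertools import product
from typing import List


def kmer_frequencies(sequence: str, k: int = 3, alphabet: str = "ACGU") -> List[float]:
    """Return normalized k-mer frequencies in lexicographic k-mer order.

    The output has len(alphabet) ** k entries. Windows that contain a symbol
    outside the alphabet (for example N) are skipped (an assumption).
    """
    seq = sequence.upper().replace("T", "U")  # treat DNA-style input as RNA
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0] * len(kmers)
    total = 0
    for start in range(len(seq) - k + 1):
        pos = index.get(seq[start:start + k])
        if pos is not None:
            counts[pos] += 1
            total += 1
    if total == 0:
        # Sequence shorter than k, or no valid window: return an all-zero vector.
        return [0.0] * len(kmers)
    return [c / total for c in counts]
```

For example, `kmer_frequencies("ACGU", k=2)` yields a 16-element vector in which the entries for AC, CG, and GU each hold one third and all others are zero. A fixed-length vector of this kind is what allows sequences of different lengths to be passed to a classifier such as LightGBM.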

Figures

Figure 1. Step-wise workflow for the proposed LGBM machine learning model.
Figure 2. The ROC curve of dataset RPI369 on three classifiers.
Figure 3. The ROC curve of dataset RPI488 on three classifiers.
Figure 4. The ROC curve of dataset RPI488 on 10-fold cross-validation.
Figure 5. The ROC curve of dataset RPI1807 on 10-fold cross-validation.
Figure 6. The ROC curve of dataset RPI2241 on 10-fold cross-validation.
