Front Genet. 2022 Aug 24:13:995532.
doi: 10.3389/fgene.2022.995532. eCollection 2022.

Geometric complement heterogeneous information and random forest for predicting lncRNA-disease associations

Dengju Yao et al. Front Genet. 2022.

Abstract

Growing evidence shows that aberrant expression of long non-coding RNA (lncRNA) is associated with a variety of human diseases. Accurate identification of disease-related lncRNAs can therefore help to understand lncRNA expression at the molecular level and to explore more effective treatments for diseases. Many lncRNA-disease association prediction models have been proposed, but recognizing unknown lncRNA-disease associations remains a challenge. In this work, we propose a computational model for predicting lncRNA-disease associations based on geometric complement heterogeneous information and random forest. First, geometric complement heterogeneous information was used to integrate experimentally verified lncRNA-miRNA interactions and miRNA-disease associations. Second, lncRNA and disease features, consisting of their respective similarity coefficients, were fused into the input feature space. Third, an autoencoder was adopted to project the raw high-dimensional features into a low-dimensional space and learn representations of lncRNAs and diseases. Finally, the low-dimensional lncRNA and disease features were fused into the input feature space used to train a random forest classifier for lncRNA-disease association prediction. Under five-fold cross-validation, the model achieved an AUC (area under the receiver operating characteristic curve) of 0.9897 and an AUPR (area under the precision-recall curve) of 0.7040, indicating that it performs better than several state-of-the-art lncRNA-disease association prediction models. In addition, case studies on colon and stomach cancer indicate that the model predicts disease-related lncRNAs well.

Keywords: autoencoder; geometric complement heterogeneous information; lncRNA-disease association prediction; machine learning; random forest.
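
The two metrics reported in the abstract can be illustrated with a minimal sketch. The code below is not the authors' evaluation code: the function names, labels, and scores are made up for illustration, AUC is computed from its pairwise-ranking definition, and AUPR is approximated by average precision (a common estimator that assumes untied scores here). The second call suggests one general reason the two numbers can differ widely: adding many low-scoring negatives raises AUC while leaving average precision unchanged.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Average precision: mean of precision@rank taken at each positive, ranked by score."""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits


# Hypothetical toy data: 2 known associations (label 1) among 8 candidate pairs.
labels = [1, 0, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
print(round(roc_auc(labels, scores), 4))            # 0.9167
print(round(average_precision(labels, scores), 4))  # 0.8333

# Add 92 easy negatives that all score below every existing pair.
labels_big = labels + [0] * 92
scores_big = scores + [0.1 - i * 0.001 for i in range(92)]
print(round(roc_auc(labels_big, scores_big), 4))            # 0.9949
print(round(average_precision(labels_big, scores_big), 4))  # 0.8333
```

In this toy setting the single negative ranked above a positive costs the same precision regardless of how many easy negatives exist, whereas AUC moves closer to 1 as they accumulate.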

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

FIGURE 1. The flowchart of constructing the GCHIRFLDA model.
FIGURE 2. The ROC curves of different classifiers in the GCHIRFLDA model.
FIGURE 3. The precision-recall curves of different classifiers in the GCHIRFLDA model.
FIGURE 4. The ROC curves of different LDA prediction models.
