BMC Bioinformatics. 2021 Mar 21;22(1):136.
doi: 10.1186/s12859-021-04073-z.

A representation learning model based on variational inference and graph autoencoder for predicting lncRNA-disease associations


Zhuangwei Shi et al. BMC Bioinformatics.

Abstract

Background: Numerous studies have demonstrated that long non-coding RNAs (lncRNAs) are associated with a wide range of human diseases. It is therefore crucial to predict potential lncRNA-disease associations for disease prognosis, diagnosis and therapy. Dozens of machine learning and deep learning algorithms have been applied to this problem, yet it remains challenging to learn efficient low-dimensional representations from the high-dimensional features of lncRNAs and diseases in order to predict unknown lncRNA-disease associations accurately.

Results: We propose an end-to-end model, VGAELDA, which integrates variational inference and graph autoencoders for lncRNA-disease association prediction. VGAELDA contains two kinds of graph autoencoders: variational graph autoencoders (VGAE) infer representations from the features of lncRNAs and diseases, respectively, while graph autoencoders propagate labels via known lncRNA-disease associations. The two kinds of autoencoders are trained alternately using the variational expectation-maximization (EM) algorithm. Combining VGAE for graph representation learning with alternate training via variational inference strengthens the ability of VGAELDA to capture efficient low-dimensional representations from high-dimensional features, and hence improves the robustness and precision of predictions for unknown lncRNA-disease associations. Further analysis shows that the co-training framework of lncRNAs and diseases in VGAELDA solves a geometric matrix completion problem, capturing efficient low-dimensional representations via a deep learning approach.
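
To make the two roles concrete, the following is a minimal sketch of a variational graph autoencoder of the kind described above, written in PyTorch. It is an illustration under assumptions rather than the VGAELDA implementation: the class names, layer sizes and mean-reduced KL term are hypothetical, and in the actual model the encoder (GNNq) operates on the similarity graphs Gl and Gd built in the framework shown in Fig. 4.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class GCNLayer(nn.Module):
        # One graph convolution: H' = A_norm @ (H @ W), where A_norm is a
        # normalized adjacency matrix (with self-loops) of the input graph.
        def __init__(self, in_dim, out_dim):
            super().__init__()
            self.linear = nn.Linear(in_dim, out_dim, bias=False)

        def forward(self, a_norm, h):
            return a_norm @ self.linear(h)

    class VGAE(nn.Module):
        # The encoder yields a Gaussian posterior over latent node vectors Z;
        # the decoder reconstructs edges via the inner product sigmoid(Z Z^T).
        def __init__(self, in_dim, hid_dim, lat_dim):
            super().__init__()
            self.gc1 = GCNLayer(in_dim, hid_dim)
            self.gc_mu = GCNLayer(hid_dim, lat_dim)
            self.gc_logvar = GCNLayer(hid_dim, lat_dim)

        def forward(self, a_norm, x):
            h = F.relu(self.gc1(a_norm, x))
            mu = self.gc_mu(a_norm, h)
            logvar = self.gc_logvar(a_norm, h)
            std = torch.exp(0.5 * logvar)
            z = mu + std * torch.randn_like(std)   # reparameterization trick
            a_hat = torch.sigmoid(z @ z.t())       # inner-product decoder
            return a_hat, mu, logvar

    def vgae_loss(a_hat, a_target, mu, logvar):
        # Negative evidence lower bound: reconstruction error plus the
        # KL divergence from the posterior to the standard normal prior.
        recon = F.binary_cross_entropy(a_hat, a_target)
        kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
        return recon + kl

The KL term keeps the latent space close to a standard Gaussian, which is what encourages smooth, robust low-dimensional representations rather than a brittle memorization of the input graph.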

Conclusion: Cross-validation and numerical experiments illustrate that VGAELDA outperforms current state-of-the-art methods in lncRNA-disease association prediction. Case studies indicate that VGAELDA is capable of detecting potential lncRNA-disease associations. The source code and data are available at https://github.com/zhanglabNKU/VGAELDA.

Keywords: Graph autoencoder; Representation learning; Variational inference; lncRNA-disease association.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
ROC and PR curves of different methods on Dataset1. In AUROC, VGAELDA (AUROC = 0.9680) outperforms GAMCLDA (0.9299), SKFLDA (0.9154), TPGLDA (0.7936), SIMCLDA (0.8293) and LRLSLDA (0.8157). In AUPR, VGAELDA (AUPR = 0.8380) outperforms GAMCLDA (0.5794), SKFLDA (0.4024), TPGLDA (0.5308), SIMCLDA (0.5357) and LRLSLDA (0.2035)
Fig. 2
ROC and PR curves of different methods on Dataset2. In AUROC, VGAELDA (AUROC = 0.9692) outperforms GAMCLDA (0.8841), SKFLDA (0.8524), TPGLDA (0.8771), SIMCLDA (0.8146) and LRLSLDA (0.8627). In AUPR, VGAELDA (AUPR = 0.8203) outperforms GAMCLDA (0.3798), SKFLDA (0.2831), TPGLDA (0.3192), SIMCLDA (0.1189) and LRLSLDA (0.1812)
Fig. 3
True positive samples at different cutoffs on Dataset2
Fig. 4
Framework of VGAELDA. Step 1: lncRNA features Xl are embeddings of lncRNA sequences computed by Word2Vec, while disease features Xd are associations with genes. Step 2: construct graphs Gl and Gd for lncRNAs and diseases, respectively, through Eq. (16). Step 3: GNNql and GNNpl are applied to Gl and take Xl and Y as inputs, while GNNqd and GNNpd are applied to Gd and take Xd and YT as inputs. Step 4: train GNNq and GNNp alternately via the variational EM algorithm, while training GNNql and GNNqd collaboratively (a schematic sketch of this alternation follows below). Step 5: fuse the final results via Eq. (28)
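
Since Step 4 is the core of the training procedure, a schematic of the alternation may help. The PyTorch sketch below is hypothetical throughout: plain MLP score networks stand in for the actual GNNq and GNNp graph networks, and the simplified agreement losses, learning rate and epoch count are illustrative assumptions, not the published objective (which also includes the collaborative GNNql/GNNqd training and the Eq. (28) fusion).

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ScoreNet(nn.Module):
        # Hypothetical stand-in for GNNq or GNNp: maps an input matrix to an
        # n x m matrix of lncRNA-disease association scores in (0, 1).
        def __init__(self, in_dim, n_diseases, hid_dim=64):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(in_dim, hid_dim),
                nn.ReLU(),
                nn.Linear(hid_dim, n_diseases),
            )

        def forward(self, x):
            return torch.sigmoid(self.net(x))

    def alternate_train(gnn_q, gnn_p, x, y, epochs=100):
        # Variational-EM-style alternation: each network is pushed toward
        # the other's current predictions, and gnn_p additionally
        # reconstructs the known association matrix y.
        opt_q = torch.optim.Adam(gnn_q.parameters(), lr=1e-3)
        opt_p = torch.optim.Adam(gnn_p.parameters(), lr=1e-3)
        for _ in range(epochs):
            # E-step: fit gnn_q (features -> scores) to gnn_p's output.
            target_p = gnn_p(y).detach()
            loss_q = F.binary_cross_entropy(gnn_q(x), target_p)
            opt_q.zero_grad(); loss_q.backward(); opt_q.step()
            # M-step: fit gnn_p (labels -> scores) to gnn_q's output while
            # staying consistent with the known associations y.
            target_q = gnn_q(x).detach()
            pred_p = gnn_p(y)
            loss_p = (F.binary_cross_entropy(pred_p, target_q)
                      + F.binary_cross_entropy(pred_p, y))
            opt_p.zero_grad(); loss_p.backward(); opt_p.step()

    # Illustrative usage with random data; all sizes are arbitrary.
    n, m, d = 240, 405, 128                  # lncRNAs, diseases, feature dim
    x = torch.rand(n, d)                     # lncRNA features (e.g., Word2Vec)
    y = (torch.rand(n, m) < 0.01).float()    # known 0/1 associations
    alternate_train(ScoreNet(d, m), ScoreNet(m, m), x, y)

Detaching the partner's output in each half-step is what makes the updates alternate rather than joint: one network is held fixed as a teacher while the other is optimized, mirroring the E and M steps of a variational EM scheme.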
