BMC Bioinformatics. 2021 Mar 21;22(1):136.
doi: 10.1186/s12859-021-04073-z.

A representation learning model based on variational inference and graph autoencoder for predicting lncRNA-disease associations


Zhuangwei Shi et al. BMC Bioinformatics.

Abstract

Background: Numerous studies have demonstrated that long non-coding RNAs (lncRNAs) are associated with a wide range of human diseases. It is therefore crucial to predict potential lncRNA-disease associations for disease prognosis, diagnosis and therapy. Dozens of machine learning and deep learning algorithms have been applied to this problem, yet it remains challenging to learn efficient low-dimensional representations from the high-dimensional features of lncRNAs and diseases in order to predict unknown lncRNA-disease associations accurately.

Results: We propose an end-to-end model, VGAELDA, which integrates variational inference and graph autoencoders for lncRNA-disease association prediction. VGAELDA contains two kinds of graph autoencoders: variational graph autoencoders (VGAE) infer representations from the features of lncRNAs and diseases, respectively, while graph autoencoders propagate labels via known lncRNA-disease associations. The two kinds of autoencoders are trained alternately using the variational expectation-maximization (EM) algorithm. Combining VGAE for graph representation learning with alternate training via variational inference strengthens the ability of VGAELDA to capture efficient low-dimensional representations from high-dimensional features, and hence improves the robustness and precision of predictions for unknown lncRNA-disease associations. Further analysis shows that the co-training framework of lncRNAs and diseases in VGAELDA solves a geometric matrix completion problem, capturing efficient low-dimensional representations via a deep learning approach.
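
To make the two roles concrete, the following is a minimal sketch of a variational graph autoencoder of the kind described above, written in PyTorch. It is an illustration under assumptions rather than the VGAELDA implementation: the class names, layer sizes and mean-reduced KL term are hypothetical, and in the actual model the encoder (GNNq) operates on the similarity graphs Gl and Gd built in the framework shown in Fig. 4.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class GCNLayer(nn.Module):
        # One graph convolution: H' = A_norm @ (H @ W), where A_norm is a
        # normalized adjacency matrix (with self-loops) of the input graph.
        def __init__(self, in_dim, out_dim):
            super().__init__()
            self.linear = nn.Linear(in_dim, out_dim, bias=False)

        def forward(self, a_norm, h):
            return a_norm @ self.linear(h)

    class VGAE(nn.Module):
        # The encoder yields a Gaussian posterior over latent node vectors Z;
        # the decoder reconstructs edges via the inner product sigmoid(Z Z^T).
        def __init__(self, in_dim, hid_dim, lat_dim):
            super().__init__()
            self.gc1 = GCNLayer(in_dim, hid_dim)
            self.gc_mu = GCNLayer(hid_dim, lat_dim)
            self.gc_logvar = GCNLayer(hid_dim, lat_dim)

        def forward(self, a_norm, x):
            h = F.relu(self.gc1(a_norm, x))
            mu = self.gc_mu(a_norm, h)
            logvar = self.gc_logvar(a_norm, h)
            std = torch.exp(0.5 * logvar)
            z = mu + std * torch.randn_like(std)   # reparameterization trick
            a_hat = torch.sigmoid(z @ z.t())       # inner-product decoder
            return a_hat, mu, logvar

    def vgae_loss(a_hat, a_target, mu, logvar):
        # Negative evidence lower bound: reconstruction error plus the
        # KL divergence from the posterior to the standard normal prior.
        recon = F.binary_cross_entropy(a_hat, a_target)
        kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
        return recon + kl

The KL term keeps the latent space close to a standard Gaussian, which is what encourages smooth, robust low-dimensional representations rather than a brittle memorization of the input graph.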

Conclusion: Cross-validation and numerical experiments illustrate that VGAELDA outperforms current state-of-the-art methods in lncRNA-disease association prediction. Case studies indicate that VGAELDA is capable of detecting potential lncRNA-disease associations. The source code and data are available at https://github.com/zhanglabNKU/VGAELDA.

Keywords: Graph autoencoder; Representation learning; Variational inference; lncRNA-disease association.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
ROC and PR curves of different methods on Dataset1. In AUROC, VGAELDA (AUROC = 0.9680) outperforms GAMCLDA (0.9299), SKFLDA (0.9154), TPGLDA (0.7936), SIMCLDA (0.8293) and LRLSLDA (0.8157). In AUPR, VGAELDA (AUPR = 0.8380) outperforms GAMCLDA (0.5794), SKFLDA (0.4024), TPGLDA (0.5308), SIMCLDA (0.5357) and LRLSLDA (0.2035)
Fig. 2
ROC and PR curves of different methods on Dataset2. In AUROC, VGAELDA (AUROC = 0.9692) outperforms GAMCLDA (0.8841), SKFLDA (0.8524), TPGLDA (0.8771), SIMCLDA (0.8146) and LRLSLDA (0.8627). In AUPR, VGAELDA (AUPR = 0.8203) outperforms GAMCLDA (0.3798), SKFLDA (0.2831), TPGLDA (0.3192), SIMCLDA (0.1189) and LRLSLDA (0.1812)
Fig. 3
True positive samples at different cutoffs on Dataset2
Fig. 4
Framework of VGAELDA. Step 1: lncRNA features Xl are embeddings of lncRNA sequences computed by Word2Vec, while disease features Xd are associations with genes. Step 2: construct graphs Gl and Gd for lncRNAs and diseases, respectively, through Eq. (16). Step 3: GNNql and GNNpl are applied to Gl and take Xl and Y as inputs, while GNNqd and GNNpd are applied to Gd and take Xd and YT as inputs. Step 4: train GNNq and GNNp alternately via the variational EM algorithm, while training GNNql and GNNqd collaboratively (a schematic sketch of this alternation follows below). Step 5: fuse the final results via Eq. (28)
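
Since Step 4 is the core of the training procedure, a schematic of the alternation may help. The PyTorch sketch below is hypothetical throughout: plain MLP score networks stand in for the actual GNNq and GNNp graph networks, and the simplified agreement losses, learning rate and epoch count are illustrative assumptions, not the published objective (which also includes the collaborative GNNql/GNNqd training and the Eq. (28) fusion).

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ScoreNet(nn.Module):
        # Hypothetical stand-in for GNNq or GNNp: maps an input matrix to an
        # n x m matrix of lncRNA-disease association scores in (0, 1).
        def __init__(self, in_dim, n_diseases, hid_dim=64):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(in_dim, hid_dim),
                nn.ReLU(),
                nn.Linear(hid_dim, n_diseases),
            )

        def forward(self, x):
            return torch.sigmoid(self.net(x))

    def alternate_train(gnn_q, gnn_p, x, y, epochs=100):
        # Variational-EM-style alternation: each network is pushed toward
        # the other's current predictions, and gnn_p additionally
        # reconstructs the known association matrix y.
        opt_q = torch.optim.Adam(gnn_q.parameters(), lr=1e-3)
        opt_p = torch.optim.Adam(gnn_p.parameters(), lr=1e-3)
        for _ in range(epochs):
            # E-step: fit gnn_q (features -> scores) to gnn_p's output.
            target_p = gnn_p(y).detach()
            loss_q = F.binary_cross_entropy(gnn_q(x), target_p)
            opt_q.zero_grad(); loss_q.backward(); opt_q.step()
            # M-step: fit gnn_p (labels -> scores) to gnn_q's output while
            # staying consistent with the known associations y.
            target_q = gnn_q(x).detach()
            pred_p = gnn_p(y)
            loss_p = (F.binary_cross_entropy(pred_p, target_q)
                      + F.binary_cross_entropy(pred_p, y))
            opt_p.zero_grad(); loss_p.backward(); opt_p.step()

    # Illustrative usage with random data; all sizes are arbitrary.
    n, m, d = 240, 405, 128                  # lncRNAs, diseases, feature dim
    x = torch.rand(n, d)                     # lncRNA features (e.g., Word2Vec)
    y = (torch.rand(n, m) < 0.01).float()    # known 0/1 associations
    alternate_train(ScoreNet(d, m), ScoreNet(m, m), x, y)

Detaching the partner's output in each half-step is what makes the updates alternate rather than joint: one network is held fixed as a teacher while the other is optimized, mirroring the E and M steps of a variational EM scheme.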
