Brief Bioinform. 2023 Nov 22;25(1):bbad466.
doi: 10.1093/bib/bbad466.

LDA-VGHB: identifying potential lncRNA-disease associations with singular value decomposition, variational graph auto-encoder and heterogeneous Newton boosting machine

Lihong Peng et al. Brief Bioinform. 2023.

Abstract

Long noncoding RNAs (lncRNAs) participate in various biological processes and are closely linked to diseases. In vivo and in vitro experiments have validated many associations between lncRNAs and diseases, but such biological experiments are time-consuming and expensive. Here, we introduce LDA-VGHB, a lncRNA-disease association (LDA) identification framework that combines feature extraction based on singular value decomposition (SVD) and a variational graph autoencoder (VGAE) with LDA classification based on a heterogeneous Newton boosting machine. LDA-VGHB was compared with four classical LDA prediction methods (SDLDA, LDNFSGB, IPCARF and LDASR) and four popular boosting models (XGBoost, AdaBoost, CatBoost and LightGBM) under four 5-fold cross-validation settings: on lncRNAs, on diseases, on lncRNA-disease pairs, and on independent lncRNAs and independent diseases. It greatly outperformed the other methods under all four cross-validation settings on both the lncRNADisease and MNDR databases. We further investigated potential lncRNAs for lung cancer, breast cancer, colorectal cancer and kidney neoplasms and, for each disease, inferred the top 20 associated lncRNAs among all of its unobserved lncRNAs. Most of the predicted top 20 lncRNAs have been verified by biomedical experiments recorded in the Lnc2Cancer 3.0, lncRNADisease v2.0 and RNADisease databases or in publications. We found that HAR1A, KCNQ1DN, ZFAT-AS1 and HAR1B could be associated with lung cancer, breast cancer, colorectal cancer and kidney neoplasms, respectively; these predictions require further experimental validation. We anticipate that LDA-VGHB can help identify candidate lncRNAs for complex diseases. LDA-VGHB is publicly available at https://github.com/plhhnu/LDA-VGHB.
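The classification stage relies on Newton boosting: each round fits a base learner to the gradient and Hessian of the loss, and each leaf takes the Newton step -G/(H + lambda). The sketch below shows that idea for the logistic loss with decision stumps only. It is not the authors' implementation: a heterogeneous Newton boosting machine can draw its base learner from more than one model family per round, whereas this sketch uses a single family, and the names `newton_boost`, `fit_newton_stump`, the learning rate and the regularizer `lam` are illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple

# A decision stump: (feature index, threshold, left-leaf value, right-leaf value).
Stump = Tuple[int, float, float, float]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_newton_stump(X: Sequence[Sequence[float]], g: List[float],
                     h: List[float], lam: float) -> Stump:
    """Choose the stump maximizing the second-order gain G_L^2/(H_L+lam) + G_R^2/(H_R+lam)."""
    best, best_score = None, -math.inf
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            gl = hl = gr = hr = 0.0
            for row, gi, hi in zip(X, g, h):
                if row[j] <= t:
                    gl, hl = gl + gi, hl + hi
                else:
                    gr, hr = gr + gi, hr + hi
            score = gl * gl / (hl + lam) + gr * gr / (hr + lam)
            if score > best_score:
                # Leaf values are the regularized Newton steps -G / (H + lam).
                best_score = score
                best = (j, t, -gl / (hl + lam), -gr / (hr + lam))
    return best


def predict_raw(stumps: List[Stump], row: Sequence[float], lr: float) -> float:
    """Additive model output (log-odds) before the sigmoid."""
    return sum(lr * (vl if row[j] <= t else vr) for j, t, vl, vr in stumps)


def newton_boost(X, y, n_rounds: int = 20, lr: float = 0.3,
                 lam: float = 1.0) -> List[Stump]:
    stumps: List[Stump] = []
    for _ in range(n_rounds):
        p = [sigmoid(predict_raw(stumps, row, lr)) for row in X]
        g = [pi - yi for pi, yi in zip(p, y)]    # gradient of the log-loss
        h = [pi * (1.0 - pi) for pi in p]        # Hessian of the log-loss
        stumps.append(fit_newton_stump(X, g, h, lam))
    return stumps
```

In an LDA setting, each row of `X` would be the feature vector of one lncRNA-disease pair and `y` its known or sampled-negative label; the toy data in the test below are made up.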

Keywords: heterogeneous Newton boosting machine; lncRNA–disease association; singular value decomposition; variational graph autoencoder.


Figures

Figure 1
The pipeline for LDA prediction with SVD, VGAE and heterogeneous Newton boosting machine (LDA-VGHB). (i) Feature extraction. Features of lncRNAs and diseases are extracted by incorporating similarity computation, linear feature extraction based on SVD and nonlinear feature extraction based on VGAE. (ii) LDA classification. A heterogeneous Newton boosting machine is designed to classify unobserved LDAs.
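The linear feature step can be pictured as a truncated SVD of a lncRNA-disease matrix, with the rows of the left factor serving as lncRNA features and the rows of the right factor as disease features. This is a minimal sketch under assumptions: the input matrix (a binary association matrix, or one assembled from the similarity step, which is not shown), the rank `k`, and the symmetric sqrt(s) weighting are illustrative choices, not details confirmed by the paper.

```python
import numpy as np


def svd_features(A: np.ndarray, k: int):
    """Rank-k SVD embeddings for rows (lncRNAs) and columns (diseases) of A.

    Splitting sqrt(s) evenly between the two factors means L @ D.T is the
    best rank-k approximation of A in the Frobenius norm.
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    root = np.sqrt(s[:k])
    L = U[:, :k] * root       # lncRNA features, shape (n_lnc, k)
    D = Vt[:k, :].T * root    # disease features, shape (n_dis, k)
    return L, D


# Toy 4 x 3 association matrix (made-up data): 4 lncRNAs, 3 diseases.
A = np.array([[1, 0, 1],
              [0, 1, 0],
              [1, 0, 0],
              [0, 0, 1]], dtype=float)
L, D = svd_features(A, k=2)
```

Choosing `k` below the matrix rank discards the weakest singular directions, which is what makes these features a compressed, linear view of the association structure.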
Figure 2
The ROC and PR curves of LDA-VGHB and the four other LDA prediction methods. A-B and C-D, E-F and G-H, I-J and K-L, and M-N and O-P denote the ROC and PR curves of the five methods on the lncRNADisease and MNDR databases under the four 5-fold cross-validation settings, respectively.
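The ROC and PR curves are usually summarized as AUC and AUPR. The sketch below computes AUC as the probability that a randomly chosen positive pair outscores a randomly chosen negative pair, and estimates AUPR as step-wise average precision. Whether the paper integrates the PR curve this way or with another interpolation is an assumption, and the function names are hypothetical.

```python
from typing import List


def roc_auc(y: List[int], scores: List[float]) -> float:
    """AUC = P(score of random positive > score of random negative); ties count 1/2."""
    pos = [s for s, t in zip(scores, y) if t == 1]
    neg = [s for s, t in zip(scores, y) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(y: List[int], scores: List[float]) -> float:
    """Step-wise AUPR estimate: mean of the precision at the rank of each positive.

    Assumes no tied scores; with ties, the order among tied items is arbitrary.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if y[i] == 1:
            hits += 1
            total += hits / rank
    return total / sum(y)


# Made-up labels and scores for five candidate lncRNA-disease pairs.
y = [1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.1]
```

Here both metrics equal 5/6: one negative (0.8) outranks one positive (0.7), which costs one of six positive-negative comparisons for AUC and lowers the precision at the second hit to 2/3 for AUPR.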
Figure 3
Effects of linear features, nonlinear features and their combination on performance. A–D denote the performance of LDA-VGHB with each of the three feature types on the lncRNADisease database under the four cross-validation settings, respectively. E–H show the same comparison on the MNDR database.
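The nonlinear features come from a variational graph autoencoder. The sketch below shows one forward pass of a standard VGAE encoder (a shared GCN layer, then GCN heads for the mean and log standard deviation, sampled with the reparameterization trick) and its inner-product decoder. It uses untrained random weights and omits the training loss (reconstruction plus KL divergence). The graph, the identity node features and all names here are assumptions for illustration, not the paper's exact configuration.

```python
import numpy as np


def normalize_adj(A: np.ndarray) -> np.ndarray:
    """Symmetric GCN normalization D^{-1/2} (A + I) D^{-1/2}."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def vgae_encode(A, X, W0, W_mu, W_logstd, rng):
    """One VGAE encoder pass: returns sampled embeddings Z, the mean and log-std."""
    A_norm = normalize_adj(A)
    H = np.maximum(A_norm @ X @ W0, 0.0)   # shared GCN layer with ReLU
    mu = A_norm @ H @ W_mu                 # mean of q(z | X, A)
    log_std = A_norm @ H @ W_logstd        # log standard deviation of q(z | X, A)
    eps = rng.standard_normal(mu.shape)
    Z = mu + np.exp(log_std) * eps         # reparameterization trick
    return Z, mu, log_std


def decode(Z: np.ndarray) -> np.ndarray:
    """Inner-product decoder: edge probabilities sigmoid(Z Z^T)."""
    return 1.0 / (1.0 + np.exp(-(Z @ Z.T)))


rng = np.random.default_rng(0)
# Hypothetical 5-node undirected graph (e.g. lncRNA and disease nodes together).
A = np.array([[0, 1, 0, 0, 1],
              [1, 0, 1, 0, 0],
              [0, 1, 0, 1, 0],
              [0, 0, 1, 0, 1],
              [1, 0, 0, 1, 0]], dtype=float)
X = np.eye(5)                              # featureless nodes: identity features
W0 = 0.1 * rng.standard_normal((5, 8))
W_mu = 0.1 * rng.standard_normal((8, 2))
W_logstd = 0.1 * rng.standard_normal((8, 2))
Z, mu, log_std = vgae_encode(A, X, W0, W_mu, W_logstd, rng)
P = decode(Z)
```

After training, the mean `mu` is a common deterministic choice of node embedding; the "combination" setting in the figure could then be formed by concatenating such embeddings with the linear features, for example with `np.hstack`, though how the paper pairs and joins them is not stated here.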
Figure 4
The effect of a model parameter on LDA prediction performance. A-B, C-D, E-F and G-H denote the AUC and AUPR of LDA-VGHB for different values of this parameter on the lncRNADisease and MNDR databases under the four cross-validation settings, respectively.
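Figures 2 to 4 report results under the four 5-fold cross-validation settings listed in the abstract. One way to build such folds over a grid of lncRNA-disease pairs is sketched below. The interpretation of "independent lncRNAs and independent diseases" (test pairs whose lncRNA and disease are both unseen during training) is an assumption, as is splitting the full pair grid rather than known positives plus sampled negatives; `kfold_splits` is a hypothetical helper.

```python
import random
from typing import List, Set, Tuple

Pair = Tuple[int, int]  # (lncRNA index, disease index)


def kfold_splits(n_lnc: int, n_dis: int, setting: str,
                 n_folds: int = 5, seed: int = 0) -> List[Tuple[Set[Pair], Set[Pair]]]:
    """Return (train, test) pair sets for each fold under one CV setting."""
    rng = random.Random(seed)
    lncs, diss = list(range(n_lnc)), list(range(n_dis))
    pairs = [(i, j) for i in lncs for j in diss]
    all_pairs = set(pairs)
    rng.shuffle(lncs)
    rng.shuffle(diss)
    rng.shuffle(pairs)
    folds = []
    for f in range(n_folds):
        held_l = set(lncs[f::n_folds])  # lncRNAs held out in this fold
        held_d = set(diss[f::n_folds])  # diseases held out in this fold
        if setting == "pair":
            test = set(pairs[f::n_folds])
        elif setting == "lncRNA":
            test = {(i, j) for i in held_l for j in range(n_dis)}
        elif setting == "disease":
            test = {(i, j) for i in range(n_lnc) for j in held_d}
        elif setting == "independent":
            test = {(i, j) for i in held_l for j in held_d}
        else:
            raise ValueError(f"unknown setting: {setting}")
        if setting == "independent":
            # Training must not contain any held-out lncRNA or disease.
            train = {(i, j) for i, j in all_pairs
                     if i not in held_l and j not in held_d}
        else:
            train = all_pairs - test
        folds.append((train, test))
    return folds
```

The settings differ only in what is held out: individual pairs, whole lncRNAs, whole diseases, or both at once, which makes each successive setting a harder cold-start test.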
Figure 5
The predicted top 20 lncRNAs associated with lung cancer (A and B), breast cancer (C and D), colorectal cancer (E and F) and kidney neoplasms (G and H) on the lncRNADisease and MNDR databases. Solid and dashed lines denote predicted LDAs that can and cannot be validated, respectively.
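The top 20 lists are drawn from each disease's unobserved lncRNAs, meaning candidates without a recorded association. A minimal sketch of that ranking step follows; the score dictionary, the tie-breaking by name and the lncRNA names `lncA` to `lncC` are hypothetical placeholders, not data from the paper.

```python
from typing import Dict, List, Set, Tuple

Pair = Tuple[str, str]  # (lncRNA, disease)


def top_k_unobserved(scores: Dict[Pair, float], known: Set[Pair],
                     disease: str, k: int = 20) -> List[str]:
    """Rank lncRNAs with no known association to `disease` by predicted score."""
    candidates = [(lnc, s) for (lnc, dis), s in scores.items()
                  if dis == disease and (lnc, dis) not in known]
    # Highest score first; ties broken alphabetically for a stable order.
    candidates.sort(key=lambda item: (-item[1], item[0]))
    return [lnc for lnc, _ in candidates[:k]]


scores = {("lncA", "lung cancer"): 0.9, ("lncB", "lung cancer"): 0.4,
          ("lncC", "lung cancer"): 0.7, ("lncA", "breast cancer"): 0.8}
known = {("lncA", "lung cancer")}
```

Excluding known pairs before ranking is what makes the list a set of new hypotheses, which can then be checked against external databases and the literature.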

