Review
Front Genet. 2021 Dec 13;12:781277.
doi: 10.3389/fgene.2021.781277. eCollection 2021.

Predicting Pseudogene-miRNA Associations Based on Feature Fusion and Graph Auto-Encoder

Shijia Zhou et al. Front Genet.

Abstract

Pseudogenes were originally regarded as non-functional components scattered through the genome during evolution. Recent studies have shown that pseudogenes can be transcribed into long non-coding RNAs and play key roles at multiple functional levels in different physiological and pathological processes. MicroRNAs (miRNAs) are a type of non-coding RNA that plays important regulatory roles in cells. Numerous studies have shown that pseudogenes and miRNAs interact and, together with mRNAs, form a competing endogenous RNA (ceRNA) network that regulates biological processes and is involved in diseases. Exploring the associations between pseudogenes and miRNAs will facilitate the clinical diagnosis of some diseases. Here, we propose a prediction model, PMGAE (Pseudogene-MiRNA association prediction based on the Graph Auto-Encoder), which incorporates feature fusion, a graph auto-encoder (GAE), and eXtreme Gradient Boosting (XGBoost). First, we calculated three types of similarity between nodes (Jaccard similarity, cosine similarity, and Pearson similarity) based on the biological characteristics of pseudogenes and miRNAs. Subsequently, we fused these similarities to construct a similarity profile as the initial feature representation of each node. Then, we aggregated the similarity profiles and associations of the nodes through a GAE to obtain a low-dimensional representation vector for each node. In the last step, we fed these representation vectors into an XGBoost classifier to predict new pseudogene-miRNA associations (PMAs). The results of five-fold cross validation show that PMGAE achieves a mean AUC of 0.8634 and a mean AUPR of 0.8966. Case studies further substantiated the reliability of PMGAE for mining PMAs and for studying endogenous RNA networks in relation to diseases.
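The first two steps described above (computing three node similarities and fusing them into a similarity profile) can be illustrated with a minimal sketch. The abstract does not state how the similarities are fused or which features they are computed from, so the code below makes two assumptions: each node is described by a binary association vector, and fusion is plain concatenation of the three similarity rows. The function names and the toy data are hypothetical and are not taken from the PMGAE implementation.

```python
import math


def jaccard(a, b):
    """Jaccard similarity of two binary vectors: shared 1s over positions with any 1."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0


def cosine(a, b):
    """Cosine similarity: dot product divided by the product of the vector norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def pearson(a, b):
    """Pearson correlation, computed as the cosine of the mean-centered vectors."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return cosine([x - mean_a for x in a], [y - mean_b for y in b])


def similarity_profile(features, measures=(jaccard, cosine, pearson)):
    """Assumed fusion: for each node, concatenate its similarity to every
    node under each measure, giving len(measures) * n values per node."""
    return [
        [measure(fi, fj) for measure in measures for fj in features]
        for fi in features
    ]


# Toy data (made up): 3 pseudogenes described by associations with 4 miRNAs.
toy = [
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 1, 1],
]
profile = similarity_profile(toy)
for row in profile:
    print([round(value, 3) for value in row])
```

In this sketch each row of `profile` would serve as the initial feature vector of one node, which the GAE would then compress into a low-dimensional embedding for the XGBoost classifier. Other fusion choices, such as averaging the three similarity matrices, are equally compatible with the abstract's wording.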

Keywords: ceRNA network; extreme gradient boosting; feature fusion; graph auto-encoder; microRNA; pseudogene.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

FIGURE 1. Flowchart of PMGAE.

FIGURE 2. AUC (A) and AUPR (B) of PMGAE using five-fold cross validation. Insets show zoomed-in views of local regions.

FIGURE 3. Comparison of AUC (A) and AUPR (B) of PMGAE and MF-based models.

FIGURE 4. Clustering results of nodes before (A) and after (B) embedding.

FIGURE 5. Model performance using various embedding methods.

FIGURE 6. AUC (A) and AUPR (B) using various classifiers.

FIGURE 7. AUC and AUPR of various hidden unit setups in the first (A) and second (B) layers of GAE.
