Int J Mol Sci. 2024 Dec 5;25(23):13075.
doi: 10.3390/ijms252313075.

AEmiGAP: AutoEncoder-Based miRNA-Gene Association Prediction Using Deep Learning Method

Seungwon Yoon et al. Int J Mol Sci. 2024.

Abstract

MicroRNAs (miRNAs) play a crucial role in gene regulation and are strongly linked to various diseases, including cancer. This study presents AEmiGAP, a deep learning model that integrates autoencoders with long short-term memory (LSTM) networks to predict miRNA–gene associations. By using autoencoders for feature extraction, AEmiGAP captures complex latent relationships between miRNAs and genes and outperforms existing models in miRNA–gene association prediction. A curated dataset of positive and negative miRNA–gene pairs was generated by filtering candidate negatives with distance-based methods (Euclidean, cosine, and Mahalanobis distances), which substantially improved the model's AUC and overall predictive accuracy. Additionally, this study presents two case studies to illustrate AEmiGAP's application: first, a list of the top 30 previously unknown miRNA–gene pairs with the highest predicted association scores, and second, a list of the top 10 miRNAs most strongly associated with each of five key oncogenes. These findings establish AEmiGAP as a new benchmark in miRNA–gene association prediction, with considerable potential to advance both cancer research and precision medicine.

Keywords: LSTM; autoencoders; bioinformatics; cancer genomics; deep learning; feature extraction; miRNA–gene association; precision medicine.
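The abstract describes AEmiGAP as autoencoder-derived features feeding an LSTM network that outputs an association prediction. Layer sizes and the exact input layout are not given in this record, so what follows is only a minimal NumPy sketch of one LSTM layer followed by a sigmoid output unit. It assumes, hypothetically, that each pair's latent vector is reshaped into a short sequence of time steps; the class name `TinyLSTMClassifier` and all dimensions are illustrative, and the weights are random rather than trained.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class TinyLSTMClassifier:
    """Hypothetical single-layer LSTM + logistic output; weights are random, not trained."""

    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = np.random.default_rng(seed)
        # One stacked weight matrix for the input, forget, candidate and output gates.
        self.W = rng.normal(0.0, 0.1, (4 * hidden_dim, input_dim + hidden_dim))
        self.b = np.zeros(4 * hidden_dim)
        self.w_out = rng.normal(0.0, 0.1, hidden_dim)
        self.b_out = 0.0
        self.h_dim = hidden_dim

    def score(self, seq):
        """seq has shape (timesteps, input_dim); returns a probability in (0, 1)."""
        h = np.zeros(self.h_dim)  # hidden state
        c = np.zeros(self.h_dim)  # cell state
        for x_t in seq:
            z = self.W @ np.concatenate([x_t, h]) + self.b
            i, f, g, o = np.split(z, 4)
            i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
            g = np.tanh(g)
            c = f * c + i * g      # forget part of the old cell state, add new content
            h = o * np.tanh(c)     # expose a gated view of the cell state
        # Map the final hidden state to an association probability.
        return float(sigmoid(self.w_out @ h + self.b_out))


rng = np.random.default_rng(1)
latent = rng.normal(size=32)   # stand-in for one pair's latent vector
seq = latent.reshape(8, 4)     # hypothetical layout: 8 steps of 4 features
model = TinyLSTMClassifier(input_dim=4, hidden_dim=16)
print(model.score(seq))        # some probability between 0 and 1
```

In practice such weights would be learned by minimizing binary cross-entropy on labeled positive and negative pairs, usually with a deep learning framework rather than hand-written NumPy.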

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1
The overall workflow of the AEmiGAP model for miRNA–gene association prediction. (A) Protein2Vector embeddings of miRNA and gene sequence data from miRBase and BioMart. (B) Negative data filtering based on Euclidean, cosine, and Mahalanobis distances. (C) Autoencoder to extract latent vector features from the miRNA–gene pairs. (D) LSTM-based deep learning model for final miRNA–gene association prediction.
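Panel (B) names three distance measures used to filter negative data, but the filtering rule itself is not described here. The sketch below assumes one plausible rule: each miRNA–gene pair is represented as a single embedding vector, and a candidate negative is kept only if its distance to the nearest known positive pair exceeds a threshold. The function names, the nearest-positive rule, and the threshold are hypothetical. The Mahalanobis distance uses the inverse covariance of the positive pairs, computed with `numpy.linalg.pinv` so it stays defined when the covariance matrix is singular.

```python
import numpy as np


def euclidean(x, y, vi=None):
    # Straight-line distance; `vi` is unused and only keeps the signatures uniform.
    return float(np.linalg.norm(x - y))


def cosine(x, y, vi=None):
    # 1 - cosine similarity: 0 for vectors pointing the same way, up to 2 for opposite ones.
    return float(1.0 - (x @ y) / (np.linalg.norm(x) * np.linalg.norm(y)))


def mahalanobis(x, y, vi):
    # Distance rescaled by `vi`, the inverse covariance of the reference (positive) set.
    d = x - y
    # Clamp tiny negative values caused by floating-point round-off.
    return float(np.sqrt(max(d @ vi @ d, 0.0)))


METRICS = {"euclidean": euclidean, "cosine": cosine, "mahalanobis": mahalanobis}


def filter_negatives(positives, candidates, metric="euclidean", threshold=1.0):
    """Return candidates whose nearest positive is farther than `threshold`.

    Hypothetical selection rule, for illustration only.
    """
    dist = METRICS[metric]
    vi = None
    if metric == "mahalanobis":
        vi = np.linalg.pinv(np.cov(positives, rowvar=False))
    kept = [c for c in candidates
            if min(dist(c, p, vi) for p in positives) > threshold]
    # reshape keeps a 2-D result even when no candidate survives
    return np.array(kept).reshape(-1, candidates.shape[1])


positives = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
candidates = np.array([[0.1, 0.1], [5.0, 5.0]])
print(filter_negatives(positives, candidates))  # [[5. 5.]]
```

Candidates that sit close to known positives are discarded because they are more likely to be unlabeled true associations than genuine negatives; each metric changes what "close" means.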
Figure 2
The structure of the autoencoder model used in our study. The encoder compresses input data into a latent vector, and the decoder reconstructs the original data from this compressed form. This model is designed to capture complex features from miRNA and gene sequences, improving the deep learning model’s performance.
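The caption states the encoder/decoder idea without layer details, so the following is a generic sketch rather than AEmiGAP's architecture: a one-hidden-layer autoencoder in NumPy with a tanh encoder and a linear decoder, trained by full-batch gradient descent on the mean squared reconstruction error. The class name `TinyAutoencoder`, the latent size, learning rate, and epoch count are all hypothetical choices.

```python
import numpy as np


class TinyAutoencoder:
    """Encoder: tanh(X @ W1 + b1) -> latent. Decoder: latent @ W2 + b2 -> reconstruction."""

    def __init__(self, n_in, n_latent, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.1, (n_in, n_latent))
        self.b1 = np.zeros(n_latent)
        self.W2 = rng.normal(0.0, 0.1, (n_latent, n_in))
        self.b2 = np.zeros(n_in)

    def encode(self, X):
        return np.tanh(X @ self.W1 + self.b1)

    def decode(self, Z):
        return Z @ self.W2 + self.b2

    def loss(self, X):
        # Mean squared reconstruction error over all entries.
        return float(np.mean((self.decode(self.encode(X)) - X) ** 2))

    def fit(self, X, lr=0.1, epochs=500):
        for _ in range(epochs):
            Z = self.encode(X)
            X_hat = self.decode(Z)
            # Gradient of the mean squared error with respect to the reconstruction.
            d_out = 2.0 * (X_hat - X) / X.size
            dW2 = Z.T @ d_out
            db2 = d_out.sum(axis=0)
            # Backpropagate through the decoder and the tanh nonlinearity.
            d_pre = (d_out @ self.W2.T) * (1.0 - Z ** 2)
            dW1 = X.T @ d_pre
            db1 = d_pre.sum(axis=0)
            self.W1 -= lr * dW1
            self.b1 -= lr * db1
            self.W2 -= lr * dW2
            self.b2 -= lr * db2
        return self


rng = np.random.default_rng(42)
X = rng.normal(size=(20, 8)) + 1.0  # stand-in for 20 pair feature vectors
ae = TinyAutoencoder(n_in=8, n_latent=3)
before = ae.loss(X)
ae.fit(X)
latent = ae.encode(X)               # shape (20, 3): the compressed features
print(before, ae.loss(X), latent.shape)
```

After training, only the encoder is needed: `encode(X)` yields the compact latent vectors that a downstream classifier would consume, as in panel (C) of Figure 1.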
Figure 3
Receiver Operating Characteristic (ROC) curve and confusion matrix for Fold-4, the best-performing fold in the 5-fold cross-validation of the AEmiGAP model. The ROC curve demonstrates a high Area Under the Curve (AUC) value of 0.9857, indicating excellent model performance in distinguishing between positive and negative miRNA–gene associations. The confusion matrix shows the distribution of true positives, false positives, true negatives, and false negatives. Fold-4 achieves the best balance between precision and recall, resulting in the highest overall accuracy and F1 score among the five folds.
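To show how the quantities in Figure 3 are defined (not to reproduce the paper's results), the pure-Python sketch below computes a confusion matrix, precision, recall, accuracy, F1, and AUC from true labels and predicted scores. The 0.5 decision threshold is an assumption. The AUC uses the rank-based definition: the probability that a randomly chosen positive pair scores higher than a randomly chosen negative pair, with ties counted as one half. This pairwise loop is O(P·N); `sklearn.metrics.roc_auc_score` computes the same quantity more efficiently.

```python
def confusion_counts(y_true, y_score, threshold=0.5):
    """Return (tp, fp, tn, fn) after thresholding the scores."""
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, y_score):
        pred = 1 if s >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def summary_metrics(tp, fp, tn, fn):
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "accuracy": accuracy, "f1": f1}


def roc_auc(y_true, y_score):
    """P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


y_true = [1, 1, 1, 0, 0, 0]
y_score = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]
counts = confusion_counts(y_true, y_score)  # (tp, fp, tn, fn) = (2, 1, 2, 1)
print(counts, summary_metrics(*counts), roc_auc(y_true, y_score))
```

AUC is threshold-free, while the confusion matrix and the metrics derived from it depend on the chosen cutoff; this is why a fold can have a high AUC yet a different balance of precision and recall.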

