BMC Bioinformatics. 2024 Dec 18;25(1):381.
doi: 10.1186/s12859-024-05985-2.

DeepMiRBP: a hybrid model for predicting microRNA-protein interactions based on transfer learning and cosine similarity

Sasan Azizian et al. BMC Bioinformatics.

Abstract

Background: Interactions between microRNAs and RNA-binding proteins are crucial for microRNA-mediated gene regulation and sorting. Despite their significance, the molecular mechanisms governing these interactions remain underexplored, apart from sequence motifs identified on microRNAs. To date, only a limited number of microRNA-binding proteins have been confirmed, typically through labor-intensive experimental procedures. Advanced bioinformatics tools are urgently needed to facilitate this research.

Methods: We present DeepMiRBP, a novel hybrid deep learning model specifically designed to predict microRNA-binding proteins by modeling molecular interactions. This innovative approach is the first to target the direct interactions between small RNAs and proteins. DeepMiRBP consists of two main components. The first component employs bidirectional long short-term memory (Bi-LSTM) neural networks to capture sequential dependencies and context within RNA sequences, attention mechanisms to enhance the model's focus on the most relevant features, and transfer learning to apply knowledge gained from a large dataset of RNA-protein binding sites to the specific task of predicting microRNA-protein interactions. Cosine similarity is applied to assess RNA similarities. The second component utilizes Convolutional Neural Networks (CNNs) to process the spatial data inherent in protein structures, based on Position-Specific Scoring Matrices (PSSM) and contact maps, to generate detailed and accurate representations of potential microRNA-binding sites and to assess protein similarities.
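The cosine-similarity ranking step described in the Methods can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `rank_by_cosine`, the three-dimensional toy vectors, and the labels `RBP_A`, `RBP_B`, and `RBP_C` are hypothetical, and in DeepMiRBP the embeddings would come from the trained Bi-LSTM or CNN encoders rather than being written by hand.

```python
import numpy as np


def rank_by_cosine(query, candidates, names):
    """Rank candidate embeddings by cosine similarity to a query embedding.

    query:      1-D array of shape (d,)
    candidates: 2-D array of shape (n, d), one embedding per row
    names:      list of n labels, one per candidate row
    """
    # Normalise to unit length so the dot product equals the cosine.
    q = query / np.linalg.norm(query)
    c = candidates / np.linalg.norm(candidates, axis=1, keepdims=True)
    scores = c @ q
    # Sort from most similar to least similar.
    order = np.argsort(-scores)
    return [(names[i], float(scores[i])) for i in order]


# Hypothetical toy embeddings, for illustration only.
mirna_embedding = np.array([1.0, 0.0, 1.0])
rbp_embeddings = np.array([
    [1.0, 0.0, 1.0],  # same direction as the query
    [0.0, 1.0, 0.0],  # orthogonal to the query
    [1.0, 1.0, 1.0],  # partially aligned with the query
])
rbp_names = ["RBP_A", "RBP_B", "RBP_C"]

ranking = rank_by_cosine(mirna_embedding, rbp_embeddings, rbp_names)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Because cosine similarity depends only on the angle between vectors, candidates are ranked by the direction of their embeddings and not by their magnitude.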

Results: DeepMiRBP achieved a prediction accuracy of 87.4% during training and 85.4% during testing, with an F score of 0.860. Additionally, we validated our method using three case studies, focusing on microRNAs such as miR-451, -19b, -23a, -21, -223, and let-7d. DeepMiRBP successfully predicted known miRNA interactions with recently discovered RNA-binding proteins, including AGO, YBX1, and FXR2, identified in various exosomes.
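For readers unfamiliar with the reported metrics, the sketch below shows how accuracy and an F score are commonly derived from binary confusion-matrix counts. It assumes the F score is the balanced F1 measure, which the abstract does not state, and the counts passed to the hypothetical `binary_metrics` helper are made up for illustration; they are not the study's data.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts, NOT taken from the paper.
metrics = binary_metrics(tp=80, fp=20, fn=10, tn=90)
for key, value in metrics.items():
    print(f"{key}: {value:.3f}")
```

F1 is the harmonic mean of precision and recall, so it stays low unless both are reasonably high, which makes it a useful complement to accuracy on imbalanced binding-site data.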

Conclusions: Our proposed DeepMiRBP strategy represents the first of its kind designed for microRNA-protein interaction prediction. Its promising performance underscores the model's potential to uncover novel interactions critical for small RNA sorting and packaging, as well as to infer new RNA transporter proteins. The methodologies and insights from DeepMiRBP offer a scalable template for future small RNA research, from mechanistic discovery to modeling disease-related cell-to-cell communication, emphasizing its adaptability and potential for developing novel small RNA-centric therapeutic interventions and personalized medicine.

Keywords: Deep learning; Interaction prediction; MicroRNAs; RNA binding proteins; RNA sorting.

Conflict of interest statement

Declarations. Ethics approval and consent to participate: Not applicable. Consent for publication: Not applicable. Competing interests: The authors declare that they have no competing interests.

Figures

Fig. 1
The schematic workflow of the DeepMiRBP model. In the first component, the source-domain model is trained on RNA sequences related to known binding sites of different RNA-binding proteins (RBPs). After this training phase, the learned parameters are transferred to the target domain using a transfer learning approach. The target model is then retrained using sequences of protein-interacting miRNAs as input. A cosine similarity measure is applied to identify and rank RBP sequences from the source domain that are most similar to the given miRNA, resulting in a ranked list of candidate proteins. The candidate proteins identified in the first component undergo further analysis in the second component. Position-Specific Scoring Matrices (PSSM) and contact maps are utilized for each candidate protein to perform a more comprehensive similarity assessment. This step enhances the understanding of miRNA-protein interactions, thereby improving the model’s prediction accuracy.
Fig. 2
The detailed architecture of DeepMiRBP in both components for predicting microRNA-protein interactions. (a) First component architecture: This model is trained on RNA sequences that bind to RBPs to capture intricate features of RNA-protein interactions. Initially, the model learns from RNA sequences bound by RBPs and transfers this knowledge to the target domain. Here, miRNA sequences serve as input, generating embedding codes. Cosine similarity is then applied to identify RNA sequences most similar to the miRNA sequences. (b) Second component architecture: In this model, each RBP candidate identified in the first part is processed using PSSM and contact maps. CNN layers and max-pooling are employed to encode these matrices. Subsequently, cosine similarity is calculated to compare RBP candidates with other proteins, resulting in a matrix that identifies proteins with a higher probability of binding to the miRNA sequence.
Fig. 3
Confusion matrix for test data in the source domain
Fig. 4
ROC Performance. The ROC curve for predicting RNA-protein binding sites on 31 experiment datasets
