Brief Bioinform. 2025 Jul 2;26(4):bbaf443. doi: 10.1093/bib/bbaf443.

BBANsh: a deep learning architecture based on BERT and bilinear attention networks to identify potent shRNA


Yuanting Chen et al. Brief Bioinform.

Abstract

RNA interference (RNAi) is a technique for precisely silencing the expression of specific genes by means of small RNA molecules and is essential in functional genomics. Among the commonly used RNAi molecules, short hairpin RNAs (shRNAs) exhibit advantages over small interfering RNAs, including a longer half-life, comparable silencing efficiency, fewer off-target effects, and greater safety. However, traditional screening of potent shRNAs is costly and time-consuming. Advances in big data and artificial intelligence have enabled computational methods to significantly accelerate shRNA design and prediction. In this study, we propose BBANsh, a new shRNA prediction model based on bidirectional encoder representations from transformers (BERT) and a bilinear attention network (BAN). We comprehensively evaluate the performance of BBANsh against traditional feature-based models, various feature fusion methods, and existing shRNA prediction models. BBANsh achieved an area under the precision-recall curve of 0.951 in five-fold cross-validation and a prediction accuracy of 0.896 on a new external validation set, highlighting its superior predictive performance. Ablation experiments validate the significant contributions of BERT and BAN to model performance. Visualization of internal feature representations intuitively demonstrates the effectiveness of the feature fusion strategy of BBANsh. Furthermore, attention analysis reveals that nucleotides near the 5' end have the greatest impact on model predictions, highlighting sequence characteristics of potent shRNAs. Overall, BBANsh provides an efficient and reliable tool for shRNA prediction, offering valuable support for researchers in the precise selection and design of shRNAs.

Keywords: BERT; bilinear attention network; shRNA prediction.


Figures

Figure 1
The workflow of BBANsh. (a) Collection of the shRNA dataset: data were collected from the TELE and M1 datasets, both of which were generated using sensor-based experimental systems. (b) Encoding of shRNAs with BERT: a pretrained BERT model was fine-tuned on the shRNA dataset, and the final [CLS] token embeddings were used to represent each shRNA sequence. (c) Feature fusion with the bilinear attention network: the embeddings derived from DNABERT and GENA-LM were fused using a bilinear attention network. (d) The complete framework of BBANsh: an overview of the BBANsh architecture, integrating the encoding, fusion, and prediction components.
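As a rough illustration of step (c), the sketch below shows one common way to fuse two [CLS] embeddings with a low-rank bilinear attention gate followed by an MLP head. It is not the authors' implementation; the 768-dimensional inputs, hidden size, and activation choices are assumptions based on the caption and typical BERT defaults.

```python
# A rough, hypothetical sketch (not the authors' code) of fusing two [CLS]
# embeddings with a low-rank bilinear attention gate and an MLP head.
import torch
import torch.nn as nn

class BilinearAttentionFusion(nn.Module):
    def __init__(self, dim_a=768, dim_b=768, hidden=256):
        super().__init__()
        self.proj_a = nn.Linear(dim_a, hidden)   # project the DNABERT embedding
        self.proj_b = nn.Linear(dim_b, hidden)   # project the GENA-LM embedding
        self.attn = nn.Linear(hidden, 1)         # scalar gate over the joint feature
        self.head = nn.Sequential(               # 128- and 64-unit MLP layers, as named in Figure 5
            nn.Linear(hidden, 128), nn.ReLU(),
            nn.Linear(128, 64), nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, emb_dnabert, emb_genalm):
        # Low-rank bilinear interaction: element-wise product of the two projections.
        joint = torch.tanh(self.proj_a(emb_dnabert)) * torch.tanh(self.proj_b(emb_genalm))
        gate = torch.sigmoid(self.attn(joint))   # attention weight on the joint feature
        return torch.sigmoid(self.head(gate * joint)).squeeze(-1)

model = BilinearAttentionFusion()
scores = model(torch.randn(8, 768), torch.randn(8, 768))  # 8 shRNAs -> predicted potency
```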
Figure 2
Performance of BBANsh and other models under five-fold cross-validation and test-set validation. (a) Comparison of BBANsh with traditional feature-based models, including kmer (k = 3), PseDNC, and PseKNC. (b) Comparison of BBANsh with models using different feature fusion strategies, including concatenation (Concat) and co-attention mechanisms.
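For context on the kmer baseline in panel (a), here is a minimal, hypothetical sketch of the 3-mer frequency representation (PseDNC and PseKNC add pseudo-composition terms not shown); the example sequence is arbitrary.

```python
# Hypothetical sketch of the 3-mer frequency feature (the "kmer, k = 3" baseline);
# PseDNC and PseKNC add pseudo-composition terms that are not shown here.
from itertools import product

def kmer_features(seq, k=3):
    kmers = ["".join(p) for p in product("ACGU", repeat=k)]   # 64 trinucleotides
    counts = {km: 0 for km in kmers}
    for i in range(len(seq) - k + 1):
        sub = seq[i:i + k]
        if sub in counts:                                      # skip ambiguous bases
            counts[sub] += 1
    total = max(len(seq) - k + 1, 1)
    return [counts[km] / total for km in kmers]                # normalized 64-dim vector

print(len(kmer_features("UGGAGAGUUUGAUCCUGGCUC")))             # arbitrary example -> 64
```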
Figure 3
The performance of BBANsh and ILGBMSH using leave-one-gene-out cross-validation across nine genes. (a) Receiver operating characteristic (ROC) curves for each model on all nine genes. (b) Precision–recall (PR) curves illustrating the precision versus recall performance for BBANsh and ILGBMSH.
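A minimal sketch of the leave-one-gene-out protocol behind Figure 3, using scikit-learn's LeaveOneGroupOut; the random features, labels, gene assignments, and the logistic-regression stand-in model are hypothetical placeholders rather than the actual BBANsh or ILGBMSH pipelines.

```python
# Hypothetical sketch of leave-one-gene-out cross-validation with scikit-learn;
# the random features, labels, gene assignments, and logistic-regression model
# are placeholders, not the BBANsh or ILGBMSH pipelines.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import LeaveOneGroupOut

rng = np.random.default_rng(0)
X = rng.normal(size=(180, 64))            # e.g. 64-dim sequence features
y = rng.integers(0, 2, size=180)          # potent (1) vs non-potent (0) shRNAs
genes = rng.integers(0, 9, size=180)      # nine target genes

for fold, (train, test) in enumerate(LeaveOneGroupOut().split(X, y, genes)):
    clf = LogisticRegression(max_iter=1000).fit(X[train], y[train])
    auc = roc_auc_score(y[test], clf.predict_proba(X[test])[:, 1])
    print(f"fold {fold}: AUROC = {auc:.3f}")   # one fold per held-out gene
```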
Figure 4
The ablation results of BBANsh variants on five-fold cross-validation and test set validation. Model variants are evaluated by removing key components: (i) noBAN: replacing the bilinear attention network with simple concatenation, (ii) noGENA-LM: removing the GENA-LM embedding, and (iii) noDNABERT: removing the DNABERT embedding.
Figure 5
t-SNE visualization of feature layers of the BBANsh model (including the joint features after BAN fusion, the 128-dimensional MLP layer features, and the 64-dimensional MLP layer features), of models constructed using traditional features (kmer, PseDNC, and PseKNC), of the DNABERT and GENA-LM embeddings, and of models employing other fusion methodologies (Concat and Co-attention).
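For readers who want to reproduce this style of plot on their own features, below is a minimal t-SNE sketch; the feature matrix, labels, and dimensionality are hypothetical stand-ins for any of the intermediate representations listed in the caption.

```python
# Hypothetical sketch of a t-SNE plot of an intermediate feature layer;
# `features` and `labels` stand in for any representation listed in the caption.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
features = rng.normal(size=(500, 128))    # one row per shRNA (e.g. 128-dim MLP layer)
labels = rng.integers(0, 2, size=500)     # potent vs non-potent

coords = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(features)
plt.scatter(coords[:, 0], coords[:, 1], c=labels, cmap="coolwarm", s=8)
plt.xlabel("t-SNE 1")
plt.ylabel("t-SNE 2")
plt.title("t-SNE of an intermediate feature layer")
plt.show()
```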
Figure 6
Visualization of the attention weights on sequence features in the BBANsh model, highlighting which nucleotide positions the model attends to most during prediction.

