MLSNet: a deep learning model for predicting transcription factor binding sites
- PMID: 39350338
- PMCID: PMC11442149
- DOI: 10.1093/bib/bbae489
Abstract
Accurate prediction of transcription factor binding sites (TFBSs) is essential for understanding gene regulation mechanisms and the etiology of diseases. Despite numerous advances in deep learning methods for predicting TFBSs, the performance of these methods still leaves room for improvement. In this study, we propose MLSNet, a novel deep learning architecture designed specifically to predict TFBSs. MLSNet integrates multisize convolutional fusion with long short-term memory (LSTM) networks to capture sparse higher-order sequence features of DNA. In addition, MLSNet incorporates super token attention and bidirectional LSTM (Bi-LSTM) to extract and integrate higher-order DNA shape features. Experimental results on 165 ChIP-seq (chromatin immunoprecipitation followed by sequencing) datasets indicate that MLSNet consistently outperforms several state-of-the-art algorithms in the prediction of TFBSs. Specifically, MLSNet achieves average scores of 0.8306 for accuracy (ACC), 0.8992 for area under the receiver operating characteristic curve (AUROC), and 0.9035 for area under the precision-recall curve (AUPRC), surpassing the second-best methods by 1.82%, 1.68%, and 1.54%, respectively. These results demonstrate the effectiveness of combining multisize convolutional layers with LSTM and DNA shape-based features in enhancing predictive accuracy. Moreover, this study comprehensively assesses the variability in model performance across different cell lines and transcription factors. The source code of MLSNet is available at https://github.com/minghaidea/MLSNet.
Keywords: DNA sequence; DNA shape; multisize convolutional fusion; super token attention and Bi-LSTM; transcription factor binding sites.
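The multisize convolution idea named in the abstract can be illustrated with a minimal, dependency-free sketch. This is not the MLSNet implementation (see the linked repository for that). The function names, the kernel sizes (3, 5, 7), the random weights standing in for learned filters, and fusion by simple concatenation are all assumptions made for illustration, and the LSTM, super token attention, and DNA shape branches are omitted.

```python
import random

BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as rows of 4 values (A, C, G, T); other symbols such as N become all zeros."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]

def conv1d_max(encoded, kernel):
    """Slide one k x 4 kernel along the sequence, apply ReLU, then global max pooling."""
    k = len(kernel)
    best = 0.0  # ReLU floor: negative activations never exceed this
    for start in range(len(encoded) - k + 1):
        score = sum(
            encoded[start + i][j] * kernel[i][j]
            for i in range(k)
            for j in range(4)
        )
        best = max(best, score)
    return best

def multisize_features(seq, kernel_sizes=(3, 5, 7), filters_per_size=2, seed=0):
    """Run filters of several widths over one sequence and concatenate their pooled outputs.

    Hypothetical helper: kernel sizes, filter count, and random weights are
    illustrative assumptions, not values taken from the paper.
    """
    rng = random.Random(seed)
    encoded = one_hot(seq)
    features = []
    for k in kernel_sizes:
        for _ in range(filters_per_size):
            kernel = [[rng.uniform(-1.0, 1.0) for _ in range(4)] for _ in range(k)]
            features.append(conv1d_max(encoded, kernel))
    return features  # fused vector: one pooled value per filter, across all widths

features = multisize_features("ACGTGACCTGATCGATTGCA")
print(len(features))  # 3 kernel sizes x 2 filters each = 6
```

Using several kernel widths lets short and long motif-like patterns contribute to the same feature vector; in a trained model the kernel weights would be learned and the fused features passed on to later layers.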
© The Author(s) 2024. Published by Oxford University Press.
Grants and funding
- 62372234/National Natural Science Foundation of China
- BK20201304/Natural Science Foundation of Jiangsu
- NY223062/Major Inter-Disciplinary Research project awarded by Monash University, and the Natural Science Research Start-up Foundation of Recruiting Talents of Nanjing University of Posts and Telecommunications