MMGAT: a graph attention network framework for ATAC-seq motifs finding
- PMID: 38643066
- PMCID: PMC11031952
- DOI: 10.1186/s12859-024-05774-x
Abstract
Background: Motif finding in Assay for Transposase-Accessible Chromatin using sequencing (ATAC-seq) data is essential for revealing the intricacies of transcription factor binding sites (TFBSs) and their pivotal roles in gene regulation. Deep learning technologies, including convolutional neural networks (CNNs) and graph neural networks (GNNs), have achieved success in finding ATAC-seq motifs. However, CNN-based methods are limited by the fixed width of the convolutional kernel, which makes it difficult to find multiple TFBSs of different lengths. GNN-based methods are limited by using edge weight information directly, which makes it difficult to aggregate information from neighboring nodes efficiently when computing node embeddings.
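To make the GNN limitation concrete, here is a minimal sketch of aggregation that uses fixed edge weights directly. It assumes a bipartite graph of sequence nodes and k-mer nodes. The edge weight is a hypothetical choice: the relative frequency of a k-mer in a sequence. The helper names `kmers`, `build_edges` and `fixed_weight_aggregate` are illustrative and are not taken from the MMGAT code. The weights come straight from the graph and are never learned, so every neighbor contributes in proportion to its count, whatever its relevance.

```python
from collections import Counter


def kmers(seq: str, k: int) -> list[str]:
    """Return all overlapping k-mers of a DNA sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_edges(seqs: list[str], k: int) -> dict[tuple[int, str], float]:
    """Hypothetical sequence-to-k-mer edges weighted by relative k-mer frequency."""
    edges: dict[tuple[int, str], float] = {}
    for s_idx, seq in enumerate(seqs):
        counts = Counter(kmers(seq, k))
        total = sum(counts.values())
        for kmer, c in counts.items():
            edges[(s_idx, kmer)] = c / total
    return edges


def fixed_weight_aggregate(s_idx: int,
                           edges: dict[tuple[int, str], float],
                           emb: dict[str, list[float]]) -> list[float]:
    """Embed a sequence node as the edge-weighted sum of its k-mer neighbors.

    The weights are read directly from the graph; nothing is learned here.
    """
    dim = len(next(iter(emb.values())))
    out = [0.0] * dim
    for (i, kmer), w in edges.items():
        if i == s_idx:
            out = [o + w * x for o, x in zip(out, emb[kmer])]
    return out
```

For example, with `seqs = ["ACGTA", "CGTAC"]` and `k = 3`, sequence 0 has the k-mers ACG, CGT and GTA, each with weight 1/3. Its embedding is therefore the plain average of those three k-mer vectors.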
Results: To address these challenges, we developed a novel graph attention network framework named MMGAT, which employs an attention mechanism to adjust the attention coefficients among different nodes. MMGAT then finds multiple ATAC-seq motifs based on the attention coefficients of sequence nodes and k-mer nodes, as well as the coexisting probability of k-mers. Our approach achieved better performance on the human ATAC-seq datasets than existing tools, as evidenced by the highest scores on the precision, recall, F1_score, ACC, AUC, and PRC metrics, and it found 389 higher-quality motifs. To validate the performance of MMGAT in predicting TFBSs and finding motifs on more datasets, we expanded the number of human ATAC-seq datasets to 180 and newly integrated 80 mouse ATAC-seq datasets for multi-species experimental validation. On the mouse ATAC-seq datasets, MMGAT also achieved the highest scores on all six metrics and found 356 higher-quality motifs. To help researchers use MMGAT, we have also developed a user-friendly web server named MMGAT-S that hosts the MMGAT method and ATAC-seq motif finding results.
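For contrast with fixed edge weights, here is a sketch of how attention coefficients between nodes can be learned. It follows the standard single-head graph attention (GAT) formulation of Veličković et al.: e_ij = LeakyReLU(a^T [W h_i || W h_j]), with alpha_ij = softmax over the neighbors j of e_ij. It is not claimed to be MMGAT's exact layer. The names `gat_attention` and `gat_update`, the 0.2 LeakyReLU slope, and the omission of an output nonlinearity are illustrative assumptions. Callers may include node i in its own neighbor list to add a self-loop.

```python
import math


def leaky_relu(x: float, slope: float = 0.2) -> float:
    return x if x > 0 else slope * x


def matvec(W: list[list[float]], h: list[float]) -> list[float]:
    """Multiply matrix W (rows = output dims) by vector h."""
    return [sum(w * x for w, x in zip(row, h)) for row in W]


def gat_attention(i, neighbors, H, W, a) -> dict:
    """Single-head GAT attention coefficients alpha_ij of node i over its neighbors."""
    Wh = {n: matvec(W, H[n]) for n in [i, *neighbors]}
    # e_ij = LeakyReLU(a . [W h_i || W h_j]); list '+' concatenates the two vectors
    scores = [leaky_relu(sum(ak * x for ak, x in zip(a, Wh[i] + Wh[j])))
              for j in neighbors]
    m = max(scores)                      # subtract the max for a numerically stable softmax
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return {j: e / z for j, e in zip(neighbors, exps)}


def gat_update(i, neighbors, H, W, a):
    """New embedding of node i: attention-weighted sum of transformed neighbors."""
    alpha = gat_attention(i, neighbors, H, W, a)
    out = [0.0] * len(W)
    for j in neighbors:
        out = [o + alpha[j] * x for o, x in zip(out, matvec(W, H[j]))]
    return alpha, out
```

Usage sketch: a sequence node `"s0"` has two k-mer neighbors, `W` is the 2×2 identity, and `a = [0, 0, 1, 1]`. The score for each neighbor is then the sum of that neighbor's features. The neighbor with the larger sum therefore receives the larger coefficient, and the coefficients sum to 1. The relative importance of neighbors is learned through `W` and `a` instead of being fixed by the graph's edge weights.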
Conclusions: The advanced methodology MMGAT provides a robust tool for finding ATAC-seq motifs, and the comprehensive server MMGAT-S makes a significant contribution to genomics research. The open-source code of MMGAT can be found at https://github.com/xiaotianr/MMGAT , and MMGAT-S is freely available at https://www.mmgraphws.com/MMGAT-S/ .
Keywords: ATAC-seq; Coexisting probabilities; Graph attention network; Motif finding; TFBSs prediction.
© 2024. The Author(s).
Conflict of interest statement
The authors declare that they have no competing interests.
Grants and funding
- 62302218/the Young Scientists Fund of the National Natural Science Foundation of China
- 62072212/the National Natural Science Foundation of China
- 20220508125RC, 20230201065GX/the Development Project of Jilin Province of China
- 20210504003GH/the Jilin Provincial Key Laboratory of Big Data Intelligent Cognition