BMC Bioinformatics. 2024 Apr 20;25(1):158.
doi: 10.1186/s12859-024-05774-x.

MMGAT: a graph attention network framework for ATAC-seq motifs finding

Xiaotian Wu et al. BMC Bioinformatics.

Abstract

Background: Motif finding in Assay for Transposase-Accessible Chromatin using sequencing (ATAC-seq) data is essential for revealing transcription factor binding sites (TFBSs) and their roles in gene regulation. Deep learning technologies, including convolutional neural networks (CNNs) and graph neural networks (GNNs), have achieved success in finding ATAC-seq motifs. However, CNN-based methods are limited by the fixed width of the convolutional kernel, which makes it difficult to find multiple transcription factor binding sites of different lengths. GNN-based methods are limited because they use the edge weight information directly, which makes it difficult to aggregate neighboring nodes' information efficiently when learning node embeddings.

Results: To address these challenges, we developed a novel graph attention network framework named MMGAT, which employs an attention mechanism to adjust the attention coefficients among different nodes. MMGAT then finds multiple ATAC-seq motifs based on the attention coefficients of sequence nodes and k-mer nodes as well as the coexisting probability of k-mers. Our approach achieved better performance on the human ATAC-seq datasets than existing tools, as evidenced by the highest scores on the precision, recall, F1_score, ACC, AUC, and PRC metrics, as well as by finding 389 higher-quality motifs. To validate the performance of MMGAT in predicting TFBSs and finding motifs on more datasets, we enlarged the number of human ATAC-seq datasets to 180 and newly integrated 80 mouse ATAC-seq datasets for multi-species experimental validation. On the mouse ATAC-seq datasets specifically, MMGAT also achieved the highest scores on all six metrics and found 356 higher-quality motifs. To help researchers use MMGAT, we have also developed a user-friendly web server named MMGAT-S that hosts the MMGAT method and ATAC-seq motif finding results.
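The attention coefficients mentioned above can be illustrated with a minimal sketch of a generic graph-attention (GAT-style) layer. This is not taken from the MMGAT code: the abstract does not give the exact scoring function, so the LeakyReLU score over concatenated projected embeddings, the softmax normalization, and all names (`attention_coefficients`, `aggregate`, `W`, `a`) are assumptions chosen for illustration.

```python
import math


def leaky_relu(x, slope=0.2):
    """LeakyReLU non-linearity, as commonly used in graph attention scoring."""
    return x if x > 0 else slope * x


def matvec(W, h):
    """Multiply matrix W (list of rows) by vector h."""
    return [sum(w * x for w, x in zip(row, h)) for row in W]


def attention_coefficients(h_target, h_neighbors, W, a, slope=0.2):
    """Softmax-normalized attention of one target node over its neighbors.

    Assumed (hypothetical) scoring: e_j = LeakyReLU(a . [W h_target || W h_j]).
    """
    z_target = matvec(W, h_target)
    scores = []
    for h_n in h_neighbors:
        concat = z_target + matvec(W, h_n)  # list concatenation
        scores.append(leaky_relu(sum(ai * ci for ai, ci in zip(a, concat)), slope))
    # Numerically stable softmax over the neighbors.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate(h_neighbors, W, alphas):
    """Attention-weighted sum of projected neighbor embeddings."""
    out = [0.0] * len(W)
    for alpha, h_n in zip(alphas, h_neighbors):
        for d, value in enumerate(matvec(W, h_n)):
            out[d] += alpha * value
    return out
```

In this sketch a sequence node would play the role of `h_target` and its k-mer nodes the role of `h_neighbors`; unlike a fixed edge weight, each coefficient depends on the embeddings of both endpoints and is learned through `W` and `a`.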

Conclusions: The advanced methodology MMGAT provides a robust tool for finding ATAC-seq motifs, and the comprehensive server MMGAT-S makes a significant contribution to genomics research. The open-source code of MMGAT can be found at https://github.com/xiaotianr/MMGAT, and MMGAT-S is freely available at https://www.mmgraphws.com/MMGAT-S/.

Keywords: ATAC-seq; Coexisting probabilities; Graph attention network; Motif finding; TFBSs prediction.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
MMGAT method. (A) The first layer of MMGAT initializes embeddings $h^{sim}_{k_\cdot}$ and $h^{co}_{k_\cdot}$ for k-mer nodes $k_\cdot$ in the similarity and coexisting subgraphs, respectively. It employs an attention mechanism to independently learn k-mer embeddings $E^{sim}_{k_\cdot}$ and $E^{co}_{k_\cdot}$ in both subgraphs. These two kinds of k-mer embeddings are then input to the second layer to learn inclusive-similarity and inclusive-coexisting attention coefficients in the inclusive subgraphs, respectively. Next, these two types of attention coefficients and two types of k-mer embeddings are aggregated into the embeddings of the sequence nodes. Finally, the sequence embeddings are input to the fully connected layer to predict TFBSs. (B) MMGAT finds k-mer seeds based on the inclusive-similarity and inclusive-coexisting attention coefficients learned in the second layer, and then finds TFBSs of multiple lengths based on coexisting probabilities.
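As a rough illustration of the coexisting probabilities in panel (B), the sketch below estimates how often two k-mers occur in the same sequence. The definition used here, the fraction of sequences containing k-mer a that also contain k-mer b, is an assumption made for illustration; the abstract does not state the exact formula MMGAT uses, and the function names are hypothetical.

```python
from itertools import permutations


def kmers(seq, k):
    """Return the set of distinct k-mers occurring in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def coexisting_probabilities(sequences, k):
    """Estimate P(b in sequence | a in sequence) for every co-occurring pair.

    Assumed definition (not confirmed by the source):
    (# sequences containing both a and b) / (# sequences containing a).
    """
    contain = {}  # k-mer -> number of sequences containing it
    both = {}     # (a, b) -> number of sequences containing both
    for seq in sequences:
        present = kmers(seq, k)
        for a in present:
            contain[a] = contain.get(a, 0) + 1
        for a, b in permutations(present, 2):
            both[(a, b)] = both.get((a, b), 0) + 1
    return {(a, b): n / contain[a] for (a, b), n in both.items()}
```

Under this assumed definition the probability is directional: a rare k-mer that always appears alongside a common one scores 1.0 in one direction and less in the other, which is one plausible way a k-mer seed could be extended toward its most reliable partners.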
Fig. 2
Visualization page of motif finding results on MMGAT-S
