Brief Bioinform. 2022 Jan 17;23(1):bbab374.

doi: 10.1093/bib/bbab374.

Assessing deep learning methods in cis-regulatory motif finding based on genomic sequencing data

Shuangquan Zhang¹, Anjun Ma², Jing Zhao², Dong Xu³, Qin Ma², Yan Wang¹,⁴

Affiliations

¹ Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, 130012, China.
² Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH, 43210, USA.
³ Department of Electrical Engineering and Computer Science, and Christopher S. Bond Life Science Center, University of Missouri, MO, 65211, USA.
⁴ School of Artificial Intelligence, Jilin University, Changchun, 130012, China.

PMID: 34607350
PMCID: PMC8769700
DOI: 10.1093/bib/bbab374

Shuangquan Zhang et al. Brief Bioinform. 2022.

Abstract

Identifying cis-regulatory motifs from genomic sequencing data (e.g. ChIP-seq and CLIP-seq) is crucial for locating transcription factor (TF) binding sites and inferring gene regulatory mechanisms in any organism. Since 2015, deep learning (DL) methods have been widely applied to identify TF binding sites and predict motif patterns, offering a scalable, flexible and unified computational approach for highly accurate predictions. As far as we know, 20 DL methods have been developed. However, without a clear and systematic assessment, users will struggle to choose the most appropriate tool for their specific studies. In this manuscript, we evaluated 20 DL methods for cis-regulatory motif prediction using 690 ENCODE ChIP-seq, 126 cancer ChIP-seq and 55 RNA CLIP-seq datasets. Four criteria were assessed: the accuracy of motif finding, the performance of DNA/RNA sequence classification, algorithm scalability and tool usability. The assessment showed that the existing DL methods are highly complementary, and that the most suitable model depends primarily on the data size and type and on the method's outputs.

Keywords: CLIP-seq; ChIP-seq; TF binding sites identification; deep learning method assessment; motif prediction.

Figures

**Figure 1**
ChIP-seq data input and five categories of DL methods. Outcomes include both predicted sequence labels and identified motif patterns.

**Figure 2**
Schematic overview of the evaluation pipeline. The AEMR score assesses sequence classification ability based on F1 score, recall, precision, PRC, AUC, MCC, specificity and ACC between predicted classification labels and ChIP-seq peak labels. The motif prediction score (with a P-value and a similarity) assesses how well the predicted motifs match the documented TFBSs.
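
As a rough illustration of the label-based metrics named in this caption, the sketch below computes precision, recall, specificity, ACC, F1 score and MCC from binary labels using their standard textbook definitions. It is not the authors' evaluation code: the function names and the toy labels are hypothetical, PRC and AUC are left out because they need continuous prediction scores rather than hard labels, and how the individual metrics are combined into the AEMR score is not described here, so no aggregation is attempted.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN for binary labels (1 = bound/peak, 0 = background)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def safe_div(numerator, denominator):
    """Return 0.0 instead of raising when a denominator is zero."""
    return numerator / denominator if denominator else 0.0


def classification_metrics(y_true, y_pred):
    """Standard binary classification metrics from hard labels."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    specificity = safe_div(tn, tn + fp)
    accuracy = safe_div(tp + tn, tp + fp + tn + fn)
    f1 = safe_div(2 * precision * recall, precision + recall)
    # Matthews correlation coefficient.
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = safe_div(tp * tn - fp * fn, mcc_denominator)
    return {
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "ACC": accuracy,
        "F1_score": f1,
        "MCC": mcc,
    }


# Hypothetical toy labels: peak labels versus a tool's predicted labels.
peak_labels = [1, 1, 1, 0, 0, 0, 1, 0]
predicted_labels = [1, 1, 0, 0, 0, 1, 1, 0]
metrics = classification_metrics(peak_labels, predicted_labels)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Keeping each metric separate, rather than collapsing them into one number, makes it easier to see where two tools differ, for example when one trades specificity for recall.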

**Figure 3**
Illustration of evaluation results for the 20 DL tools. (A) For DNA sequence-based analysis, tools are grouped by DL method. Within each comparative group, tools are ranked by their overall score (grey) from high to low. Four evaluation scores are shown: AEMR (blue), motif prediction score (green), algorithm scalability (pink) and tool usability (yellow). The highest value of each evaluation score is highlighted in a red box. Results for the conventional methods gkmSVM and MEME-ChIP are shown at the bottom for comparison. (B) For RNA sequence-based analysis, the same columns and labels are used as described in A.

**Figure 4**
Motif analysis on nine cancer types. (A) AEMR scores of the 15 DL methods across the nine cancer types. (B) Box plot of motif enrichment P-values (with details in the Method section) of 11 methods with respect to breast cancer. (C) For each cancer type, the average number of identified motifs is calculated for each tool. Note that only motifs that can be matched with existing motif patterns in the database using TOMTOM and TFBSTools are kept. The horizontal red line indicates the highest median value on the y-axis. (D) The shared motifs between the nine different cancer types. Motifs shared between breast cancer and colorectal cancer are highlighted in cyan, and all other shared links are light grey.

Cited by

Predicting miRNA-disease associations based on multi-view information fusion.
Xie X, Wang Y, Sheng N, Zhang S, Cao Y, Fu Y. Front Genet. 2022 Sep 27;13:979815. doi: 10.3389/fgene.2022.979815. PMID: 36238163.
Identifying transcription factors with cell-type specific DNA binding signatures.
Awdeh A, Turcotte M, Perkins TJ. BMC Genomics. 2024 Oct 14;25(1):957. doi: 10.1186/s12864-024-10859-1. PMID: 39402535.
Uncovering the Relationship between Tissue-Specific TF-DNA Binding and Chromatin Features through a Transformer-Based Model.
Zhang Y, Liu Y, Wang Z, Wang M, Xiong S, Huang G, Gong M. Genes (Basel). 2022 Oct 26;13(11):1952. doi: 10.3390/genes13111952. PMID: 36360189.
MMGAT: a graph attention network framework for ATAC-seq motifs finding.
Wu X, Hou W, Zhao Z, Huang L, Sheng N, Yang Q, Zhang S, Wang Y. BMC Bioinformatics. 2024 Apr 20;25(1):158. doi: 10.1186/s12859-024-05774-x. PMID: 38643066.
MMGraph: a multiple motif predictor based on graph neural network and coexisting probability for ATAC-seq data.
Zhang S, Yang L, Wu X, Sheng N, Fu Y, Ma A, Wang Y. Bioinformatics. 2022 Sep 30;38(19):4636-4638. doi: 10.1093/bioinformatics/btac572. PMID: 35997564.

Grants and funding

P30 CA016058/CA/NCI NIH HHS/United States
