BMC Genomics. 2014 Oct 23;15(1):925.

doi: 10.1186/1471-2164-15-925.

SeAMotE: a method for high-throughput motif discovery in nucleic acid sequences

Federico Agostini, Davide Cirillo, Riccardo Delli Ponti, Gian Gaetano Tartaglia¹

PMID: 25341390
PMCID: PMC4223730
DOI: 10.1186/1471-2164-15-925

Federico Agostini et al. BMC Genomics. 2014.

Affiliation

¹ Gene Function and Evolution, Centre for Genomic Regulation (CRG), C/ Dr. Aiguader 88, 08003 Barcelona, Spain. gian.tartaglia@crg.es.

Abstract

Background: The large amount of data produced by high-throughput sequencing poses new computational challenges. In the last decade, several tools have been developed for the identification of transcription and splicing factor binding sites.

Results: Here, we introduce the SeAMotE (Sequence Analysis of Motifs Enrichment) algorithm for the discovery of regulatory regions in nucleic acid sequences. SeAMotE provides (i) a robust analysis of high-throughput sequence sets, (ii) a motif search based on pattern occurrences and (iii) an easy-to-use web-server interface. We applied our method to recently published data, including 351 chromatin immunoprecipitation (ChIP) and 13 crosslinking immunoprecipitation (CLIP) experiments, and compared our results with those of other well-established motif discovery tools. SeAMotE shows an average accuracy of 80% in finding discriminative motifs and outperforms other methods available in the literature.

Conclusions: SeAMotE is a fast, accurate and flexible algorithm for the identification of sequence patterns involved in protein-DNA and protein-RNA recognition. The server can be freely accessed at http://s.tartaglialab.com/new_submission/seamote.

Figures

**Figure 1**
**SeAMotE workflow.** Illustration of the method pipeline: red boxes indicate the coverage calculation and seed extension loop; dashed arrows and the blue box represent conditional steps that depend on the user-definable variables, such as providing or selecting a specific background set or filtering out patterns that are closely related.

**Figure 2**
**SeAMotE output summary.** Example of an output table showing the list of motifs (IUPAC and RegEx) that best discriminate the input sets, along with their logo representation and position weight matrix download button, positive and reference coverage (the percentage of sequences containing at least one pattern occurrence), discrimination (Youden’s index) and p-value (Fisher’s exact test). Clicking on a logo retrieves the image file (png format) of the associated motif.
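
The statistics named in this caption can be illustrated with a short Python sketch. It is not SeAMotE's implementation: the function names (`iupac_to_regex`, `count_hits`, `fisher_one_sided`, `score_motif`) are hypothetical, the example sequences are invented, and using a one-sided (enrichment) Fisher's exact test is an assumption. Youden's index is taken here in its standard form, the positive coverage minus the reference coverage, both expressed as fractions. Only the given strand is searched.

```python
import re
from math import comb

# IUPAC nucleotide codes mapped to regular-expression character classes.
# U is treated as T so that RNA and DNA input can be handled alike.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def iupac_to_regex(motif):
    """Translate an IUPAC motif string into an equivalent RegEx string."""
    return "".join(IUPAC[code] for code in motif.upper())


def count_hits(motif, sequences):
    """Count sequences containing at least one occurrence of the motif."""
    pattern = re.compile(iupac_to_regex(motif))
    return sum(
        1 for seq in sequences
        if pattern.search(seq.upper().replace("U", "T"))
    )


def fisher_one_sided(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns the hypergeometric probability of observing a or more hits in
    the first row, with all row and column totals held fixed.
    """
    n = a + b + c + d
    row1 = a + b          # size of the positive set
    col1 = a + c          # total number of sequences with a hit
    upper = min(row1, col1)
    tail = sum(
        comb(row1, k) * comb(n - row1, col1 - k)
        for k in range(a, upper + 1)
    )
    return tail / comb(n, col1)


def score_motif(motif, positive, reference):
    """Compute coverage (%), Youden's index and p-value for one motif."""
    pos_hits = count_hits(motif, positive)
    ref_hits = count_hits(motif, reference)
    pos_cov = pos_hits / len(positive)
    ref_cov = ref_hits / len(reference)
    return {
        "regex": iupac_to_regex(motif),
        "positive_coverage": 100 * pos_cov,
        "reference_coverage": 100 * ref_cov,
        "discrimination": pos_cov - ref_cov,   # Youden's index
        "p_value": fisher_one_sided(
            pos_hits, len(positive) - pos_hits,
            ref_hits, len(reference) - ref_hits,
        ),
    }


# Invented toy data, for illustration only.
positive = ["ACGTGA", "TTGCGTGC", "AAAA", "GCATGT"]
reference = ["AAAA", "CCCC", "TTTT", "GGGG"]
result = score_motif("CRTG", positive, reference)
```

In this toy case the motif `CRTG` expands to `C[AG]TG` and matches three of the four positive sequences and none of the reference sequences. A real analysis would use far larger sets and would need a multiple-testing correction across candidate motifs, which this sketch omits.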

**Figure 3**
**Annotated motifs performance comparison.** Using 351 ChIP-seq datasets from ENCODE [19], we compared the performance of CMF [12], DECOD [13], DREME [11], XXmotif [14] and SeAMotE. A) E-values and B) q-values associated with the 5 top-ranked motifs for CMF, DECOD, DREME, SeAMotE and XXmotif. C) The proportion of transcription factors for which annotated motifs were successfully identified, plotted against the number of top-ranked motifs employed for the TOMTOM search [31].

**Figure 4**
**RNA-binding protein motifs performance comparison.** Using 13 CLIP-seq experiments available in the public domain [18], we compared the performance of DECOD [13], DREME [11], XXmotif [14] and SeAMotE. The ability to identify sequence elements that maximize the separation between positive and reference sets is reported for each identified motif using A) discrimination (Youden’s index) and B) significance (Fisher’s exact test). CMF [12] was excluded from the analysis because it does not allow motif discovery on a specific nucleic acid strand.

Cited by

Mechanisms and consequences of subcellular RNA localization across diverse cell types.
Engel KL, Arora A, Goering R, Lo HG, Taliaferro JM. Traffic. 2020 Jun;21(6):404-418. doi: 10.1111/tra.12730. Epub 2020 Apr 29. PMID: 32291836. Review.
Combining phylogenetic footprinting with motif models incorporating intra-motif dependencies.
Nettling M, Treutler H, Cerquides J, Grosse I. BMC Bioinformatics. 2017 Mar 1;18(1):141. doi: 10.1186/s12859-017-1495-1. PMID: 28249564.
WSMD: weakly-supervised motif discovery in transcription factor ChIP-seq data.
Zhang H, Zhu L, Huang DS. Sci Rep. 2017 Jun 12;7(1):3217. doi: 10.1038/s41598-017-03554-7. PMID: 28607381.
By the company they keep: interaction networks define the binding ability of transcription factors.
Cirillo D, Botta-Orfila T, Tartaglia GG. Nucleic Acids Res. 2015 Oct 30;43(19):e125. doi: 10.1093/nar/gkv607. Epub 2015 Jun 18. PMID: 26089389.
Zooming in on protein-RNA interactions: a multi-level workflow to identify interaction partners.
Colantoni A, Rupert J, Vandelli A, Tartaglia GG, Zacco E. Biochem Soc Trans. 2020 Aug 28;48(4):1529-1543. doi: 10.1042/BST20191059. PMID: 32820806. Review.

References

1. Coulon A, Chow CC, Singer RH, Larson DR. Eukaryotic transcriptional dynamics: from single molecules to cell populations. Nat Rev Genet. 2013;14(8):572–584. doi: 10.1038/nrg3484.
2. Janga SC. From specific to global analysis of posttranscriptional regulation in eukaryotes: posttranscriptional regulatory networks. Brief Funct Genomics. 2012;11(6):505–521. doi: 10.1093/bfgp/els046.
3. Pichon X, Wilson LA, Stoneley M, Bastide A, King HA, Somers J, Willis AEE. RNA binding protein/RNA element interactions and the control of translation. Curr Protein Peptide Sci. 2012;13(4):294–304. doi: 10.2174/138920312801619475.
4. Koboldt DC, Steinberg KM, Larson DE, Wilson RK, Mardis ER. The next-generation sequencing revolution and its impact on genomics. Cell. 2013;155(1):27–38. doi: 10.1016/j.cell.2013.09.006.
5. Dassi E, Quattrone A. Tuning the engine: an introduction to resources on post-transcriptional regulation of gene expression. RNA Biol. 2012;9(10):1224–1232. doi: 10.4161/rna.22035.
