BMC Genomics. 2014 Oct 23;15(1):925.
doi: 10.1186/1471-2164-15-925.

SeAMotE: a method for high-throughput motif discovery in nucleic acid sequences

Federico Agostini et al. BMC Genomics. 2014.

Abstract

Background: The large amount of data produced by high-throughput sequencing poses new computational challenges. In the last decade, several tools have been developed for the identification of transcription and splicing factor binding sites.

Results: Here, we introduce the SeAMotE (Sequence Analysis of Motifs Enrichment) algorithm for the discovery of regulatory regions in nucleic acid sequences. SeAMotE provides (i) a robust analysis of high-throughput sequence sets, (ii) a motif search based on pattern occurrences and (iii) an easy-to-use web-server interface. We applied our method to recently published data, including 351 chromatin immunoprecipitation (ChIP) and 13 crosslinking immunoprecipitation (CLIP) experiments, and compared our results with those of other well-established motif discovery tools. SeAMotE shows an average accuracy of 80% in finding discriminative motifs and outperforms other methods available in the literature.

Conclusions: SeAMotE is a fast, accurate and flexible algorithm for the identification of sequence patterns involved in protein-DNA and protein-RNA recognition. The server can be freely accessed at http://s.tartaglialab.com/new_submission/seamote.

Figures

Figure 1
SeAMotE workflow. Illustration of the method pipeline: red boxes indicate the coverage calculation and seed extension loop; dashed arrows and the blue box represent conditional steps that depend on the user-definable variables, such as providing or selecting a specific background set or filtering out patterns that are closely related.
Figure 2
SeAMotE output summary. Example of an output table showing the list of motifs (IUPAC and RegEx) that best discriminate the input sets, along with their logo representation and positional weighted matrix download button, positive and reference coverage (as percentage of sequences containing at least one pattern occurrence), discrimination (Youden’s index) and p-value (Fisher’s exact test). By clicking on the logo, it is possible to retrieve the image file (png format) of the associated motif.
Figure 3
Annotated motifs performance comparison. Using 351 ChIP-seq datasets from ENCODE [19], we compared CMF [12], DECOD [13], DREME [11], XXmotif [14] and SeAMotE performances: A) E-values and B) q-values associated with the 5 top-ranked motifs for CMF, DECOD, DREME, SeAMotE and XXmotif. C) Proportion of transcription factors for which annotated motifs were successfully identified, plotted against the number of top-ranked motifs employed for the TOMTOM search [31].
Figure 4
RNA-binding protein motifs performance comparison. Using 13 CLIP-seq experiments available in the public domain [18], we compared DECOD [13], DREME [11], XXmotif [14] and SeAMotE performances. The ability to identify sequence elements that maximize the separation between positive and reference sets is reported for each motif identified using A) discrimination (Youden’s index) and B) significance (Fisher’s exact test). CMF [12] was excluded from the analysis because it does not allow motif discovery on a specific nucleic acid strand.
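
The scores named in the Figure 2 and Figure 4 captions (coverage, Youden’s index, Fisher’s exact test) can be illustrated with a minimal Python sketch. This is not the SeAMotE implementation: the function names (iupac_to_regex, score_motif, fisher_right_tail) and the toy sequences are hypothetical, the search is restricted to the given strand, the discrimination is assumed to be the standard Youden’s J (positive coverage minus reference coverage, both as fractions), and the p-value is assumed to be a one-sided test for enrichment in the positive set.

```python
import re
from math import comb

# IUPAC nucleotide codes mapped to regex character classes (U is treated as T).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def iupac_to_regex(motif):
    """Compile an IUPAC motif string into a regular expression."""
    return re.compile("".join(IUPAC[c] for c in motif.upper()))


def count_hits(pattern, sequences):
    """Number of sequences with at least one occurrence of the pattern."""
    return sum(1 for s in sequences
               if pattern.search(s.upper().replace("U", "T")))


def fisher_right_tail(a, b, c, d):
    """One-sided Fisher's exact test, P(X >= a), for the table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    total = comb(n, col1)
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k)
               for k in range(a, min(row1, col1) + 1))
    return tail / total


def score_motif(motif, positive, reference):
    """Coverage, Youden's J and enrichment p-value (assumed definitions)."""
    pattern = iupac_to_regex(motif)
    a = count_hits(pattern, positive)    # positive sequences with the motif
    c = count_hits(pattern, reference)   # reference sequences with the motif
    b, d = len(positive) - a, len(reference) - c
    pos_cov, ref_cov = a / len(positive), c / len(reference)
    return {
        "regex": pattern.pattern,
        "positive_coverage": pos_cov,
        "reference_coverage": ref_cov,
        "discrimination": pos_cov - ref_cov,  # sensitivity + specificity - 1
        "p_value": fisher_right_tail(a, b, c, d),
    }


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    positive = ["ACGTGA", "TTCGTG", "GGGACG", "AAAAAA"]
    reference = ["TTTTTT", "ACACAC", "GGGGGG", "CCGTCC"]
    print(score_motif("CGTG", positive, reference))
```

On the toy data, the motif CGTG occurs in two of the four positive sequences and in none of the reference sequences, giving a discrimination of 0.5. A real analysis would use far larger sets and would typically rely on a statistics library rather than the hand-written hypergeometric tail used here.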

References

    1. Coulon A, Chow CC, Singer RH, Larson DR. Eukaryotic transcriptional dynamics: from single molecules to cell populations. Nat Rev Genet. 2013;14(8):572–584. doi: 10.1038/nrg3484.
    2. Janga SC. From specific to global analysis of posttranscriptional regulation in eukaryotes: posttranscriptional regulatory networks. Brief Funct Genomics. 2012;11(6):505–521. doi: 10.1093/bfgp/els046.
    3. Pichon X, Wilson LA, Stoneley M, Bastide A, King HA, Somers J, Willis AEE. RNA binding protein/RNA element interactions and the control of translation. Curr Protein Peptide Sci. 2012;13(4):294–304. doi: 10.2174/138920312801619475.
    4. Koboldt DC, Steinberg KM, Larson DE, Wilson RK, Mardis ER. The next-generation sequencing revolution and its impact on genomics. Cell. 2013;155(1):27–38. doi: 10.1016/j.cell.2013.09.006.
    5. Dassi E, Quattrone A. Tuning the engine: an introduction to resources on post-transcriptional regulation of gene expression. RNA Biol. 2012;9(10):1224–1232. doi: 10.4161/rna.22035.