dipwmsearch: a Python package for searching di-PWM motifs
- PMID: 37010504
- PMCID: PMC10081870
- DOI: 10.1093/bioinformatics/btad141
Abstract
Motivation: Searching a sequence for probabilistic motifs is a common task when annotating putative transcription factor binding sites or other RNA/DNA binding sites. Useful motif representations include position weight matrices (PWMs), dinucleotide PWMs (di-PWMs), and hidden Markov models (HMMs). Dinucleotide PWMs combine the simplicity of PWMs (a matrix form and a cumulative scoring function) with the ability to model dependency between adjacent positions in the motif, which PWMs disregard. For instance, the HOCOMOCO database provides di-PWM motifs derived from experimental data to represent binding sites. Currently, two programs, SPRy-SARUS and MOODS, can search for occurrences of di-PWMs in sequences.
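To make the cumulative scoring function concrete, the following is a minimal sketch of how a di-PWM score could be computed, assuming the matrix is stored as one weight table per pair of adjacent positions. The names `dipwm_score` and `toy`, the dictionary layout, and the weights are illustrative assumptions, not the dipwmsearch API or real HOCOMOCO data.

```python
from itertools import product

# The 16 possible dinucleotides over the DNA alphabet.
DINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=2)]


def dipwm_score(matrix, word):
    """Sum the weights of the overlapping dinucleotides of `word`.

    `matrix` is a list of dicts (hypothetical layout): column i maps each
    dinucleotide to its weight at motif positions i and i + 1, so a matrix
    with k columns scores words of length k + 1.
    """
    if len(word) != len(matrix) + 1:
        raise ValueError("word length must be number of columns + 1")
    return sum(column[word[i:i + 2]] for i, column in enumerate(matrix))


# Toy di-PWM with 2 columns (motif length 3); the weights are made up.
toy = [{d: -1.0 for d in DINUCLEOTIDES} for _ in range(2)]
toy[0]["AC"] = 2.0
toy[1]["CG"] = 1.5

score_acg = dipwm_score(toy, "ACG")  # weight of "AC" at column 0 + "CG" at column 1
score_ttt = dipwm_score(toy, "TTT")  # two default weights
```

Because consecutive dinucleotides overlap by one letter, each column can reward or penalize a specific pair of neighbouring bases, which a mononucleotide PWM cannot express.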
Results: We propose a Python package called dipwmsearch, which provides an original and efficient algorithm for this task: it first enumerates the words that match the di-PWM, and then searches for all of them at once in the sequence, even if the sequence contains IUPAC codes. The user benefits from easy installation via PyPI or conda, comprehensive documentation, and executable scripts that facilitate the use of di-PWMs.
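The enumerate-then-search idea can be illustrated with a small, deliberately naive sketch. It is not the package's implementation: the enumeration here is brute force over all words, and a hash-set lookup over sliding windows stands in for a true multi-pattern matcher. All names (`enumerate_matching_words`, `search`, `toy`), the score threshold, and the toy weights are assumptions made for illustration, and only a subset of IUPAC codes is included.

```python
from itertools import product

ALPHABET = "ACGT"
# Subset of the IUPAC nucleotide codes, enough for this illustration.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "AG", "Y": "CT", "N": "ACGT"}


def score(matrix, word):
    """Cumulative di-PWM score: sum of weights of overlapping dinucleotides."""
    return sum(column[word[i:i + 2]] for i, column in enumerate(matrix))


def enumerate_matching_words(matrix, threshold):
    """Brute-force set of all words whose score reaches `threshold`."""
    length = len(matrix) + 1
    return {
        "".join(letters)
        for letters in product(ALPHABET, repeat=length)
        if score(matrix, "".join(letters)) >= threshold
    }


def search(sequence, words, length):
    """Return (position, word) for each window compatible with a matching word."""
    hits = []
    for start in range(len(sequence) - length + 1):
        window = sequence[start:start + length]
        # Expand IUPAC codes into the concrete words the window may stand for.
        for letters in product(*(IUPAC[code] for code in window)):
            candidate = "".join(letters)
            if candidate in words:
                hits.append((start, candidate))
    return hits


# Toy di-PWM with 2 columns (motif length 3); the weights are made up.
toy = [{a + b: -1.0 for a in ALPHABET for b in ALPHABET} for _ in range(2)]
toy[0]["AC"] = 2.0
toy[1]["CG"] = 1.5
toy[1]["CT"] = 0.5

words = enumerate_matching_words(toy, threshold=2.0)
hits = search("TTACGANCT", words, length=3)
```

With these toy weights only `ACG` and `ACT` reach the threshold, and the window `NCT` is reported because `N` can stand for `A`. Expanding every window is exponential in the number of ambiguous codes, so a practical tool would need a more careful strategy; the sketch only shows why enumerating the matching words first turns motif search into exact multi-pattern matching.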
Availability and implementation: dipwmsearch is available at https://pypi.org/project/dipwmsearch/ and https://gite.lirmm.fr/rivals/dipwmsearch/ under the CeCILL license.
© The Author(s) 2023. Published by Oxford University Press.