PLoS One. 2012;7(7):e40373.
doi: 10.1371/journal.pone.0040373. Epub 2012 Jul 5.

POWRS: position-sensitive motif discovery

Ian W Davis et al. PLoS One. 2012.

Abstract

Transcription factors and the short, often degenerate DNA sequences they recognize are central regulators of gene expression, but their regulatory code is challenging to dissect experimentally. Thus, computational approaches have long been used to identify putative regulatory elements from the patterns in promoter sequences. Here we present a new algorithm "POWRS" (POsition-sensitive WoRd Set) for identifying regulatory sequence motifs, specifically developed to address two common shortcomings of existing algorithms. First, POWRS uses the position-specific enrichment of regulatory elements near transcription start sites to significantly increase sensitivity, while providing new information about the preferred localization of those elements. Second, POWRS forgoes position weight matrices for a discrete motif representation that appears more resistant to over-generalization. We apply this algorithm to discover sequences related to constitutive, high-level gene expression in the model plant Arabidopsis thaliana, and then experimentally validate the importance of those elements by systematically mutating two endogenous promoters and measuring the effect on gene expression levels. This provides a foundation for future efforts to rationally engineer gene expression in plants, a problem of great importance in developing biotech crop varieties.

Availability: BSD-licensed Python code at http://grassrootsbio.com/papers/powrs/.
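The two ideas in the abstract, a discrete word-set motif and enrichment tested within a window positioned relative to the transcription start site (TSS), can be illustrated with a small sketch. This is not the POWRS implementation: the function names, the example word set, the `tss_index` convention, and the choice of a hypergeometric tail as the enrichment statistic are all assumptions made for illustration.

```python
from math import comb

def count_hits(seqs, words, start, end, tss_index):
    """Count sequences with at least one word from `words` starting inside
    the window [start, end), given in coordinates relative to the TSS.
    `tss_index` is the (assumed) 0-based position of the TSS in each sequence."""
    hits = 0
    for seq in seqs:
        lo = max(0, tss_index + start)
        hi = min(len(seq), tss_index + end)
        if any(seq.startswith(w, i) for i in range(lo, hi) for w in words):
            hits += 1
    return hits

def hypergeom_tail(k, n, K, N):
    """P(X >= k) when drawing n items from N, of which K are 'hits'."""
    top = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return top / comb(N, n)

def window_enrichment(fg, bg, words, start, end, tss_index):
    """Return (foreground hits, background hits, enrichment p-value) for one
    candidate window. The statistic is an illustrative assumption."""
    fg_hits = count_hits(fg, words, start, end, tss_index)
    bg_hits = count_hits(bg, words, start, end, tss_index)
    p = hypergeom_tail(fg_hits, len(fg), fg_hits + bg_hits, len(fg) + len(bg))
    return fg_hits, bg_hits, p

# Toy data: hypothetical 20 bp "promoters" with the TSS at index 10.
WORDS = {"TGGGCC", "TGGGCT"}  # hypothetical discrete word set
FG = ["A" * 6 + "TGGGCC" + "A" * 8, "A" * 7 + "TGGGCT" + "A" * 7, "A" * 20]
BG = ["A" * 20, "A" * 14 + "TGGGCC", "C" * 20, "G" * 20]
RESULT = window_enrichment(FG, BG, WORDS, -5, 0, 10)
```

In this toy example the second background sequence contains a word, but downstream of the window, so it does not count as a hit. A position-blind count would have included it, which is the loss of sensitivity the position-specific test is meant to avoid.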

Conflict of interest statement

Competing Interests: IWD, CB, PNB, and TE are employed by GrassRoots Biotechnology, a private for-profit company, and are paid salaries by and/or hold equity interests in the same; PNB is a member of the board. Technology described herein (POWRS) is used by GrassRoots in research and development activities that may lead to products and/or patents; however, GrassRoots has filed no patents covering POWRS and is making the software available under a permissive open-source license (BSD license). None of the foregoing alters the authors’ adherence to all the PLoS ONE policies on sharing data and materials.

Figures

Figure 1. Graphical depiction of Site II motif matches in Arabidopsis.
Smoothed histogram (kernel density estimate) of occurrences of the Site II motif in Arabidopsis promoters from the 118 constitutive genes of interest (solid line) or background genes (dashed line). The Site II motif is as defined in Table 2. Units of motif density are occurrences per base pair per sequence. POWRS reports maximal enrichment of Site II in the genes of interest relative to the background in the region from −150 to +25, in excellent agreement with what is seen here. Note that although Site II occurs more often near the TSS for all genes, the effect is significantly stronger among the genes of interest.
Figure 2. Transversion scheme in GR2A and GR11A.
Endogenous sequence is shown in black; sequence after transversion is shown above it in gray. Transcription start sites annotated by TAIR9 and inferred from EST data are indicated. Blocks for transversion are numbered and delimited by spaces. Natural Site II and telo box motifs are marked on the endogenous sequence in green and yellow, respectively. Non-natural Site II and telo box motifs created by the transversions are marked on the transversion sequence; in some cases, these are split between natural and mutated sequences. Blocks whose transversion clearly disrupted promoter activity are numbered in red (compare to Figure 3).
Figure 3. Transversion results for GR2A and GR11A.
Mean and standard error of GFP expression driven by 10 bp transversion mutants of endogenous promoters GR2A and GR11A. Stable transgenic plants from 4–6 independent events per line were assayed by qRT-PCR and corrected for copy number.
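
The unit used in Figure 1, occurrences per base pair per sequence, can be made concrete with a short sketch. It uses a plain binned histogram rather than the kernel density estimate described in the caption, and the function name, the `tss_index` convention, and the toy sequences are assumptions for illustration only.

```python
def motif_density(seqs, words, tss_index, bin_width=25):
    """Return {bin start relative to TSS: occurrences per bp per sequence}.
    Each word start position counts as one occurrence. Bins at the sequence
    edges may be only partly covered; this sketch does not correct for that."""
    counts = {}
    for seq in seqs:
        for i in range(len(seq)):
            if any(seq.startswith(w, i) for w in words):
                rel = i - tss_index                       # position relative to the TSS
                bin_start = (rel // bin_width) * bin_width
                counts[bin_start] = counts.get(bin_start, 0) + 1
    n = len(seqs)
    return {b: c / (bin_width * n) for b, c in sorted(counts.items())}

# Toy data: two hypothetical 20 bp sequences with the TSS at index 10.
DENSITY = motif_density(["A" * 6 + "TGGGCC" + "A" * 8, "A" * 20],
                        {"TGGGCC"}, tss_index=10, bin_width=5)
```

Dividing by both the bin width and the number of sequences is what makes the foreground curve (118 genes) comparable with the background curve, which is drawn from a gene set of a different size.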
