Comparative Study
BMC Bioinformatics. 2006 Oct 2:7:423.
doi: 10.1186/1471-2105-7-423.

Detection of prokaryotic promoters from the genomic distribution of hexanucleotide pairs

Pierre-Etienne Jacques et al. BMC Bioinformatics.

Abstract

Background: In bacteria, sigma factors and other transcriptional regulatory proteins recognize DNA patterns upstream of their target genes and interact with RNA polymerase to control transcription. As a consequence of evolution, DNA sequences recognized by transcription factors are thought to be enriched in intergenic regions (IRs) and depleted from coding regions of prokaryotic genomes.

Results: In this work, we report that the genomic distribution of transcription factor binding sites is biased towards IRs, and that this bias is conserved among bacterial species. We further take advantage of this observation to develop an algorithm that can efficiently identify promoter boxes by a distribution-dependent approach rather than a direct sequence comparison approach. This strategy, which can easily be combined with other methodologies, allowed the identification of promoter sequences in ten species and can be used with any annotated bacterial genome, with results that rival those of current methodologies. Experimental validations of predicted promoters also support our approach.
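The IR bias described above can be illustrated with a small sketch. This is not the authors' implementation: the function names, the toy sequences, the spacing range of 15 to 19 bases, and the use of TTGACA/TATAAT as the box pair are all assumptions made for illustration. The sketch counts occurrences of a -35/-10 hexanucleotide pair, allowing a fixed number of mismatches per box, and reports the hit density separately for intergenic and coding sequences.

```python
from typing import Iterable, Sequence


def mismatches(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def count_pair_hits(seq: str, box35: str, box10: str,
                    spacings: Iterable[int], max_mismatch: int = 0) -> int:
    """Count (-35 box, spacer, -10 box) occurrences in one sequence.

    Each box may differ from its query by at most max_mismatch bases.
    """
    hits = 0
    n35, n10 = len(box35), len(box10)
    spacings = tuple(spacings)
    for i in range(len(seq) - n35 + 1):
        if mismatches(seq[i:i + n35], box35) > max_mismatch:
            continue
        for gap in spacings:
            j = i + n35 + gap
            window = seq[j:j + n10]
            if len(window) == n10 and mismatches(window, box10) <= max_mismatch:
                hits += 1
    return hits


def hits_per_kb(seqs: Sequence[str], box35: str, box10: str,
                spacings: Iterable[int], max_mismatch: int = 0) -> float:
    """Hit density (hits per 1000 bases) over a collection of sequences."""
    spacings = tuple(spacings)
    total_len = sum(len(s) for s in seqs)
    total_hits = sum(
        count_pair_hits(s, box35, box10, spacings, max_mismatch) for s in seqs
    )
    return 1000.0 * total_hits / total_len if total_len else 0.0


# Toy data (hypothetical, not taken from any genome).
SPACINGS = (15, 16, 17, 18, 19)           # assumed allowed spacer lengths
toy_intergenic = ["CC" + "TTGACA" + "A" * 17 + "TATAAT" + "CC"]
toy_coding = ["ATGGCG" * 6]

ir_density = hits_per_kb(toy_intergenic, "TTGACA", "TATAAT", SPACINGS)
cds_density = hits_per_kb(toy_coding, "TTGACA", "TATAAT", SPACINGS)
```

On a real annotated genome, a pair whose density is clearly higher in IRs than in coding regions would be the kind of candidate the distribution-dependent approach looks for. How the published algorithm turns such counts into a score is not specified in this abstract.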

Conclusion: Considering that complete genomic sequences of over 1000 bacteria will soon be available and that little transcriptional information is available for most of them, our algorithm constitutes a promising tool for the prediction of promoter sequences. Importantly, our methodology could also be adapted to identify DNA sequences recognized by other regulatory proteins.

Figures

Figure 1
Genome-derived distribution matrices generated for consensus sequences from transcription factors of three different organisms. (A) E. coli, (B) B. subtilis, and (C) M. tuberculosis. The name of the transcription factor is indicated above each matrix. The mismatch number of each cell is indicated on both sides of each matrix. The analyzed consensus sequence is shown under each matrix, along with the allowed spacing range. The first row corresponds to principal σ factors, the second row to alternative σ factors, and the third row to transcriptional regulators.
Figure 2
Specific examples of genome-derived distribution matrices generated for characterized principal σ factor-dependent promoter sequences and non-regulatory sequences from three different organisms. (A) E. coli, (B) B. subtilis, and (C) M. tuberculosis. Gene names and experimentally identified promoter sequences, as reported in the literature, are indicated above each matrix. The first row corresponds to characterized promoter sequences closely resembling the proposed consensus. The second row presents experimentally identified promoters containing more mismatches relative to the proposed consensus. The last row shows distribution matrices of hexanucleotide pairs with approximately 3 mismatched positions per box, which were extracted from the middle of the rpoB coding sequence (bona fide non-promoter sequences).
Figure 3
Graphs of enlarged IRs containing characterized promoter sequences presented in Figure 2. The green arrow represents the characterized promoter sequence. The start codon of the gene of interest is located at "0" on the X-axis of the enlarged IR. The Y-axis coordinate shows the calculated score. The threshold of each region is shown (dashed grey line). The sequence of all candidate promoters above the threshold is shown (5 merged overlapping -35 boxes from the different allowed spacings along with the shared -10 box). (A) The general synthetic matrix (#45012859) used to calculate the scores presented in the graphs. The name of the gene located downstream of the selected enlarged IR is indicated in each graph. E. coli (B), B. subtilis (C) and M. tuberculosis (D). Scores obtained for the full E. coli rpoB coding sequence were also plotted.
Figure 4
The application of a simple sequence-dependent filter and the subsequent lowering of the threshold allow the detection of an otherwise missed promoter. Graphs report scores calculated for all hexanucleotide pairs located in the enlarged IR upstream of the B. subtilis ftsA gene. (A) No filter applied. (B) Sequence-dependent filtering (at most 3 mismatches in the -35 box and 2 mismatches in the -10 box, relative to the B. subtilis principal σ factor consensus sequence) followed by reduction of the region-specific threshold from 3 to 2 standard deviations above the mean.
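The filter and threshold described in the Figure 4 caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the published implementation: the `Hit` record and function names are hypothetical, TTGACA/TATAAT is assumed as the principal σ factor consensus (the caption does not spell it out), the threshold is computed from all scores in the region before filtering, and the population standard deviation is used.

```python
from statistics import mean, pstdev
from typing import List, NamedTuple, Sequence


class Hit(NamedTuple):
    """One scored hexanucleotide pair in an enlarged IR (hypothetical record)."""
    position: int
    box35: str
    box10: str
    score: float


def mismatches(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def passes_filter(hit: Hit, cons35: str = "TTGACA", cons10: str = "TATAAT",
                  max35: int = 3, max10: int = 2) -> bool:
    """Sequence-dependent filter: limit mismatches to the assumed consensus."""
    return (mismatches(hit.box35, cons35) <= max35
            and mismatches(hit.box10, cons10) <= max10)


def region_threshold(scores: Sequence[float], k: float) -> float:
    """Region-specific threshold: mean plus k population standard deviations."""
    return mean(scores) + k * pstdev(scores)


def select_candidates(hits: List[Hit], k: float,
                      use_filter: bool = True) -> List[Hit]:
    """Keep hits scoring strictly above the threshold (and passing the filter)."""
    cutoff = region_threshold([h.score for h in hits], k)
    return [h for h in hits
            if h.score > cutoff and (not use_filter or passes_filter(h))]


# Toy region (made-up scores): ten background pairs, one weak and one
# moderate peak. The moderate peak lies between 2 and 3 SD above the mean.
toy_hits = [Hit(i, "CCCCCC", "GGGGGG", 0.0) for i in range(10)]
toy_hits.append(Hit(10, "TTGTCA", "TATGAT", 6.0))
toy_hits.append(Hit(11, "TTGACA", "TATAAT", 10.0))

strict = select_candidates(toy_hits, k=3, use_filter=False)   # 3 SD, no filter
relaxed = select_candidates(toy_hits, k=2, use_filter=True)   # filter, then 2 SD
```

In this toy region the moderate peak is missed at 3 standard deviations and recovered at 2, while the filter guards against low-similarity pairs that a lower threshold would otherwise let through.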
