Comparative Study
BMC Genomics. 2008 Mar 27;9:145.
doi: 10.1186/1471-2164-9-145.

A generic approach to identify Transcription Factor-specific operator motifs; Inferences for LacI-family mediated regulation in Lactobacillus plantarum WCFS1

Christof Francke et al. BMC Genomics.

Abstract

Background: A key problem in the sequence-based reconstruction of regulatory networks in bacteria is the lack of specificity in operator predictions. The problem is especially prominent in the identification of transcription factor (TF)-specific binding sites. More specifically, homologous TFs are abundant and, because they are structurally very similar, their related operators are difficult to distinguish by automated means. This also holds for the LacI-family, a well-studied family of TFs with many members that fulfill crucial roles in the control of carbohydrate catabolism in bacteria, including catabolite repression. To overcome the specificity problem, a comprehensive footprinting approach was formulated to identify TF-specific operator motifs and was applied to the LacI-family of TFs in the model Gram-positive organism Lactobacillus plantarum WCFS1. The main premise behind the approach is that only orthologous sequences that share orthologous genomic context will share equivalent regulatory sites.
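The premise about shared genomic context can be illustrated with a small sketch. Here each orthologous TF is described by the set of ortholog-group labels of its neighboring genes, and two TFs are linked when their contexts overlap sufficiently; linked TFs then form one candidate group of orthologous functional equivalents (GOOFE). The set representation, the Jaccard similarity, the `min_similarity=0.5` threshold and the single-linkage merging are all assumptions made for illustration; in the study the genomic context was inspected rather than scored by a fixed rule.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two sets of ortholog-group labels."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def split_into_goofes(contexts, min_similarity=0.5):
    """Group orthologous TFs whose genomic contexts overlap (hypothetical rule).

    contexts: dict mapping a TF identifier to the ortholog-group labels of
    its neighboring genes. Returns sorted lists of TF identifiers; each list
    is one candidate group.
    """
    parent = {tf: tf for tf in contexts}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Single linkage: one sufficiently similar pair is enough to merge two groups.
    for a, b in combinations(contexts, 2):
        if jaccard(contexts[a], contexts[b]) >= min_similarity:
            parent[find(a)] = find(b)

    groups = {}
    for tf in contexts:
        groups.setdefault(find(tf), []).append(tf)
    return sorted(sorted(members) for members in groups.values())


# Hypothetical example: four orthologous TFs in two distinct genomic contexts.
contexts = {
    "tf_a": {"OG1", "OG2", "OG3"},
    "tf_b": {"OG1", "OG2", "OG3"},
    "tf_c": {"OG4", "OG5"},
    "tf_d": {"OG4", "OG5", "OG6"},
}
print(split_into_goofes(contexts))  # [['tf_a', 'tf_b'], ['tf_c', 'tf_d']]
```

Raising `min_similarity` splits the groups more finely; with 0.7, `tf_c` and `tf_d` (overlap 2/3) would end up in separate groups.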

Results: When the approach was applied to the 12 LacI-family TFs of the model species, a specific operator motif was identified for each of them. With these TF-specific operator motifs, potential binding sites were found on the genome and putative minimal regulons could be defined. Moreover, specific inducers could in most cases be linked to the TFs through phylogeny, thereby revealing the biological role of these regulons. The operator predictions indicated that the LacI-family TFs can be separated into two subfamilies with clearly distinct operator motifs. They also showed that the operator related to the 'global' regulator CcpA is not inherently distinct from that of other LacI-family members, only more degenerate. Analysis of the chromosomal positions of the identified putative binding sites confirmed that the LacI-family TFs are mostly autoregulatory and relate mainly to carbohydrate uptake and catabolism.
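Finding potential binding sites on a genome with a TF-specific motif is commonly done by converting the aligned sites into a log-odds position weight matrix and scoring every window on both strands. The sketch below assumes a uniform background, a pseudocount of 0.5 and a user-chosen score threshold; the abstract does not state which scoring scheme or cut-off the study used, and the example sites are hypothetical.

```python
import math

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def build_pwm(sites, pseudocount=0.5):
    """Log-odds (bits) position weight matrix against a uniform background."""
    pwm = []
    for column in zip(*sites):
        total = len(column) + 4 * pseudocount
        pwm.append({
            b: math.log2(((column.count(b) + pseudocount) / total) / 0.25)
            for b in BASES
        })
    return pwm


def score(pwm, kmer):
    """Sum of per-position log-odds scores for one window."""
    return sum(col[base] for col, base in zip(pwm, kmer))


def scan(pwm, sequence, threshold):
    """Return (start, strand, score) for every window scoring >= threshold."""
    sequence = sequence.upper()
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if set(window) - set(BASES):
            continue  # skip windows containing N or other ambiguity codes
        reverse = window.translate(COMPLEMENT)[::-1]
        for strand, kmer in (("+", window), ("-", reverse)):
            s = score(pwm, kmer)
            if s >= threshold:
                hits.append((start, strand, round(s, 2)))
    return hits


sites = ["TGTAAA", "TGTAAA", "TGTAAT"]  # hypothetical aligned sites
pwm = build_pwm(sites)
print(scan(pwm, "CCCCCCTGTAAACCCCCC", threshold=6.0))  # [(6, '+', 8.43)]
```

Because LacI-family operators are largely palindromic, a real operator will often score well on both strands; the sketch simply reports both.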

Conclusion: Our approach to identify specific operator motifs for different TF-family members is specific and, in essence, generic. The data imply that, although the specific operator motifs can be used to identify minimal regulons, experimental knowledge of TF activity in particular is essential to determine complete regulons and to estimate the overlap between TF affinities.


Figures

Figure 1
The number of LacI-family TF homologs and the presence of Lactobacillus plantarum orthologs in different Firmicutes. The organisms are ordered on the basis of their phylogeny (left; inferred from phosphoglycerate kinase amino acid sequence data [98]) and the TFs on the basis of the NJ-tree of the L. plantarum LacI-family TFs (top). The presence of an ortholog to an L. plantarum protein is indicated by an open circle (different cluster in the NJ-tree) or a closed circle (same cluster in the NJ-tree). The members of the various L. plantarum GOOFEs are colored. Orthologs that have been experimentally characterized are indicated by '+'. Remark: although the PFAM HMM that is used to identify the LacI domain represents only a small part of the DNA-binding domain, in most instances the number of LacI-family TFs identified by us corresponded completely to the number listed by PFAM [96]. There were a few exceptions, however, and in these cases the number given by PFAM appeared erroneous [see Additional file 1]. In some cases the PFAM database was simply incomplete (e.g. Pediococcus pentosaceus and Leuconostoc mesenteroides). In other cases sequences were counted twice as a result of duplicate UniProt entries (e.g. for CcpA in L. plantarum). Other proteins were missing from the PFAM database because of mistakes in the ORF definition.
Figure 2
Left panel: Sequence motifs of the predicted LacI-family TF-specific operators in L. plantarum. Right panel: The protein sequence motif of the DNA-binding region of the LacI-family TFs per GOOFE. The numbering of the protein residues deviates slightly from that in the various crystal structures, because the alignment includes gaps that are necessary to accommodate all the LacI protein sequences that we compared. The sequence logos were created using Weblogo [99]. Remark: NMR studies have shown that the hinge helix plays an important role in kinking the DNA while it forms an alpha-helix (helix 4), thereby stabilizing the induced fit of the recognition helix within the major groove of the operator [33,81]. In fact, the 3D structures of operator-bound CcpA and LacI implicate many residues of helices 3 and 4 in the contact of the TF with the operator [56,57]. Moreover, the 3D structures indicate that the same residues are involved in both CcpA and LacI. The DNA-protein contacts are indicated with triangles: blue triangles mark residues interacting with the phosphate backbone and red triangles mark residues interacting directly with a nucleotide (the position of the nucleotide is indicated in a box). In the case of Lp_0188 (SacR), a well-conserved guanine and the corresponding cytosine are found at positions -7 and 7 of the operator, respectively. This suggests that the operator recognized by Lp_0188 (SacR) and its orthologs is two nucleotides wider than that recognized by other 'CcpA-like' LacI-family TFs. The 'EbgR-like' LacI-family TFs carry a conserved insertion before helix 3 and seem to lack the conserved alanine and leucine (or methionine in the case of RbsR) at position 60 of the hinge helix that are characteristic of the 'CcpA-like' LacI-family TFs. The absence of these residues coincides perfectly with the absence of the central CG nucleotide pair in the predicted Lp_3470 (LacR), Lp_3479 (GalR) and Lp_3488 (RafR) operators.
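A sequence logo such as those produced with Weblogo stacks, at each position, the information content of the aligned sites (for DNA, 2 bits minus the Shannon entropy of the base distribution). The sketch below computes that quantity without the small-sample correction Weblogo can apply, derives a consensus, and checks the central dinucleotide and the palindromic symmetry discussed in the caption. The 12-bp sites are hypothetical and were chosen only to illustrate a palindromic operator with a central CG pair.

```python
import math
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def information_content(sites):
    """Per-position information content in bits (2 - Shannon entropy), uncorrected."""
    n = len(sites)
    ic = []
    for column in zip(*sites):
        counts = Counter(column)
        entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
        ic.append(2.0 - entropy)
    return ic


def consensus(sites):
    # most_common(1) gives the most frequent base; ties resolve to the base seen first.
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*sites))


def is_palindromic(seq):
    # A DNA palindrome equals its own reverse complement.
    return seq == seq.translate(COMPLEMENT)[::-1]


def central_pair(seq):
    """The two bases flanking the center of an even-length sequence."""
    mid = len(seq) // 2
    return seq[mid - 1:mid + 1]


sites = ["TGAAACGTTTCA", "TGAAACGTTTCA", "TGTAACGTTACA", "AGAAACGTTTCT"]
cons = consensus(sites)
print(cons, central_pair(cons), is_palindromic(cons))  # TGAAACGTTTCA CG True
print([round(x, 2) for x in information_content(sites)])
```

Positions that vary between sites (here the outermost and third positions from each end) carry less than 2 bits, which is what makes them shorter in a logo.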
Figure 3
The L. plantarum operons predicted to be controlled by (a) 'CcpA-like' LacI-family TFs and (b) 'EbgR-like' LacI-family TFs. The set of operons is restricted to those having a very high probability of being correctly predicted. The positions of putative operators are marked by triangles and the direction in which transcription is presumably regulated is indicated (< and >). The functional categories of the proteins encoded by the genes that are under the control of LacI-family TFs are color-coded as depicted in the inset. The functional annotations were taken from the in-house annotation database of L. plantarum WCFS1 ([38] and C. Francke unpublished results). See [Additional file 9] for a detailed description of the functional annotation.
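The direction marks (< and >) follow from where a putative operator lies relative to the flanking genes. A minimal sketch of that assignment is shown below. It assumes genes are given as (name, start, end, strand) with start < end in genome coordinates, and it uses a hypothetical maximum distance of 300 bp between the site and the 5' end of a gene; the rules actually used for the figure are not stated, and extending a hit to the rest of an operon is not shown.

```python
from typing import List, NamedTuple, Tuple


class Gene(NamedTuple):
    name: str
    start: int   # lowest genome coordinate
    end: int     # highest genome coordinate
    strand: str  # "+" or "-"


def regulated_genes(site: int, genes: List[Gene],
                    max_distance: int = 300) -> List[Tuple[str, str]]:
    """Genes whose 5' end lies downstream of the site within max_distance.

    Returns (gene name, direction) pairs: ">" for a + strand gene to the
    right of the site, "<" for a - strand gene to its left.
    """
    hits = []
    for gene in genes:
        if gene.strand == "+" and 0 <= gene.start - site <= max_distance:
            hits.append((gene.name, ">"))
        elif gene.strand == "-" and 0 <= site - gene.end <= max_distance:
            hits.append((gene.name, "<"))
    return sorted(hits)


# Hypothetical divergent gene pair with a site in the intergenic region.
genes = [Gene("geneA", 100, 400, "-"), Gene("geneB", 600, 900, "+"),
         Gene("geneC", 950, 1500, "+")]
print(regulated_genes(520, genes))  # [('geneA', '<'), ('geneB', '>')]
```

A site between two divergently transcribed genes is assigned to both, which matches the double direction marks such a position would get in a figure like this one.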
Figure 4
A reduced NJ-tree of the inducer-binding domain of the LacI-family TF homologs of L. plantarum. Sequences of LacI-family TF homologs with known inducers from other organisms were added for comparison [21, 34, 45, 50-53, 64, 67-69, 84, 100-109]. Orthology is indicated by color-coding. The numbers accompanying the clusters in the NJ-tree represent the bootstrap support for the individual divisions (out of 1000 replicates).
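The trees in these figures are neighbor-joining (NJ) trees. For readers unfamiliar with the method, the sketch below implements the standard NJ agglomeration on a distance matrix and returns only the topology; branch lengths and the bootstrap step (rebuilding the tree from resampled alignment columns and counting how often each split recurs) are omitted. The five-taxon distance matrix is a toy example, not data from the study.

```python
from typing import Dict, Hashable, Tuple


def neighbor_joining(dist: Dict[Hashable, Dict[Hashable, float]]) -> Tuple:
    """Neighbor-joining topology from a symmetric distance matrix.

    dist maps each taxon to a dict of distances to every other taxon.
    Joined pairs become nested tuples; the final three nodes meet at the
    center of the unrooted tree. Branch lengths are not tracked.
    """
    d = {a: dict(row) for a, row in dist.items()}
    nodes = list(d)
    while len(nodes) > 3:
        n = len(nodes)
        totals = {a: sum(d[a][b] for b in nodes if b != a) for a in nodes}
        best = None
        # Pick the pair minimizing Q(i, j) = (n - 2) d(i, j) - sum_i - sum_j.
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                q = (n - 2) * d[a][b] - totals[a] - totals[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        joined = (a, b)
        d[joined] = {}
        # Distance from the new node to every remaining node.
        for c in nodes:
            if c != a and c != b:
                d[joined][c] = d[c][joined] = 0.5 * (d[a][c] + d[b][c] - d[a][b])
        nodes = [c for c in nodes if c != a and c != b] + [joined]
    return tuple(nodes)


toy = {
    "a": {"a": 0, "b": 5, "c": 9, "d": 9, "e": 8},
    "b": {"a": 5, "b": 0, "c": 10, "d": 10, "e": 9},
    "c": {"a": 9, "b": 10, "c": 0, "d": 8, "e": 7},
    "d": {"a": 9, "b": 10, "c": 8, "d": 0, "e": 3},
    "e": {"a": 8, "b": 9, "c": 7, "d": 3, "e": 0},
}
print(neighbor_joining(toy))  # ('d', 'e', ('c', ('a', 'b')))
```

The result reads as the unrooted tree ((a,b),c,(d,e)): a and b are joined first, and d and e remain as a pair at the center.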
Figure 5
Operators present in the neighborhood of the genes encoding the LacI-family TFs of the 'CcpA-like' subfamily. For most TFs an alignment of the upstream region is shown for the sequences related to one GOOFE. In the case of CcpA, CcpB and Lp_0172 (MalR), no proper alignment could be made with regions from other organisms. Potential CREs are indicated by orange bordered boxes and the LacI-family TF specific operators are indicated by differently colored boxes. The -35/-10 regions of the putative promoters are underlined in purple and pink, respectively. The translation start is positioned at the right end and is indicated in green, as is the putative ribosome binding site.
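The -35/-10 regions underlined in these alignments are the kind of elements a simple consensus search picks up. The sketch below assumes the commonly cited sigma-A/sigma-70 consensus hexamers (TTGACA and TATAAT), a tolerance of two mismatches in the -35 box and one in the -10 box, and a 15 to 19 bp spacer; these are illustrative choices, not the criteria used to annotate the figure.

```python
MINUS35 = "TTGACA"  # assumed -35 consensus
MINUS10 = "TATAAT"  # assumed -10 consensus


def mismatches(a, b):
    """Number of differing positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_promoters(seq, max_mm35=2, max_mm10=1, spacers=range(15, 20)):
    """Return (pos35, pos10, total_mismatches) for candidate -35/-10 pairs.

    pos35 and pos10 are 0-based start positions of the two hexamers; the
    spacer is the number of bases between the end of -35 and the start of -10.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 5):
        mm35 = mismatches(seq[i:i + 6], MINUS35)
        if mm35 > max_mm35:
            continue
        for spacer in spacers:
            j = i + 6 + spacer
            box10 = seq[j:j + 6]
            if len(box10) < 6:
                break  # ran off the end of the sequence
            mm10 = mismatches(box10, MINUS10)
            if mm10 <= max_mm10:
                hits.append((i, j, mm35 + mm10))
    return sorted(hits, key=lambda hit: hit[2])


# Hypothetical upstream region with a perfect promoter and a 17-bp spacer.
seq = "GGGG" + "TTGACA" + "C" * 17 + "TATAAT" + "GGGG"
print(find_promoters(seq))  # [(4, 27, 0)]
```

Consensus matching of this kind produces many false positives on AT-rich genomes, which is one reason such candidates are best confirmed by conservation across the aligned upstream regions.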
Figure 6
The relative absolute levels of LacI-family TF-related mRNAs in various microarray experiments with L. plantarum. The array data were used previously in [17]. The determination of these levels is described in Materials and methods.
Figure 7
The TF-specific operator motif identification workflow. 1) First, a particular TF family was selected and 2) a prominent representative of that family was chosen. 3) The related sequence was used to search the genome of a particular species for intra-species homologs. This search was iterated until no new sequences were recovered. A high e-value cut-off was used to ensure that all homologs were recovered. The sequences were aligned and an NJ-tree was generated. Both the alignment and the NJ-tree were used to determine the family or subfamily boundaries. 4) The procedure was repeated to retrieve all inter-species homologs, and the general features of the intra-species homologs were used to determine which sequences were taken into consideration. Orthologous relations between sequences were established on the basis of clustering in the NJ-tree and sufficient bootstrap support (in green) for that clustering. In the case of Lp_0172 and Lp_0173 the orthologous clusters are color-coded in brown and orange, respectively, and the other TFs of L. plantarum are indicated in red. 5) The genomic context of the various orthologs was inspected (legend bottom left) and, where clear differences existed, the orthologous groups were subdivided into different Groups of orthologous functional equivalents (GOOFEs), as illustrated. Then, upstream regions of the conserved gene(s) in context were selected and inspected for potential regulatory sequences (the selected regions are indicated by colored triangles). The potential regulatory sequences were compared and those that showed similar features were selected; only the sequences that showed the highest conservation were used to determine a specific operator motif. In the case of Lp_0172 and Lp_0173, a 'CcpA-like' operator motif was found up to three times in the upstream regions. The sequences that were selected to determine the Lp_0172- and Lp_0173-specific operator motifs are displayed (Px indicates the position of the selected sequence relative to other similar sequences and to the translation start). 6) The selected sequences were used to create a GOOFE-specific operator motif. The specific motifs thus identified for the orthologous groups containing Lp_0172 and Lp_0173 demonstrate that the division into GOOFEs was essential to arrive at highly specific operator motifs. Although the motifs within each orthologous group are highly similar, they differ distinctly at one position depending on the GOOFE. For the TFs orthologous to Lp_0172, the motifs differ strikingly at position +5, with a fully conserved guanine in the GOOFE containing Lp_0172 and a fully conserved thymine in the other. For the TFs orthologous to Lp_0173, the motifs differ strikingly at position -5, with a fully conserved thymine in the GOOFE containing Lp_0173 and a fully conserved adenine in the other. Remark: the gene/protein identifiers in the figure are derived from the ERGO resource [86]. A conversion to other identifiers can be found in [Additional file 2]. The functional annotation of the depicted genes was taken from the in-house annotation database of L. plantarum WCFS1 ([38] and C. Francke unpublished results) and the ERGO resource. See [Additional file 9] for a detailed description of the functional annotation in L. plantarum.
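Step 6 compares the motifs of two GOOFEs within one orthologous group and finds a position where each GOOFE is fully conserved but with a different base. A sketch of that comparison follows. Numbering an even-length operator from -n to -1 and +1 to +n around its center (no position 0) is an assumption for illustration, as are the 16-bp example sites.

```python
def signed_positions(length):
    """Label positions of an even-length operator -n..-1, +1..+n (no 0)."""
    if length % 2:
        raise ValueError("this sketch assumes an even-length operator")
    half = length // 2
    return [i - half if i < half else i - half + 1 for i in range(length)]


def fully_conserved(sites):
    """Base at each position if identical in all sites, else None."""
    return [column[0] if len(set(column)) == 1 else None for column in zip(*sites)]


def discriminating_positions(goofe_a, goofe_b):
    """Positions fully conserved in both GOOFEs but with different bases."""
    cons_a = fully_conserved(goofe_a)
    cons_b = fully_conserved(goofe_b)
    labels = signed_positions(len(cons_a))
    return [(label, x, y) for label, x, y in zip(labels, cons_a, cons_b)
            if x is not None and y is not None and x != y]


goofe_a = ["ATGAAAGCGCTTGCAT", "TTGAAAGCGCTTGCAA"]  # hypothetical sites
goofe_b = ["ATGAAAGCGCTTTCAT", "ATGAAAGCGCTTTCAT"]
print(discriminating_positions(goofe_a, goofe_b))  # [(5, 'G', 'T')]
```

Positions that vary within either GOOFE (here -8 and +8 in the first group) are ignored, so only fully conserved differences of the kind described for Lp_0172 and Lp_0173 are reported.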

References

    1. Stormo GD. DNA binding sites: representation and discovery. Bioinformatics. 2000;16:16–23. doi: 10.1093/bioinformatics/16.1.16.
    2. Bulyk ML. Computational prediction of transcription-factor binding site locations. Genome Biol. 2003;5:201. doi: 10.1186/gb-2003-5-1-201.
    3. Thompson W, Rouchka EC, Lawrence CE. Gibbs Recursive Sampler: finding transcription factor binding sites. Nucleic Acids Res. 2003;31:3580–3585. doi: 10.1093/nar/gkg608.
    4. Kim JT, Gewehr JE, Martinetz T. Binding matrix: a novel approach for binding site recognition. J Bioinform Comput Biol. 2004;2:289–307. doi: 10.1142/S0219720004000569.
    5. Osada R, Zaslavsky E, Singh M. Comparative analysis of methods for representing and searching for transcription factor binding sites. Bioinformatics. 2004;20:3516–3525. doi: 10.1093/bioinformatics/bth438.
