Nucleic Acids Res. 2001 Feb 1;29(3):774-82.
doi: 10.1093/nar/29.3.774.

Phylogenetic footprinting of transcription factor binding sites in proteobacterial genomes


L McCue et al. Nucleic Acids Res.

Abstract

Toward the goal of identifying complete sets of transcription factor (TF)-binding sites in the genomes of several gamma proteobacteria, and hence describing their transcription regulatory networks, we present a phylogenetic footprinting method for identifying these sites. Probable transcription regulatory sites upstream of Escherichia coli genes were identified by cross-species comparison using an extended Gibbs sampling algorithm. Close examination of a study set of 184 genes with documented transcription regulatory sites revealed that 81% of our predictions corresponded to documented sites when orthologous data were available from at least two other gamma proteobacterial species, and 67% corresponded when data from only one other species were available. That some of the remaining predictions are bona fide TF-binding sites was demonstrated by affinity purification of a putative transcription factor (YijC) bound to one such site upstream of the fabA gene. Predicted regulatory sites for 2097 E.coli genes are available at http://www.wadsworth.org/resnres/bioinfo/.
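The abstract does not spell out the extensions to the Gibbs sampler, so the following is only a minimal sketch of the basic single-site Gibbs motif sampler that such methods build on, assuming exactly one site of fixed width per sequence and sequences containing only A, C, G and T. In a phylogenetic footprinting setting, the input would be the upstream region of an E.coli gene together with the upstream regions of its orthologs. The function names, pseudocount and iteration count are hypothetical illustrative choices, not the authors' parameters.

```python
import math
import random

ALPHABET = "ACGT"


def background_freqs(seqs):
    """Nucleotide frequencies over all input sequences, with +1 pseudocounts."""
    counts = {b: 1.0 for b in ALPHABET}
    for s in seqs:
        for ch in s:
            counts[ch] += 1  # assumes only A/C/G/T appear
    total = sum(counts.values())
    return {b: c / total for b, c in counts.items()}


def profile(seqs, positions, width, skip, pseudo=0.5):
    """Column frequencies of the current sites, leaving out sequence `skip`."""
    cols = [{b: pseudo for b in ALPHABET} for _ in range(width)]
    for i, (s, p) in enumerate(zip(seqs, positions)):
        if i == skip:
            continue
        for j in range(width):
            cols[j][s[p + j]] += 1
    # Every column holds (len(seqs) - 1) observed bases plus 4 pseudocounts.
    total = len(seqs) - 1 + pseudo * len(ALPHABET)
    return [{b: c / total for b, c in col.items()} for col in cols]


def gibbs_sites(seqs, width, iters=2000, seed=0):
    """Return one sampled motif start position per sequence (hypothetical helper)."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences to compare")
    if any(len(s) < width for s in seqs):
        raise ValueError("every sequence must be at least `width` long")
    rng = random.Random(seed)
    bg = background_freqs(seqs)
    # Start from random site positions.
    positions = [rng.randrange(len(s) - width + 1) for s in seqs]
    for _ in range(iters):
        # Predictive update: pick one sequence, rebuild the motif model
        # from all the others, then resample that sequence's site.
        i = rng.randrange(len(seqs))
        pfm = profile(seqs, positions, width, skip=i)
        s = seqs[i]
        weights = []
        for p in range(len(s) - width + 1):
            # Likelihood ratio of motif model vs background for this window.
            log_ratio = sum(
                math.log(pfm[j][s[p + j]] / bg[s[p + j]]) for j in range(width)
            )
            weights.append(math.exp(log_ratio))
        positions[i] = rng.choices(range(len(weights)), weights=weights)[0]
    return positions
```

A typical call would be `pos = gibbs_sites(upstream_seqs, width=16)` followed by `[s[p:p + 16] for s, p in zip(upstream_seqs, pos)]` to read off the sampled sites, where `upstream_seqs` and the width are placeholders. Resampling one sequence at a time against a model built from the others is what lets conserved sites in orthologous regions reinforce each other.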


Figures

Figure 1
SDS–PAGE gel showing affinity purification of YijC. Escherichia coli MG1655 extracts, passed over a purH pre-column, were fractionated on DNA affinity columns carrying sequences predicted to be TF-binding sites upstream of the fabA, fabB or yqfA genes, or on a control column carrying a known FadR site upstream of fadB (see Materials and Methods). A silver-stained SDS–PAGE gel of representative fractions eluted from the columns with 0.8 M NaCl is shown. M, molecular weight markers; lane 1, fabA column; lane 2, fabB column; lane 3, yqfA column; lane 4, fadB column. Mass spectrometry identified the 26 kDa protein bound specifically to the fabA, fabB and yqfA columns as YijC, and the protein bound to the fadB column as FadR.
Figure 2
Snapshot from the fabA Gene Page on our web site illustrating the data available. (A) At the top of each Gene Page are given the gene name as it appears in both the E.coli genome GenBank entry (27) and EcoGene (28; http://bmb.med.miami.edu/EcoGene/EcoWeb/index.html), and the name of the divergently transcribed gene when one exists. The species in which orthologs were detected for the gene are indicated (EC, E.coli; ST, Salmonella typhi; YP, Yersinia pestis; HI, Haemophilus influenzae; AA, Actinobacillus actinomycetemcomitans; VC, Vibrio cholerae; SP, Shewanella putrefaciens; PA, Pseudomonas aeruginosa; TF, Thiobacillus ferrooxidans). For genes with one or more documented regulatory sites, the references, the genomic coordinates of the sites and the site types are given. Information from up to three predictions, ordered by MAP value, is then described. For each prediction, the species in which a site was predicted are indicated, as are the total number of sites and the number of sites in the E.coli data, followed by the MAP value of the motif. Links are provided to the motif model (B), represented as a sequence logo (29), and to two representations of the sites that were identified: a sequence logo (C) and a sequence alignment with site probabilities (D). The E.coli genomic coordinates of each site are given (an R indicates that the solution sequence shown is the reverse complement of that in the GenBank entry), together with the site sequence plus 5 flanking bp. When a predicted site overlaps a previously documented site, the site type (TF name or stem–loop) is indicated; if a predicted site overlaps an E.coli intergenic repeat (28), that is also reported. Although the analysis of the study set for correlation with documented TF-binding sites was confined to the most probable motif predictions, up to three predictions are described on the web site for each gene, since many genes are regulated by more than one transcription factor.
The most probable motif (the YijC-binding site) detected in the fabA data is shown.
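The Gene Page described for Figure 2 involves three small pieces of bookkeeping: sequence-logo heights, reverse-complement (R) coordinates, and flagging predictions that overlap documented sites. A minimal sketch of one plausible way to compute each, assuming equal-length aligned site strings over A/C/G/T and closed (start, end) genomic intervals; the helper names are hypothetical, and the logo height here omits the small-sample correction used in the original sequence-logo method.

```python
import math

# Translation table mapping each base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA string (for sites on the opposite strand)."""
    return seq.translate(COMPLEMENT)[::-1]


def column_information(sites):
    """Information content in bits of each aligned column, as drawn in a logo.

    Uses R = log2(4) - H with H = -sum(p * log2(p)); no small-sample correction.
    """
    width = len(sites[0])
    info = []
    for j in range(width):
        column = [s[j] for s in sites]
        h = 0.0
        for b in "ACGT":
            p = column.count(b) / len(column)
            if p > 0:
                h -= p * math.log2(p)
        info.append(2.0 - h)
    return info


def overlaps(pred, doc):
    """True if two closed genomic intervals (start, end) share at least one base."""
    return pred[0] <= doc[1] and doc[0] <= pred[1]
```

A fully conserved column scores 2 bits and a column with all four bases equally often scores 0, which is why conserved positions stand tall in a logo. Treating coordinates as closed intervals is an assumption; half-open coordinates would need `<` in place of `<=`.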
Figure 3
Distributions of the MAP values for the most probable motifs from the study set (183 data sets) and the full set (2097 data sets). (A) The distribution of MAP values for the full set compared to the study set, illustrating the shift to the left (toward lower MAPs) for the full set (see text) and indicating the relative number in the study set of genes that have experimentally identified sites compared to the full set. (B) The distribution of MAP values for the study set broken down according to the number of orthologs detected for each gene. (C) The distribution of MAP values for the full set broken down according to the number of orthologs detected for each gene. Comparison of (B) and (C) again illustrates the shift toward lower MAP values for the full set compared to the study set, as well as the observation that when data were available from only two species the predictions typically had lower MAP values.
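The breakdown in panels (B) and (C) amounts to grouping each gene's top MAP value by how many species contributed data and then binning those values. A minimal sketch, assuming hypothetical (n_species, map_value) records; the bin width and the median summary are illustrative choices, not taken from the paper.

```python
import math
import statistics
from collections import defaultdict


def by_species_count(records):
    """Group MAP values by the number of species that contributed sequence data."""
    groups = defaultdict(list)
    for n_species, map_value in records:
        groups[n_species].append(map_value)
    return dict(sorted(groups.items()))


def histogram(values, bin_width):
    """Count values per bin; bin k covers [k * bin_width, (k + 1) * bin_width)."""
    counts = defaultdict(int)
    for v in values:
        counts[math.floor(v / bin_width)] += 1
    return dict(sorted(counts.items()))


def summarize(groups):
    """Number of genes and median MAP value for each species count."""
    return {n: (len(vals), statistics.median(vals)) for n, vals in groups.items()}
```

Running `summarize` separately on study-set and full-set records would make a shift toward lower MAP values visible as lower medians, and the per-group histograms correspond to the kind of breakdown shown in panels (B) and (C).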

References

    1. Perez-Rueda,E. and Collado-Vides,J. (2000) The repertoire of DNA-binding transcriptional regulators in Escherichia coli K-12. Nucleic Acids Res., 28, 1838–1847.
    2. Gralla,J.D. and Collado-Vides,J. (1996) Organization and function of transcription regulatory elements. In Neidhardt,F.C. (ed.), Escherichia coli and Salmonella: Cellular and Molecular Biology. ASM Press, Washington, DC, pp. 1232–1245.
    3. Thieffry,D., Salgado,H., Huerta,A.M. and Collado-Vides,J. (1998) Prediction of transcriptional regulatory sites in the complete genome sequence of Escherichia coli K-12. Bioinformatics, 14, 391–400.
    4. Stormo,G.D. and Hartzell,G.W. (1989) Identifying protein-binding sites from unaligned DNA fragments. Proc. Natl Acad. Sci. USA, 86, 1183–1187.
    5. Lawrence,C.E. and Reilly,A.A. (1990) An expectation maximization (EM) algorithm for the identification and characterization of common sites in unaligned biopolymer sequences. Proteins, 7, 41–51.
