PeerJ. 2022 Jul 20;10:e13772.
doi: 10.7717/peerj.13772. eCollection 2022.

Escherichia coli transcription factors of unknown function: sequence features and possible evolutionary relationships

Isabel Duarte-Velázquez et al. PeerJ.

Abstract

Organisms need mechanisms to perceive the environment and respond accordingly to environmental changes or the presence of hazards. Transcription factors (TFs) are required for cells to respond to the environment by controlling the expression of the genes needed. Escherichia coli has been the model bacterium for many decades, yet features embedded in its genome remain unstudied. To date, 58 TFs remain poorly characterized, although their binding sites have been experimentally determined. This study showed that these TFs vary in G+C content at the third codon position but maintain the same Codon Adaptation Index (CAI) trend as annotated functional transcription factors. Most of these transcription factors are located in regions of the genome where repetitive and mobile elements are abundant. Sequence divergence points to groups with distinctive sequence signatures that nevertheless maintain the same type of DNA-binding domain. Finally, analysis of the promoter sequences of the 58 TFs showed A+T-rich regions, in agreement with the features of horizontally transferred genes. The findings reported here pave the way for future research on these TFs that may uncover their role as spare factors in case of loss-of-function mutations in core TFs and trace back their evolutionary history.

Keywords: Escherichia coli; Mobile elements; Sequence codon bias; Structural features; Synteny; Transcription factors of unknown function.

Conflict of interest statement

Bernardo Franco is an Academic Editor for PeerJ. Héctor Manuel Mora-Montes is an Academic Editor for PeerJ.

Figures

Figure 1. Transcription factors in bacteria are proteins mainly with two domains.
In (A), TF activity depends on the location or accessibility of the target cis sequence, the binding of a partner protein, and the binding of ligands or covalent modifications such as phosphorylation. Once activated, most TFs dimerize and bind to target sequences in the vicinity of the core promoter sequence, where they either repress or activate transcription. In (B), an example of a typical TF shows the DNA-binding domain and the regulatory domain, in this case a ligand-binding domain for hypoxanthine. The protein shown here is PurR, a LacI-family member (PDB accession number 2PUB) (Schumacher et al., 1994).
Figure 2. Sequence and structural features of 58 TFs of unknown function.
When comparing all transcription factors, homology is concentrated in the DNA-binding domain. To prevent bias, sequence comparison was carried out using a guide tree. After five iterations using Clustal Omega, we identified clusters of proteins unrelated to the rest and clusters of closely related protein sequences. Identified ancestral nodes are indicated with a red arrow. AlphaFold2 models were used for structural comparison to find homology beyond the DNA-binding domains, using RaptorX (Källberg et al., 2012, 2014) to facilitate the identification of common structural cores. Names are placed beside the protein on each alignment and indicated with an arrow. Refer to Fig. S1 for each protein's predicted structure, shown in the same orientation as in the alignment for easier comparison. In the case of groups 1, 2, and 3, * indicates that an overall comparison with the most divergent protein (YgfI) is presented in Fig. S2.
Figure 3. Structural features of TFs of unknown function share strong similarities with bona fide TFs that determine the family classification.
(A) Structural alignments generated with mTM-align (Dong et al., 2018a, 2018b), with each transcription factor color-coded. The color of each name identifies the corresponding TF in the alignment. Panel (B) shows the same alignments as (A) but highlights the common core of each set in magenta. An arrow indicates that the alignment was rotated to allow visualization of the common core. Names and AlphaFold2 database accession numbers indicate the reference TFs for each family.
Figure 4. %G+C content and normalized CAI suggest a bias in TFs of unknown function.
(A) Comparison of %G+C at the third codon position against the normalized CAI values of the 58 TFs of unknown function (blue dots) and annotated and functional TFs (orange dots), as described in the Methods section. Dashed lines indicate the major cluster of annotated and functional TFs and that of TFs of unknown function, excluding two outliers among the annotated and functional TFs. Given the clustering observed for each dataset, in (B) the normalized CAI was ordered from the lowest to the highest value (purple data points) and plotted along with the %G+C content (red data points), indicating each TF. Horizontal dashed lines indicate the limits of normalized CAI values for annotated and functional TFs to facilitate comparison with the TFs of unknown function. For TFs of unknown function, the family to which each one belongs is shown. The vertical dashed line separates TFs of unknown function from those with known regulatory roles.
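The %G+C value at the third codon position used in Figure 4 can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `gc3_percent` and the toy sequence are hypothetical, the input is assumed to be an in-frame coding sequence, and CAI is not covered because it requires a reference set of highly expressed genes that the text above does not specify.

```python
def gc3_percent(cds: str) -> float:
    """Percent of G or C bases at the third position of each complete codon.

    Assumes `cds` is an in-frame coding sequence; trailing bases that do not
    form a complete codon are ignored.
    """
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3   # drop any incomplete trailing codon
    third = seq[2:usable:3]            # every third base, starting at index 2
    if not third:
        raise ValueError("sequence has no complete codon")
    return 100.0 * sum(base in "GC" for base in third) / len(third)


# Hypothetical toy sequence: codons ATG, GCC, AAA, TAA -> third bases G, C, A, A
print(gc3_percent("ATGGCCAAATAA"))  # 50.0
```

Whether the stop codon is counted is a design choice; here every complete codon is included.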
Figure 5. Transcriptional datamining of the 58 TFs of unknown function.
Heatmap of three microarray datasets covering the effect of RyhB expression and iron induction, E. coli strains adapted to 41.5 °C, and four different stress conditions spanning cold, heat, oxidative, and metabolic stress. The heatmap includes clustering of the data using average linkage and Spearman rank correlation. For each experiment, the condition used is indicated at the bottom. Dashed lines separate the datasets. The family of each TF is shown using the color code displayed on the right. Black arrows indicate the positions of three annotated and functional TFs (LysR, GntR, and AraC).
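The Spearman rank correlation named in the Figure 5 legend is the Pearson correlation of rank-transformed values. The sketch below illustrates only that similarity measure, not the heatmap tool or the average-linkage clustering step; the function names and the toy expression profiles are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y):
        raise ValueError("profiles must have the same length")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical expression profiles of two TFs across four conditions.
tf_a = [0.1, 0.4, 0.9, 1.5]
tf_b = [2.0, 2.2, 3.1, 9.0]
print(spearman(tf_a, tf_b))
```

A clustering step would typically use 1 minus this correlation as the distance between two profiles.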
Figure 6. The genomic landscape of the 58 TFs analyzed.
In (A), G+C content and skew are shown, along with cryptic prophage and mobile elements in the complete genome. Only the TFs are shown in (B), indicating the family that each TF belongs to. ▴ indicates the approximate regions of high transcription rate and ▴ indicates the lowest transcription regions according to Scholz et al. (2019). The list on the right of the figure orders the active regions (▴) from the highest to the lowest value and the repressed regions (▴) from the lowest to the highest.
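The G+C content and skew tracks mentioned for panel (A) are commonly computed over sliding windows, with skew defined as (G - C) / (G + C). The sketch below assumes that definition; the window size, step, function names, and toy sequence are hypothetical and are not taken from the figure.

```python
def gc_content(window: str) -> float:
    """Fraction of G or C bases in the window (0.0 for an empty window)."""
    w = window.upper()
    return (w.count("G") + w.count("C")) / len(w) if w else 0.0


def gc_skew(window: str) -> float:
    """GC skew, (G - C) / (G + C); 0.0 when the window has no G or C."""
    w = window.upper()
    g, c = w.count("G"), w.count("C")
    return (g - c) / (g + c) if g + c else 0.0


def sliding_windows(seq: str, size: int, step: int):
    """Return (start, gc_content, gc_skew) for each full window along seq."""
    results = []
    for start in range(0, len(seq) - size + 1, step):
        window = seq[start:start + size]
        results.append((start, gc_content(window), gc_skew(window)))
    return results


# Hypothetical toy sequence split into three non-overlapping windows.
for row in sliding_windows("GGGCATATCCCG", size=4, step=4):
    print(row)
```

For a real genome the window would be thousands of bases wide; the tiny values here only keep the arithmetic easy to follow.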
Figure 7. Global sequence comparison between 58 annotated and functional (canonical) TFs and the 58 TFs of unknown function.
Protein sequences were aligned using MUSCLE and then analyzed in Alignmentviewer. A pairwise identity 2D map was generated for each set; sequence order is given in File S4. In (A), a pairwise comparison between the 58 canonical TFs is provided. Clustering is observed for each family. In (B), the comparison between the 58 TFs of unknown function is provided. In (C), an overall comparison of TFs of unknown function against TFs of known function is provided. On the scale, 0 indicates 0% identity and 1 indicates 100% identity.
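A pairwise identity map like the one in Figure 7 can be derived from an existing multiple alignment by comparing aligned columns. The sketch below is an illustration only: the way Alignmentviewer treats gaps is not stated above, so skipping columns with a gap in either sequence is an assumption, and the function names and toy sequences are hypothetical.

```python
def pairwise_identity(a: str, b: str) -> float:
    """Fraction of identical residues over columns where neither row has a gap.

    Assumes `a` and `b` are rows of the same alignment (equal length, '-' gaps).
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    matches = compared = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue  # assumption: gapped columns are not counted
        compared += 1
        matches += x == y
    return matches / compared if compared else 0.0


def identity_matrix(aligned):
    """Square matrix of pairwise identities on a 0..1 scale."""
    return [[pairwise_identity(a, b) for b in aligned] for a in aligned]


# Hypothetical aligned toy sequences.
alignment = ["MKT-L", "MRTAL", "MKTAL"]
for row in identity_matrix(alignment):
    print([round(value, 2) for value in row])
```

Values on the diagonal are 1 (100% identity), matching the scale described in the legend.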
