PeerJ. 2022 Jul 20;10:e13772.

doi: 10.7717/peerj.13772. eCollection 2022.

Escherichia coli transcription factors of unknown function: sequence features and possible evolutionary relationships

Isabel Duarte-Velázquez¹, Javier de la Mora², Jorge Humberto Ramírez-Prado³, Alondra Aguillón-Bárcenas¹, Fátima Tornero-Gutiérrez¹, Eugenia Cordero-Loreto¹, Fernando Anaya-Velázquez¹, Itzel Páramo-Pérez¹, Ángeles Rangel-Serrano¹, Sergio Rodrigo Muñoz-Carranza¹, Oscar Eduardo Romero-González¹, Luis Rafael Cardoso-Reyes¹, Ricardo Alberto Rodríguez-Ojeda¹, Héctor Manuel Mora-Montes¹, Naurú Idalia Vargas-Maya¹, Felipe Padilla-Vaca¹, Bernardo Franco¹

Affiliations

¹ Biology, División de Ciencias Naturales y Exactas, Universidad de Guanajuato, Guanajuato, Guanajuato, México.
² Departamento de Genética Molecular, Instituto de Fisiología Celular, Universidad Nacional Autonoma de Mexico, Mexico City, México.
³ Unidad de Biotecnología, Centro de Investigación Científica de Yucatán, A. C., Mérida, Yucatán, Mexico.

PMID: 35880217
PMCID: PMC9308461
DOI: 10.7717/peerj.13772

Isabel Duarte-Velázquez et al. PeerJ. 2022.

Abstract

Organisms need mechanisms to perceive their environment and respond to environmental changes or the presence of hazards. Transcription factors (TFs) allow cells to respond to the environment by controlling the expression of the genes needed. Escherichia coli has been the model bacterium for many decades, yet features embedded in its genome remain unstudied. To date, 58 TFs remain poorly characterized, although their binding sites have been experimentally determined. This study showed that these TFs vary in G+C content at the third codon position but follow the same Codon Adaptation Index (CAI) trend as annotated functional TFs. Most of these TFs lie in regions of the genome where repetitive and mobile elements are abundant. Sequence divergence points to groups with distinctive sequence signatures that nonetheless maintain the same type of DNA-binding domain. Finally, analysis of the promoter sequences of the 58 TFs showed A+T-rich regions, consistent with the features of horizontally transferred genes. The findings reported here pave the way for future research on these TFs, which may uncover their role as spare factors in case of loss-of-function mutations in core TFs and trace their evolutionary history.

Keywords: Escherichia coli; Mobile elements; Sequence codon bias; Structural features; Synteny; Transcription factors of unknown function.

Conflict of interest statement

Bernardo Franco is an Academic Editor for PeerJ. Héctor Manuel Mora-Montes is an Academic Editor for PeerJ.

Figures

**Figure 1. Transcription factors in bacteria are proteins mainly with two domains.**
In (A), TF activity depends on the location or accessibility of the target *cis* sequence, the binding of a partner protein, and the binding of ligands or covalent modifications such as phosphorylation. Once activated, most TFs dimerize and bind to target sequences in the vicinity of the core promoter sequence to either repress or activate transcription. In (B), an example of a typical TF shows the DNA-binding domain and the regulatory domain, in this case a ligand-binding domain for hypoxanthine. The protein shown here is PurR, a LacI-family member (PDB accession number 2PUB) (Schumacher et al., 1994).

**Figure 2. Sequence and structural features of 58 TFs of unknown function.**
When comparing all transcription factors, homology is concentrated in the DNA-binding domain. To prevent bias, sequence comparison was carried out using a guide tree. After five iterations using Clustal Omega, we identified clusters of proteins unrelated to the rest and clusters of closely related protein sequences. Identified ancestral nodes are indicated with a red arrow. AlphaFold2 models were used for structural comparison to find homology beyond the DNA-binding domains, using RaptorX (Källberg et al., 2012, 2014) to facilitate finding common structural cores. Names are placed beside the protein on each alignment and indicated with an arrow. Refer to Fig. S1 for the predicted structure of each protein, in the same orientation as shown in the alignment, for easier comparison. For groups 1, 2, and 3, an asterisk (\*) indicates that an overall comparison with the greatest outlier protein (YgfI) is presented in Fig. S2.

**Figure 3. Structural features of TFs of unknown function contain strong similarities with *bona fide* TFs that determine the family classification.**
(A) Structural alignments using mTM-align (Dong et al., 2018a, 2018b), with each transcription factor shown in its own color. The color of each name identifies the corresponding TF on the alignment. Panel (B) shows the same alignments as (A) but highlights the common core of each set in magenta. An arrow indicates that the alignment was rotated to allow visualization of the common core. Names and AlphaFold2 database accession numbers indicate the reference TFs for each family.

**Figure 4. %G+C content and normalized CAI suggest a bias in TFs of unknown function.**
(A) Comparison of %G+C at the third codon position against the normalized CAI values of the 58 TFs of unknown function (blue dots) and annotated and functional TFs (orange dots), as described in the Methods section. Dashed lines indicate the major cluster shared by annotated and functional TFs and those of unknown function, excluding two outliers among the annotated and functional TFs. Following the clustering observed for each dataset, in (B) the normalized CAI was ordered from the lowest to the highest value (purple data points) and plotted along with the %G+C content (red data points), with each TF indicated. Horizontal dashed lines mark the limits of the normalized CAI values for annotated and functional TFs to facilitate comparison with the TFs of unknown function. For TFs of unknown function, the family to which each one belongs is shown. The vertical dashed line separates TFs of unknown function from those with known regulatory roles.
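
As a rough illustration of the two quantities plotted in Figure 4, the sketch below computes %G+C at the third codon position and a plain CAI, taken here as the geometric mean of per-codon relative adaptiveness weights. The function names and the toy weight table are hypothetical. The authors' reference gene set and their CAI normalization are described in the Methods section and are not reproduced here.

```python
from math import exp, log


def codons(cds: str) -> list[str]:
    """Split an in-frame coding sequence into complete codons."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # drop a trailing partial codon
    return [cds[i:i + 3] for i in range(0, usable, 3)]


def gc3_percent(cds: str) -> float:
    """Percent G+C at the third position of each codon."""
    triplets = codons(cds)
    if not triplets:
        raise ValueError("sequence has no complete codon")
    return 100.0 * sum(c[2] in "GC" for c in triplets) / len(triplets)


def cai(cds: str, weights: dict[str, float]) -> float:
    """Geometric mean of relative adaptiveness over the scored codons.

    Codons missing from `weights` (for example ATG, TGG, and stop codons)
    are skipped. All weights are assumed to be strictly positive.
    """
    scored = [weights[c] for c in codons(cds) if c in weights]
    if not scored:
        raise ValueError("no codon in the sequence has a weight")
    return exp(sum(log(w) for w in scored) / len(scored))


# Toy weights (assumed values, for illustration only).
toy_weights = {"GCC": 1.0, "AAA": 0.25}
print(gc3_percent("ATGGCCAAATAA"))
print(cai("ATGGCCAAATAA", toy_weights))
```

In practice the weights would be derived from a reference set of highly expressed genes, so a CAI value is only meaningful relative to that chosen set.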

**Figure 5. Transcriptional datamining of the 58 TFs of unknown function.**
Heatmap of three microarray datasets covering the effect of RyhB expression and iron induction, *E. coli* strains adapted to 41.5 °C, and four different stress conditions: cold, heat, oxidative, and metabolic stress. The heatmap includes clustering of the data using average linkage and Spearman rank correlation. For each experiment, the condition used is indicated at the bottom. Dashed lines separate each dataset. The family of each TF is shown using the color code displayed on the right. Black arrows indicate the position of three annotated and functional TFs (LysR, GntR and AraC).

**Figure 6. The genomic landscape of the 58 TFs analyzed.**
In (A), G+C content and skew are shown, along with cryptic prophages and mobile elements in the complete genome. Only the TFs are shown in (B), indicating the family to which each TF belongs. ▴ indicates the approximate regions of high transcription rate and ▴ indicates the regions of lowest transcription, according to Scholz et al. (2019). The list on the right of the figure orders the active regions (▴) from the highest to the lowest value and the repressed regions (▴) from the lowest to the highest.
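
For readers unfamiliar with the tracks in panel (A), the following minimal sketch computes G+C content and GC skew, using the common definition (G − C)/(G + C), over sliding windows. The function name, window size, and step are hypothetical choices. The software and parameters behind the published figure are not stated in this caption.

```python
def gc_windows(seq: str, window: int, step: int) -> list[tuple[int, float, float]]:
    """Return (start, %G+C, GC skew) for each full window along `seq`."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    seq = seq.upper()
    rows = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        gc_percent = 100.0 * (g + c) / window
        # Skew is undefined when a window has no G or C; report 0.0 there.
        skew = (g - c) / (g + c) if g + c else 0.0
        rows.append((start, gc_percent, skew))
    return rows


for row in gc_windows("GGGGCCCCATAT", window=4, step=4):
    print(row)
```

Windows with unusually low G+C content relative to the genome average are one of the signals commonly used to flag candidate horizontally acquired regions.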

**Figure 7. Global sequence comparison of 58 annotated and functional (canonical) TFs against the 58 TFs of unknown function.**
Protein sequences were aligned using MUSCLE and then analyzed in AlignmentViewer. A pairwise identity 2D map was generated for each set; sequence order is given in File S4. In (A), a pairwise comparison between the 58 canonical TFs is provided. Clustering is observed for each family. In (B), the comparison between the 58 TFs of unknown function is provided. In (C), an overall comparison of TFs of unknown function against TFs of known function is provided. On the scale, 0 indicates 0% identity and 1 indicates 100% identity.
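
A pairwise identity map of this kind can be sketched from any multiple sequence alignment as shown below. This is an illustrative assumption, not the tool's own implementation: identity is defined here as matching residues divided by the columns in which neither sequence has a gap, and other definitions (for example, dividing by the full alignment length) give different values. The function names are hypothetical.

```python
def pairwise_identity(a: str, b: str) -> float:
    """Fraction of identical residues over columns where neither row has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue  # skip columns with a gap in either sequence
        compared += 1
        matches += x == y
    return matches / compared if compared else 0.0


def identity_matrix(aligned: list[str]) -> list[list[float]]:
    """Symmetric matrix of pairwise identities, scaled from 0 to 1."""
    return [[pairwise_identity(a, b) for b in aligned] for a in aligned]


# Toy aligned rows (made-up sequences, for illustration only).
toy_alignment = ["AC-GT", "ACAGA", "AC-GT"]
for row in identity_matrix(toy_alignment):
    print(row)
```

The resulting matrix uses the same 0 to 1 scale as the figure, so it can be passed directly to any heatmap routine.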

Cited by

RegulonDB v12.0: a comprehensive resource of transcriptional regulation in E. coli K-12.
Salgado H, Gama-Castro S, Lara P, Mejia-Almonte C, Alarcón-Carranza G, López-Almazo AG, Betancourt-Figueroa F, Peña-Loredo P, Alquicira-Hernández S, Ledezma-Tejeida D, Arizmendi-Zagal L, Mendez-Hernandez F, Diaz-Gomez AK, Ochoa-Praxedis E, Muñiz-Rascado LJ, García-Sotelo JS, Flores-Gallegos FA, Gómez L, Bonavides-Martínez C, Del Moral-Chávez VM, Hernández-Alvarez AJ, Santos-Zavaleta A, Capella-Gutierrez S, Gelpi JL, Collado-Vides J. Nucleic Acids Res. 2024 Jan 5;52(D1):D255-D264. doi: 10.1093/nar/gkad1072. PMID: 37971353.
Mono- and multidomain defense toxins of the RelE/ParE superfamily.
Gerdes K. mBio. 2025 Apr 9;16(4):e0025825. doi: 10.1128/mbio.00258-25. Epub 2025 Feb 25. PMID: 39998207.
Transcriptional Regulation in Bacteria.
Shimada T. Microorganisms. 2024 Dec 6;12(12):2514. doi: 10.3390/microorganisms12122514. PMID: 39770717.
Proteomics and metabolic burden analysis to understand the impact of recombinant protein production in E. coli.
Rajacharya GH, Sharma A, Yazdani SS. Sci Rep. 2024 May 28;14(1):12271. doi: 10.1038/s41598-024-63148-y. PMID: 38806637.
Phylogeny and structural modeling of the transcription factor CsqR (YihW) from Escherichia coli.
Rybina AA, Glushak RA, Bessonova TA, Dakhnovets AI, Rudenko AY, Ozhiganov RM, Kaznadzey AD, Tutukina MN, Gelfand MS. Sci Rep. 2024 Apr 3;14(1):7852. doi: 10.1038/s41598-024-58492-y. PMID: 38570624.
