Front Genet. 2024 Mar 5:15:1353553.

doi: 10.3389/fgene.2024.1353553. eCollection 2024.

Flexible gold standards for transcription factor regulatory interactions in Escherichia coli K-12: architecture of evidence types

Affiliations

¹ Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Avenida Universidad S/N, Cuernavaca, Mexico.
² Department of Biomedical Engineering, Boston University, Boston, MA, United States.
³ Center for Genomic Regulation, The Barcelona Institute of Science and Technology, Universitat Pompeu Fabra, Barcelona, Spain.

^# Contributed equally.

PMID: 38505828
PMCID: PMC10949920
DOI: 10.3389/fgene.2024.1353553

Paloma Lara et al. Front Genet. 2024.

Abstract

Post-genomic implementations have expanded the experimental strategies to identify elements involved in the regulation of transcription initiation. Here, we present for the first time a detailed analysis of the sources of knowledge supporting the collection of transcriptional regulatory interactions (RIs) of Escherichia coli K-12. An RI groups the transcription factor (TF), its effect (positive or negative), and the regulated target, which can be a promoter, a gene, or a transcription unit. We improved the evidence codes so that specific methods are incorporated and classified into independent groups. On this basis we updated the computation of confidence levels (weak, strong, or confirmed) for the collection of RIs. These updates enabled us to map the RI set to the current collection of high-throughput (HT) TF-binding datasets from ChIP-seq, ChIP-exo, gSELEX and DAP-seq in RegulonDB, thereby enriching the evidence for close to one-quarter (1329) of the current total of 5446 RIs. Based on the new computational capabilities of our improved annotation of evidence sources, we can now analyze the internal architecture of evidence, its categories (experimental, classical, HT, computational), and confidence levels. This analysis shows that the joint contribution of HT and computational methods increases the overall fraction of reliable RIs (the sum of confirmed and strong evidence) from 49% to 71%. Thus, the current collection has 3912 reliable RIs, of which 2718 (70%) have classical evidence and can be used to benchmark novel HT methods. Users can selectively exclude the method they want to benchmark, or keep, for instance, only the confirmed interactions. The recovery of regulatory sites in RegulonDB by the different HT methods ranges from 33% for ChIP-exo to 76% for ChIP-seq, although, as discussed, many potential confounding factors limit the interpretation of these values.
The collection of improvements reported here provides a solid foundation to incorporate new methods and data, and to further integrate the diverse sources of knowledge of the different components of the transcriptional regulatory network. There is no other genomic database that offers this comprehensive high-quality architecture of knowledge supporting a corpus of transcriptional regulatory interactions.

Keywords: E. coli; RegulonDB; confidence levels; evidence codes; high-throughput genomic methodologies; regulatory interactions; source of knowledge.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

**FIGURE 1**
Evidence types for RIs. Any RI requires evidence for the binding of the TF, together with functional evidence showing its regulatory effect on transcriptional activity. Evidence types are grouped into three major categories (classical, HT and nonexperimental). Each specific group contains methods that are not considered independent of one another, whereas methods from different groups are considered independent. The algebra of independent groupings is limited to binding evidence types, which define the level of confidence as discussed in the main text. Note that the “others nonexperimental” evidence is not used to assign confidence.
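
The grouping logic described in this caption can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the method-to-group table, the strong/weak labels, and the rule "two or more independent strong groups give confirmed, one gives strong, otherwise weak" are hypothetical stand-ins, not the evidence codes or the exact algebra used in RegulonDB. The `exclude_categories` argument mimics dropping a whole evidence category, for example HT, before confidence is computed.

```python
from collections import Counter

# method -> (category, independence group, is_strong)
# ASSUMPTION: every entry in this table is invented for illustration;
# it is not the RegulonDB evidence-code catalogue.
METHOD_INFO = {
    "footprinting":  ("classical", "purified_protein_binding", True),
    "emsa_purified": ("classical", "purified_protein_binding", True),
    "site_mutation": ("classical", "site_mutation", True),
    "chip_seq":      ("HT", "chip", True),
    "chip_exo":      ("HT", "chip", True),
    "gselex":        ("HT", "gselex", False),
    "motif_search":  ("computational", "sequence_analysis", False),
}


def confidence(methods, exclude_categories=()):
    """Return 'confirmed', 'strong', 'weak', or None if no evidence remains.

    ASSUMED rule: count the distinct independence groups that contribute
    at least one strong method, after dropping excluded categories.
    """
    kept = [METHOD_INFO[m] for m in methods
            if METHOD_INFO[m][0] not in exclude_categories]
    if not kept:
        return None
    strong_groups = {group for _, group, is_strong in kept if is_strong}
    if len(strong_groups) >= 2:
        return "confirmed"
    if len(strong_groups) == 1:
        return "strong"
    return "weak"


def confidence_profile(ris, exclude_categories=()):
    """Count RIs per confidence level for a dict of RI id -> method list."""
    return Counter(confidence(m, exclude_categories) for m in ris.values())


# Toy RIs (hypothetical identifiers and evidence lists).
ris = {
    "RI_1": ["footprinting", "emsa_purified"],         # one group, twice
    "RI_2": ["footprinting", "chip_seq"],              # two independent groups
    "RI_3": ["chip_seq", "chip_exo", "motif_search"],  # one strong group
    "RI_4": ["motif_search"],                          # weak evidence only
}

print(confidence_profile(ris))
print(confidence_profile(ris, exclude_categories={"HT"}))
```

In this toy set, two methods from the same group do not raise an RI above strong, while one classical plus one HT group does. Excluding the HT category lowers the confidence of the RIs that depended on it, which is the kind of comparison a user would make before benchmarking an HT method against the collection.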

**FIGURE 2**
Number of TFs with data in RegulonDB and number of TFBSs available in HT datasets. Number of TFs for which we have datasets from either a single HT method or multiple HT methods. This is the reduced set of TFs for which RegulonDB has at least one regulatory binding site. The TF names are listed in Supplementary Data Sheet S2. Below each method are the total number of TFs and the number of RIs in RegulonDB for those TFs, that is, the set of candidate RIs that can be matched by each method.

**FIGURE 3**
RI distribution analysis by type of RI (TF-promoter, TF-TU or TF-gene), confidence level (C: confirmed, S: strong, and W: weak), and evidence category (classical, HT and nonexperimental). **(A)** Number of RIs by type of RI for each confidence level; **(B)** Number of RIs by evidence category for each type of RI; **(C)** Number of RIs by confidence level for each evidence category. For simplification, RIs were classified within the experimental categories irrespective of whether they also have computational evidence. Thus, for instance, an RI with both computational and classical evidence is counted as “classical”.

**FIGURE 4**
Detailed combinations of binding evidence supporting RIs, shown as intersecting sets using an upset plot. In each plot, the bottom left bars represent the different types of evidence curated and the respective number of RIs they support. The top bars represent the number of RIs supported by each possible combination of 2 or more evidence types. The three plots show the combinations of evidence supporting the current set of **(A)** “Confirmed” RIs, **(B)** “Strong” RIs and **(C)** “Weak” RIs, respectively.

**FIGURE 5**
Distributions of confidence level from the RI set when excluding HT and/or computational evidence. Footnote: For the current RI collection, this figure shows the profile in confidence levels as different methods are excluded in order to detect their contribution. The reference is the first column with all evidence types included showing the highest fraction of strong and confirmed levels, and a smaller weak component. For instance, when comparing with bar C, we can quantify how much HT methods are contributing to the confidence levels.

**FIGURE 6**
Average recovery of classical TFRSs by different HT-binding methodologies. For each methodology, the fraction of TFRSs in RegulonDB that were recovered was estimated per TF, and the average over all TFs for each method is shown with its standard deviation. Panel **(A)** shows the results using variable peak sizes based on data as reported by the authors, and panel **(B)** shows the results using a peak size of 200 for all HT TF-binding datasets. The set of TFs is specific to each method, given the currently available datasets gathered in RegulonDB version 12.2, and is also limited to those TFs for which there is at least one classical TFRS in RegulonDB (for data details see Supplementary Material S2). For the statistical test see Methods. Two stars indicate statistically significant differences, as mentioned in the main text.
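
A minimal sketch of the recovery estimate described in this caption follows. It assumes that a site counts as recovered when it overlaps at least one peak, that the fixed peak size of 200 is measured in base pairs and centred on the peak midpoint, and that the standard deviation is the sample standard deviation. The function names and the toy coordinates are hypothetical; the paper's Methods define the actual procedure.

```python
from statistics import mean, stdev


def fixed_width_peaks(peaks, width=200):
    """Resize each (start, end) peak to `width`, centred on its midpoint."""
    half = width // 2
    resized = []
    for start, end in peaks:
        mid = (start + end) // 2
        resized.append((mid - half, mid + half))
    return resized


def recovery_fraction(sites, peaks):
    """Fraction of known sites overlapping at least one peak.

    Intervals are treated as closed: sharing a single position counts.
    Returns None when a TF has no known sites.
    """
    if not sites:
        return None
    recovered = sum(
        any(s_start <= p_end and p_start <= s_end for p_start, p_end in peaks)
        for s_start, s_end in sites
    )
    return recovered / len(sites)


def method_summary(per_tf):
    """Mean and sample standard deviation of per-TF recovery fractions."""
    fractions = [recovery_fraction(sites, peaks)
                 for sites, peaks in per_tf.values()]
    fractions = [f for f in fractions if f is not None]
    spread = stdev(fractions) if len(fractions) > 1 else 0.0
    return mean(fractions), spread


# Toy data for one HT method: TF -> (known sites, reported peaks).
# All coordinates are invented.
per_tf = {
    "tf_a": ([(100, 120), (500, 520)], [(90, 130)]),
    "tf_b": ([(1000, 1020)], [(900, 1400)]),
}

# Same data with every peak forced to the fixed size.
resized = {tf: (sites, fixed_width_peaks(peaks))
           for tf, (sites, peaks) in per_tf.items()}

print(method_summary(per_tf))   # peaks as reported
print(method_summary(resized))  # fixed peak size of 200
```

In the toy data, the wide reported peak for `tf_b` covers its site, but the same peak shrunk to 200 around its midpoint no longer does. The example only shows why the two panels can differ; it does not reproduce any value from the figure.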

Update of

A Gold Standard for Transcription Factor Regulatory Interactions in Escherichia coli K-12: Architecture of Evidence Types.
Lara P, Gama-Castro S, Salgado H, Rioualen C, Tierrafría VH, Muñiz-Rascado LJ, Bonavides-Martínez C, Collado-Vides J. bioRxiv [Preprint]. 2023 Dec 11:2023.02.25.530038. doi: 10.1101/2023.02.25.530038. Update in: Front Genet. 2024 Mar 05;15:1353553. doi: 10.3389/fgene.2024.1353553. PMID: 37163020.
