NAR Genom Bioinform. 2024 Feb 21;6(1):lqae018.
doi: 10.1093/nargab/lqae018. eCollection 2024 Mar.

CDBProm: the Comprehensive Directory of Bacterial Promoters

Gustavo Sganzerla Martinez et al. NAR Genom Bioinform.

Abstract

The decreasing cost of whole-genome sequencing has produced high volumes of genomic information that require annotation. The experimental identification of promoter sequences, which are pivotal for regulating gene expression, is a laborious and cost-prohibitive task. To expedite this, we introduce the Comprehensive Directory of Bacterial Promoters (CDBProm), a directory of in-silico predicted bacterial promoter sequences. We first found that an Extreme Gradient Boosting (XGBoost) algorithm could distinguish promoters from random downstream regions with an accuracy of 87%. To capture distinctive promoter signals, we trained a second XGBoost classifier on the instances misclassified by the first classifier. The CDBProm predictor is then fed with over 55 million upstream regions from more than 6000 bacterial genomes. When a potential promoter sequence is found in an upstream region, it is mapped to the genomic data of the organism, linking the predicted promoter with its coding DNA sequence and identifying the function of the gene it regulates. The collection available in CDBProm enables the quantitative analysis of a large number of bacterial promoters. Our collection of over 24 million promoters is publicly available at https://aw.iimas.unam.mx/cdbprom/.
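The two-stage idea described above (a second classifier fitted only on the examples the first one gets wrong) can be illustrated with a small, standard-library-only sketch. This is not CDBProm's code. It assumes k-mer count features (the abstract does not state the features used), substitutes a trivial nearest-centroid classifier for XGBoost so that it runs without third-party packages, and uses hypothetical names (`kmer_counts`, `NearestCentroid`, `train_two_stage`). Because the abstract does not say how the two classifiers are combined at prediction time, the sketch stops after training both stages.

```python
from collections import Counter
from itertools import product

K = 2  # hypothetical k-mer size; the abstract does not specify features
KMERS = ["".join(p) for p in product("ACGT", repeat=K)]


def kmer_counts(seq):
    """Encode a DNA sequence as a fixed-length vector of overlapping k-mer counts."""
    counts = Counter(seq[i:i + K] for i in range(len(seq) - K + 1))
    return [counts[kmer] for kmer in KMERS]


class NearestCentroid:
    """Stand-in for XGBoost: predicts the label of the closest class centroid."""

    def fit(self, X, y):
        self.centroids = {}
        for label in sorted(set(y)):
            rows = [x for x, lab in zip(X, y) if lab == label]
            self.centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict(self, X):
        def sq_dist(a, b):
            return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

        return [min(self.centroids, key=lambda lab: sq_dist(x, self.centroids[lab]))
                for x in X]


def train_two_stage(seqs, labels):
    """Fit a first model, then fit a second model only on the first model's errors."""
    X = [kmer_counts(s) for s in seqs]
    first = NearestCentroid().fit(X, labels)
    preds = first.predict(X)
    # Indices of training instances the first model misclassified.
    wrong = [i for i, (p, y) in enumerate(zip(preds, labels)) if p != y]
    second = None
    # A two-class model needs misclassified examples from both classes.
    if len({labels[i] for i in wrong}) == 2:
        second = NearestCentroid().fit([X[i] for i in wrong],
                                       [labels[i] for i in wrong])
    return first, second, wrong


# Toy data (invented for illustration): 1 = promoter, 0 = downstream region.
seqs = ["TATAATTTATAA", "TTATAATATTAA", "ATATTATAATAT", "GCGCGCGCGCGC",
        "GCGCGGCCGCGC", "GGCGCCGCGGCG", "CGCGGCGCCGCC", "ATATATATATAT"]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
first, second, wrong = train_two_stage(seqs, labels)
print("misclassified by stage 1:", wrong)
```

In this toy set the first stage misclassifies the one GC-rich "promoter" and the one AT-rich "downstream" sequence, so the second stage is fitted on just those two atypical examples, which is the kind of residual signal a second classifier is meant to pick up.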

Figures

Graphical Abstract
Figure 1. XGBoost classification method.
Figure 2. Mapping the decision pattern of the XGBoost model.
Figure 3. Second instance of the XGBoost classification, trained and tested with the misclassified promoters from the first instance.
Figure 4. Validation of the CDBProm's XGBoost ensemble classification with external Escherichia coli promoter sequences.