Mol Microbiol. 2025 Jul;124(1):91-101.
doi: 10.1111/mmi.15370. Epub 2025 Jun 17.

The Mycobacterium tuberculosis Transposon Sequencing Database (MtbTnDB): A Large-Scale Guide to Genetic Conditional Essentiality

Adrian Jinich et al. Mol Microbiol. 2025 Jul.

Abstract

Characterizing genetic essentiality across various conditions is fundamental for understanding gene function. Transposon sequencing (TnSeq) is a powerful technique to generate genome-wide essentiality profiles in bacteria and has been extensively applied to Mycobacterium tuberculosis (Mtb). Dozens of TnSeq screens have yielded valuable insights into the biology of Mtb in vitro, inside macrophages, and in model host organisms. Despite their value, these Mtb TnSeq profiles have not been standardized or collated into a single, easily searchable database. This results in significant challenges when attempting to query and compare these resources, limiting our ability to obtain a comprehensive and consistent understanding of genetic conditional essentiality in Mtb. We address this problem by building a central repository of publicly available Mtb TnSeq screens, the Mtb transposon sequencing database (MtbTnDB). The MtbTnDB is a living resource that encompasses to date ≈150 standardized TnSeq screens, enabling open access to data, visualizations, and functional predictions through an interactive web app (www.mtbtndb.app). We conduct several statistical analyses on the complete database, such as demonstrating that (i) genes in the same genomic neighborhood have similar TnSeq profiles, and (ii) clusters of genes with similar TnSeq profiles are enriched for genes from similar functional categories. We further analyze the performance of machine learning models trained on TnSeq profiles to predict the functional annotation of orphan genes in Mtb. By facilitating the comparison of TnSeq screens across conditions, the MtbTnDB will accelerate the exploration of conditional genetic essentiality, provide insights into the functional organization of Mtb genes, and help predict gene function in this important human pathogen.

Keywords: database; functional genetics; microbiology; transposon sequencing; tuberculosis.

Conflict of interest statement

The authors declare no conflicts of interest.

Figures

FIGURE 1
Schematic diagram of compilation and standardization of M. tuberculosis TnSeq screens. (A) Raw sequencing reads were collected from publicly available TnSeq screens in the literature and the FLUTE database. (B) Using a standardized statistical processing framework with TRANSIT (DeJesus et al. 2015), each set of raw reads was converted into an essentiality table, with genes in the Mtb genome assigned log2 fold‐change values (relative to specified control conditions) and p‐values corrected for multiple hypothesis testing. (C) The set of TnSeq data can be queried in the MtbTnDB online portal, either by screen or by gene of interest. (D) The database is amenable to both supervised and unsupervised machine learning approaches to generate hypotheses about gene function.
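
The two per-gene quantities named in panel (B) can be illustrated with a minimal sketch. This is not the TRANSIT implementation: the pseudocount, the use of mean insertion counts, and the choice of the Benjamini-Hochberg procedure for the multiple-testing correction are all assumptions made for illustration, and both function names are hypothetical.

```python
import math


def log2_fold_change(control_mean, experimental_mean, pseudocount=1.0):
    """log2 ratio of mean insertion counts (experimental over control).

    The pseudocount is an assumption; it only prevents division by zero
    and log(0) for genes with no insertions in one condition.
    """
    return math.log2((experimental_mean + pseudocount) / (control_mean + pseudocount))


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that adjusted
    # values never increase as the raw p-value decreases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

With this convention, a negative log2 fold-change means fewer insertions were recovered in the experimental condition than in the control.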
FIGURE 2
Distribution of conditional essentiality calls in the MtbTnDB across all genes. (A) Cumulative distribution of the number of conditional essentiality calls across all genes in the Mtb genome. (B) Number of conditional essentiality calls versus position in the genome across the MtbTnDB for all H37Rv Mtb genes. Darker areas indicate overlapping gene points.
FIGURE 3
An online explorer of TnSeq datasets. A screenshot of the “Analyze datasets” modality is shown. Users can select which TnSeq screen to explore, whether to display either the standardized or the original publication's dataset, as well as significance thresholds for log2 fold‐changes and q‐values. A brief summary of the TnSeq screen is shown in the left panel, alongside a link to the original publication and the number of replicates for both control and experimental conditions. An interactive volcano plot highlights conditionally essential genes, which can also be accessed and selected through the table on the right.
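
The thresholding described in this caption amounts to a simple filter over a screen's essentiality table. The sketch below is hypothetical and is not the web app's code: the default cutoffs, the use of the absolute log2 fold-change, the row layout, and the gene names are all assumptions.

```python
def conditionally_essential(rows, lfc_threshold=1.0, q_threshold=0.05):
    """Return gene ids that pass both significance thresholds.

    rows: iterable of (gene_id, log2_fold_change, q_value) tuples.
    A gene passes when |log2FC| >= lfc_threshold and q <= q_threshold.
    """
    return [
        gene
        for gene, lfc, q in rows
        if abs(lfc) >= lfc_threshold and q <= q_threshold
    ]


# Hypothetical rows: (gene id, log2 fold-change, q-value).
example = [("geneA", -2.3, 0.001), ("geneB", -0.4, 0.01), ("geneC", 1.5, 0.20)]
hits = conditionally_essential(example)
```

Tightening either threshold shrinks the set of highlighted genes, which is the effect a user would see when adjusting the cutoffs on a volcano plot.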
FIGURE 4
Mapping out conditionally essential genes of unknown function. Data shown correspond to a single TnSeq screen in the MtbTnDB: An H37Rv in vivo (mouse) screen (DeJesus, Nambi, et al. 2017). Genes are represented as circles, and are binned along the x‐axis into five distinct categories according to their annotation level and along the y‐axis according to their conditional essentiality (the exact position within each bin/category is not meaningful). The least well characterized genes (i.e., orphans) that are conditionally essential in this particular TnSeq screen are shown in orange (the gene identifiers for five example genes are shown). These five highlighted genes represent random samples of the orphan genes found to be conditionally essential within this TnSeq screen condition.
FIGURE 5
Genes with similar TnSeq profiles tend to have similar functions. Neighboring genes in the genome have more similar TnSeq profiles, and clusters of genes with similar TnSeq profiles are enriched for annotated genes belonging to the same functional categories. (A) Distribution of Euclidean distances (in UMAP projection) for pairs of random genes (orange) and pairs of neighboring genes (i.e., fewer than 3 genes away from each other in the genome). The two distributions are significantly different (p = 10⁻¹⁷, Kolmogorov–Smirnov test). (B) A UMAP of TnSeq profiles. Genes with at least one conditional essentiality call in the MtbTnDB were clustered using k‐means according to their UMAP coordinates and color‐coded. Three clusters enriched for gene functional categories are shown.
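
The comparison in panel (A) rests on the two-sample Kolmogorov–Smirnov statistic, the largest gap between two empirical cumulative distributions. The sketch below computes only that statistic in plain Python, with a hypothetical function name; it does not compute a p-value and is not the authors' analysis code. In practice a library routine such as `scipy.stats.ks_2samp` would return both the statistic and the p-value.

```python
def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: maximum gap between the empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance each pointer past every value equal to x, so ties
        # within and between samples are handled in one step.
        while i < len(a) and a[i] == x:
            i += 1
        while j < len(b) and b[j] == x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d
```

Here `sample_a` would hold the distances between neighboring gene pairs and `sample_b` the distances between random gene pairs; a statistic near 0 means the two distributions are nearly identical, and a value near 1 means they barely overlap.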
FIGURE 6
Mapping TnSeq essentiality profiles to gene functional categories with a machine learning classifier. The receiver operating characteristic (ROC) curve for each functional class is shown. Multilabel predictions were generated using XGBoost (Chen and Guestrin 2016). The numbers of genes in each functional category are: PE/PPE (162); cell wall and cell processes (771); information pathways (242); insertion sequences and phages (142); intermediary metabolism and respiration (933); lipid metabolism (270); regulatory proteins (197); and virulence, detoxification, adaptation (220).
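
A per-class ROC curve is usually summarized by its area under the curve (AUC), computed separately for each label in a multilabel setting. The sketch below shows one way to do this in plain Python under stated assumptions: it takes predicted scores from any classifier (it does not train XGBoost), the function names and matrix layout are hypothetical, and the quadratic pairwise loop is chosen for clarity rather than speed.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random
    negative, with ties counted as one half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def per_class_auc(label_matrix, score_matrix, class_names):
    """One AUC per functional category; column k of each matrix is class k."""
    return {
        name: roc_auc(
            [row[k] for row in label_matrix],
            [row[k] for row in score_matrix],
        )
        for k, name in enumerate(class_names)
    }
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of genes inside and outside a category.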

References

    Barquist, L., Mayho M., Cummins C., et al. 2016. “The TraDIS Toolkit: Sequencing and Analysis for Dense Transposon Mutant Libraries.” Bioinformatics 32: 1109–1111.
    Bellerose, M. M., Baek S.‐H., Huang C.‐C., et al. 2019. “Common Variants in the Glycerol Kinase Gene Reduce Tuberculosis Drug Efficacy.” mBio 10: e00663‐19. doi: 10.1128/mBio.00663-19.
    Cain, A. K., Barquist L., Goodman A. L., Paulsen I. T., Parkhill J., and van Opijnen T. 2020. “A Decade of Advances in Transposon‐Insertion Sequencing.” Nature Reviews. Genetics 21: 526–540.
    Carey, A. F., Rock J. M., Krieger I. V., et al. 2018. “TnSeq of Mycobacterium tuberculosis Clinical Isolates Reveals Strain‐Specific Antibiotic Liabilities.” PLoS Pathogens 14: e1006939.
    Chao, M. C., Abel S., Davis B. M., and Waldor M. K. 2016. “The Design and Analysis of Transposon Insertion Sequencing Experiments.” Nature Reviews. Microbiology 14: 119–128.
