Sci Data. 2025 Apr 15;12(1):624.
doi: 10.1038/s41597-025-04966-1.

Comprehensive Whole Genome Sequencing Dataset of Mycobacterium tuberculosis Strains Collected Across Italy

Arash Ghodousi et al. Sci Data. 2025.

Abstract

Tuberculosis (TB), caused by the Mycobacterium tuberculosis complex (MTBC), remains a major global health challenge. Whole genome sequencing (WGS) offers an invaluable tool for understanding the genetic diversity and drug resistance profiles of MTBC. This study provides a comprehensive WGS dataset of 2,520 MTBC isolates collected from four Italian regions (Lombardy, Piedmont, Emilia-Romagna, and Lazio) between 2017 and 2020. The dataset includes genomic data along with associated metadata, such as geographic location and drug susceptibility profiles, providing a robust resource for studying TB epidemiology and transmission dynamics. This collection represents the largest publicly available MTBC WGS dataset from Italy and has been validated to ensure accuracy and completeness. By making this dataset accessible, we aim to support collaborative research, facilitate the exploration of MTBC evolution and drug resistance, and enhance TB surveillance efforts.

Conflict of interest statement

Competing interests: The authors declare no competing interests.

Figures

Fig. 1
Workflow Overview for MTBC Genomic and Drug Resistance Profiling. This figure illustrates the workflow used for the genomic and drug resistance profiling of Mycobacterium tuberculosis complex (MTBC) isolates across four Italian regions: Emilia-Romagna, Lazio, Lombardy, and Piedmont. The process begins with data collection (left panel), where MTBC isolates from tuberculosis patients are gathered, and phenotypic drug susceptibility testing (pDST) and HIV status are recorded. Whole-genome sequencing (WGS) is performed on the isolates. In the data analysis stage (middle panel), sequencing data undergo quality control (QC) of FASTQ files, followed by mapping to a reference genome. Variant calling is conducted, generating BAM and VCF files for further analysis. The data generation phase (right panel) includes drug resistance profiling, genotyping, and phylogeny or cluster analysis to study the genetic relationships between isolates. Finally, technical validation (far right panel) is performed to confirm the accuracy of variant calling and resistance profiling through drug susceptibility tests and validation of VCF files. This workflow outlines the integration of WGS data with clinical and technical validation to study MTBC in the context of drug resistance and epidemiology. Figure created using BioRender.com by Arash Ghodousi with license to publish.
Fig. 2
(a) Histogram showing the percentage distribution of Mycobacterium tuberculosis complex (MTBC) lineages and sublineages across four Italian regions: Emilia-Romagna, Lazio, Lombardy, and Piedmont. The X-axis represents the MTBC lineages, while the Y-axis shows the percentage of isolates for each lineage and sublineage. The genetic diversity of MTBC isolates is categorized into lineages and sublineages as follows: Lineage 1 includes EAI and EAI Manila; Lineage 2 is represented by Beijing; Lineage 3 by Delhi-CAS; Lineage 4 encompasses Haarlem, LAM, Cameroon, X-type, Ural, S-type, mainly-T, Euro-American and H37Rv-like; Lineages 5 and 6 are represented by West-Africa 1 and West-Africa 2, respectively. (b) Percentage distribution of MTBC lineages across different drug resistance profiles: Pan-susceptible (Pan-S), Rifampicin-resistant/Multidrug-resistant (RR/MDR), Pre-extensively drug-resistant/Extensively drug-resistant (Pre-XDR/XDR), and Other*. The X-axis shows the drug resistance profiles, while the Y-axis indicates the percentage of isolates for each lineage. Different colors represent the MTBC lineages, with Lineage 4 (Euro-American) being dominant across most drug resistance profiles. The figure highlights the correlation between specific MTBC lineages and their drug resistance profiles. *Note: The “Other” category includes mono- or poly-resistance to other anti-TB drugs not classified within the RR/MDR or Pre-XDR/XDR categories.
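The per-profile percentages plotted in panel (b) could be derived from the dataset's metadata roughly as follows. This is a minimal sketch, not the authors' code: the function name `lineage_percentages`, the record layout of (resistance profile, lineage) pairs, and the toy records are all assumptions made for illustration and do not reflect the real data.

```python
from collections import Counter, defaultdict


def lineage_percentages(records):
    """Return {profile: {lineage: percent}} from (profile, lineage) pairs.

    Percentages are computed within each drug resistance profile, so the
    values for one profile sum to 100.
    """
    counts = defaultdict(Counter)
    for profile, lineage in records:
        counts[profile][lineage] += 1

    result = {}
    for profile, lineage_counts in counts.items():
        total = sum(lineage_counts.values())
        result[profile] = {
            lineage: 100.0 * n / total for lineage, n in lineage_counts.items()
        }
    return result


# Invented toy records (hypothetical, NOT taken from the dataset).
toy_records = [
    ("Pan-S", "Lineage 4"),
    ("Pan-S", "Lineage 4"),
    ("Pan-S", "Lineage 2"),
    ("Pan-S", "Lineage 1"),
    ("RR/MDR", "Lineage 2"),
    ("RR/MDR", "Lineage 4"),
]

percentages = lineage_percentages(toy_records)
for profile, by_lineage in percentages.items():
    print(profile, by_lineage)
```

Normalizing within each profile, instead of over all isolates, is what makes the bars for different profiles comparable even when the profiles contain very different numbers of isolates.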
Fig. 3
Maximum Likelihood Phylogenetic Tree of Mycobacterium tuberculosis complex (MTBC) isolates. This figure displays a maximum likelihood phylogenetic tree, constructed from whole genome sequencing data of 2,520 MTBC isolates gathered from four Italian regions (Piedmont, Lombardy, Emilia-Romagna, and Lazio) between January 2017 and June 2020. Each tip on the tree denotes a unique isolate. Branches are color-coded to illustrate various attributes: sublineages are represented from the innermost part of the tree, followed by broader lineages, types of infection (pulmonary vs extra-pulmonary), HIV status, clustering status (Group), which is determined by a 5-SNP threshold, and drug resistance profiles. RR: Rifampicin resistant. MDR: Multidrug-resistant. Pre-XDR: Pre-extensively drug-resistant. XDR: Extensively drug-resistant.
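To make the 5-SNP clustering criterion concrete, the sketch below groups isolates whose pairwise SNP distance falls within a threshold. The caption states only the threshold, so several details here are assumptions rather than the authors' method: single-linkage grouping (an isolate joins a cluster if it is close to any member), an inclusive cutoff (distance <= 5), and distances counted as mismatches between equal-length allele strings. The names `snp_distance` and `cluster_isolates` and the toy isolates are hypothetical.

```python
from itertools import combinations


def snp_distance(a, b):
    """Count positions at which two equal-length allele strings differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def cluster_isolates(alleles, threshold=5):
    """Single-linkage clusters of isolates within `threshold` SNPs.

    `alleles` maps isolate name -> string of alleles at variable sites.
    Returns a list of sorted name lists; unclustered isolates are omitted.
    """
    parent = {name: name for name in alleles}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(alleles, 2):
        if snp_distance(alleles[a], alleles[b]) <= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in alleles:
        groups.setdefault(find(name), []).append(name)
    return [sorted(members) for members in groups.values() if len(members) > 1]


# Invented toy alleles (hypothetical, NOT real isolates).
toy_alleles = {
    "iso_A": "A" * 12,
    "iso_B": "A" * 10 + "CC",      # 2 SNPs from iso_A
    "iso_C": "A" * 6 + "C" * 6,    # 6 SNPs from iso_A, 4 from iso_B
    "iso_D": "G" * 12,             # 12 SNPs from every other isolate
}

clusters = cluster_isolates(toy_alleles)
print(clusters)
```

In the toy data, iso_A and iso_C are 6 SNPs apart yet land in the same cluster because both are within 5 SNPs of iso_B. This chaining is a property of single linkage; a stricter rule requiring every pair in a cluster to be within the threshold would split them.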
