Microb Genom. 2022 Jun;8(6):mgen000839.
doi: 10.1099/mgen.0.000839.

A critical evaluation of Mycobacterium bovis pangenomics, with reference to its utility in outbreak investigation

Kristina M Ceres et al. Microb Genom. 2022 Jun.

Abstract

The increased accessibility of next-generation sequencing has allowed enough genomes from a given bacterial species to be sequenced to describe the distribution of genes in the pangenome, without limiting analyses to genes present in reference strains. Although some taxa have thousands of whole genome sequences available on public databases, most genomes were sequenced with short-read technology, resulting in incomplete assemblies. Studying pangenomes could lead to important insights into adaptation, pathogenicity, or molecular epidemiology; however, given the known information loss inherent in analyzing contig-level assemblies, these inferences may be biased or inaccurate. In this study we describe the pangenome of a clonally evolving pathogen, Mycobacterium bovis, and examine the utility of gene content variation in M. bovis outbreak investigation. We constructed the M. bovis pangenome using 1463 de novo assembled genomes. We tested the assumption of strict clonal evolution by studying evidence of recombination in core genes and analyzing the distribution of accessory genes among core monophyletic groups. To determine whether gene content variation could be utilized in outbreak investigation, we carefully examined accessory genes detected in a well-described M. bovis outbreak in Minnesota. We found significant errors in accessory gene classification. After accounting for these errors, we show that M. bovis has a much smaller accessory genome than previously described and provide evidence supporting ongoing clonal evolution and a closed pangenome, with little gene content variation generated over outbreaks. We also identified frameshift mutations in multiple genes, including a mutation in glpK, which has recently been associated with antibiotic tolerance in Mycobacterium tuberculosis.
A pangenomic approach enables a more comprehensive analysis of genome dynamics than is possible with reference-based approaches; however, without critical evaluation of accessory gene content, inferences of transmission patterns employing these loci could be misguided.

Keywords: Mycobacterium tuberculosis complex; molecular epidemiology; pangenomics.

Conflict of interest statement

The authors declare that there are no conflicts of interest.

Figures

Fig. 1.
M. bovis core phylogeny and sample geography. A) A recombination-free core phylogenetic tree constructed in Gubbins with a GTR substitution model is shown with clonal complex labels. Sample collection continent distribution is shown in B) and host distribution is shown in C). The number of samples included from each country is shown in D).
Fig. 2.
Population structure assessed in SNP variation and gene presence absence patterns. A) Gene content similarity dendrogram created using the Jaccard distance of the pangenome gene presence absence matrix. Core clonal complexes are labelled along the circumference of the tree. B-C) Core SNP and gene content variation principal components analysis coloured by core clonal complexes created with core SNP principal components. SNP variation is shown in B) and gene content variation is shown in C). Gene content variation is not concordant with SNP variation.
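The Jaccard distance used for the gene content dendrogram in panel A can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the sample names, gene names, and helper functions below are hypothetical, and each genome is represented simply as the set of genes annotated as present.

```python
from itertools import combinations


def jaccard_distance(genes_a, genes_b):
    """Jaccard distance between two sets of gene names (0 = identical content)."""
    union = genes_a | genes_b
    if not union:
        # Two empty genomes share all (zero) genes; treat them as identical.
        return 0.0
    return 1.0 - len(genes_a & genes_b) / len(union)


def pairwise_jaccard(gene_sets):
    """Return {(sample_x, sample_y): distance} for every unordered sample pair."""
    names = sorted(gene_sets)
    return {
        (x, y): jaccard_distance(gene_sets[x], gene_sets[y])
        for x, y in combinations(names, 2)
    }


# Hypothetical toy presence/absence data: sample -> genes called present.
genomes = {
    "sample_A": {"gene1", "gene2", "gene3"},
    "sample_B": {"gene1", "gene2", "gene4"},
    "sample_C": {"gene1", "gene2", "gene3"},
}

distances = pairwise_jaccard(genomes)
for pair, dist in sorted(distances.items()):
    print(pair, dist)
```

A matrix of such distances could then be passed to any hierarchical clustering routine to build a dendrogram comparable in spirit to panel A.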
Fig. 3.
Accessory gene labels are created through redundant gene annotation. A) The distribution of all redundantly labelled genes, coloured by the fraction of the time all genes in a redundant gene annotation group were found in the same genome. Within each reference genome, the number of mapped genes that were found within 1000 bp of the end of a contig is shown (black circles). B) and C) Alignments of two example genes, pks7 and BQ2027_MB2105 (purple), with redundantly annotated accessory genes (blue). D) After removing redundant gene annotations, the accessory genome size (genes present in less than 95 % of genomes) decreased from 1464 to 137 genes.
Fig. 4.
Filtered gene presence absence patterns in the M. bovis pangenome. The core genome phylogeny from Fig. 1 is shown, labelled with clonal complexes. The phage PhiRv1 genes are also labelled. Branch lengths are in SNP units.
Fig. 5.
Gene content variation in a Minnesota outbreak. A) A core genome phylogeny created with IQ-TREE2 using a GTR substitution model, shown with a gene presence/absence matrix. Variation in gene content is observed in three genes, not including PE/PPE genes; however, all gene content variation was driven by indels. An alignment of a variable homopolymeric tract region of M. bovis AF2122/97 frdBC is shown next to the gene presence/absence matrix. This homopolymeric tract variation resulted in differential gene presence/absence annotation of frdD and frdB genes among outbreak samples. B) Indel variation shown in a tree created with binary indel presence/absence data in IQ-TREE2 with a JC2 substitution model. Three samples with conflicting phylogenetic positions are highlighted in green to show the disparate topologies of the SNP and indel trees.
Fig. 6.
Methodological differences result in drastically different pangenome sizes. Panaroo produced a smaller accessory genome than that described in a recent M. bovis pangenome study. Soft core genes are present in greater than 95 % of samples. Shell genes are present in less than 95 % of samples but more than two samples. Cloud genes are present in one or two samples. After filtering redundantly labelled genes, the pangenome size was further reduced. The M. bovis pangenome is not open and shows presence and absence patterns consistent with clonal evolution.
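The frequency categories named in this caption can be expressed as a short Python sketch. It is an illustration under stated assumptions rather than the implementation used by Panaroo or the authors: the function name and example counts are hypothetical, and a gene present in exactly 95 % of samples is assigned to the soft core here because the caption leaves that boundary unspecified.

```python
def classify_gene(n_present, n_samples, core_fraction=0.95):
    """Assign a gene to a pangenome category from its presence count.

    Thresholds follow the figure caption: soft core above 95 % of samples,
    shell below 95 % but in more than two samples, cloud in one or two samples.
    Assumption: exactly 95 % is counted as soft core.
    """
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    if not 1 <= n_present <= n_samples:
        raise ValueError("n_present must be between 1 and n_samples")
    if n_present / n_samples >= core_fraction:
        return "soft core"
    if n_present > 2:
        return "shell"
    return "cloud"


# Hypothetical presence counts out of the 1463 genomes in the study.
N_SAMPLES = 1463
example_counts = {"geneA": 1460, "geneB": 800, "geneC": 3, "geneD": 2, "geneE": 1}

categories = {
    gene: classify_gene(count, N_SAMPLES) for gene, count in example_counts.items()
}
for gene, category in categories.items():
    print(gene, category)
```

Because the categories depend only on presence counts, redundantly annotated genes inflate the shell and cloud classes, which is why the filtering step described in the caption shrinks the accessory genome.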
