BMC Genomics. 2022 Oct 15;23(1):704.
doi: 10.1186/s12864-022-08927-5.

Comparative genome analysis of mycobacteria focusing on tRNA and non-coding RNA


Phani Rama Krishna Behra et al. BMC Genomics. 2022.

Abstract

Background: The Mycobacterium genus encompasses at least 192 named species, many of which cause severe diseases such as tuberculosis. Non-tuberculous mycobacteria (NTM) can also infect humans and animals. Some are of emerging concern because they are highly resistant to commonly used antibiotics, while others are used or evaluated in bioremediation or included in anticancer vaccines.

Results: We provide the genome sequences of 114 mycobacterial type strains and, together with 130 available mycobacterial genomes, generated a phylogenetic tree based on 387 core genes and supported by average nucleotide identity (ANI) data. The 244 genome sequences cover most of the species constituting the Mycobacterium genus. Genome sizes ranged from 3.2 to 8.1 Mb, with an average of 5.7 Mb, and we identified 14 new plasmids. Moreover, phage-like sequences made up between 0 and 4.64% of the mycobacterial genomes, depending on the mycobacterium, while the number of IS elements varied between 1 and 290. Our data also revealed that the number of tRNA and non-coding (nc) RNA genes differs between mycobacteria and that their positions on the chromosome vary. We identified a conserved core set of 12 ncRNAs, 43 tRNAs and 18 aminoacyl-tRNA synthetases among mycobacteria.

Conclusions: Phages, IS elements, tRNAs and ncRNAs appear to have contributed to the evolution of the Mycobacterium genus, in which several tRNA and ncRNA genes have been horizontally transferred. On the basis of our phylogenetic analysis, we identified several isolates of unnamed species as new mycobacterial species or as strains of known mycobacteria. The predicted number of coding sequences correlates with genome size, whereas the numbers of tRNA, rRNA and ncRNA genes do not. Together, these findings expand our insight into the evolution of the Mycobacterium genus and establish a platform for understanding mycobacterial pathogenicity, evolution and antibiotic resistance/tolerance, as well as the function and evolution of ncRNAs among mycobacteria.

Keywords: Core gene phylogeny; Mycobacterial genomes; tRNA and non-coding RNA.


Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1
Analysis of genome features. A Genome size distributions were analysed by growth rate and pathogenicity assignment and plotted as box plots. The colored boxes span the second and third quartiles, and the central black line marks the median genome size. Whiskers indicate the minimum and maximum genome sizes. The numbers of coding sequences (B and C), tRNA genes (D and E), rRNA genes (F and G) and ncRNA genes (H and I) were plotted against genome size, and R² values were calculated and are shown in each plot (except for rRNA). Labels: NP = non-pathogenic; OP = opportunistic pathogens; P = pathogens; U = unknown; R = RGM (rapidly growing mycobacteria); S = SGM (slowly growing mycobacteria).
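The R² values in panels B to I measure how much of the variation in a feature count is explained by a straight-line fit against genome size. A minimal sketch, assuming ordinary least-squares regression (the caption does not name the fitting method), computes R² as the squared Pearson correlation:

```python
from statistics import fmean


def r_squared(x: list[float], y: list[float]) -> float:
    """Coefficient of determination for a least-squares line y = a + b*x.

    For simple linear regression this equals the squared Pearson correlation.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("R^2 is undefined when either variable is constant")
    return sxy * sxy / (sxx * syy)
```

For example, `r_squared(genome_sizes_mb, cds_counts)` on two equal-length lists returns a value between 0 and 1; both argument names are hypothetical placeholders for per-genome values.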
Fig. 2
Core gene phylogeny of mycobacteria. A phylogeny based on the “387 core genes” present in all mycobacteria was calculated as described in Methods. The tree is divided into slow-growing (SGM; orange) and rapid-growing mycobacteria (RGM; green). Black indicates that no information was available to determine growth rate (“unknown”). Bootstrap support values from 1000 replicates are indicated as colored dots at the respective nodes (100%) or by their actual values (below 100%). Mycobacterial clades are indicated by boxes, with the clade names given as vertical text to the right of the boxes; species positioned outside the boxes represent single-species clades. Pairwise ANI values were calculated for all genomes, and the branches of the tree are colored according to these values (see legend to the left and Table S4). Note that for the branches on both sides of a connecting node to be colored, all species on one side of the node must have ANI values within the indicated range when compared with all species on the other side, and vice versa. Individual genomes may have ANI values higher than the range indicated by the coloring when compared with one or more genomes on the other side of the node. Underlined species were sequenced in this study, while species marked with black dots were previously reported [, –27]. * marks the position of the M. farcinogenes DSM 43637 strain sequenced in this study, while the other M. farcinogenes DSM 43637 strain corresponds to the genome sequence available in the NCBI database; see main text for details. ## marks the isolate/genome sequence M. microti OV254, which, based on our combined data, should be considered an M. simiae strain (see Discussion).
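The branch-coloring rule in Fig. 2 can be read as a check over every cross-node pair of genomes. A minimal sketch under stated assumptions: ANI is stored per ordered pair so that both directions (“and vice versa”) are checked, and because individual pairs may exceed a range, the lowest cross-node value is taken to decide the color. The bin edges and labels below are hypothetical placeholders, not the ranges in the figure legend or Table S4:

```python
from itertools import product

# Hypothetical ANI bins as (lower bound in %, label), ordered highest first.
ANI_BINS = [(95.0, ">=95%"), (90.0, "90-95%"), (85.0, "85-90%"), (0.0, "<85%")]


def node_color(left, right, ani, bins=ANI_BINS):
    """Return the ANI bin label for a node separating genomes `left` and `right`.

    `ani` maps an ordered pair (query, reference) to an ANI percentage.
    """
    if not left or not right:
        raise ValueError("both sides of the node need at least one genome")
    # Every cross-node pair, in both directions, must reach the bin's lower
    # bound; higher values are allowed, so the lowest value decides the bin.
    lowest = min(
        min(ani[(a, b)], ani[(b, a)]) for a, b in product(left, right)
    )
    for lower, label in bins:
        if lowest >= lower:
            return label
    raise ValueError(f"no bin covers ANI value {lowest:.2f}")
```

Taking the minimum means a single distant pair lowers the color for the whole node, which matches the caption's remark that individual genomes may sit above the range shown.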
Fig. 3
Distribution of IS elements and phages in mycobacteria. Heat maps for 244 mycobacteria showing the presence of insertion sequence (IS) elements and bacteriophage-derived sequences predicted using ISsaga [32] and Phaster [33], respectively. The “387 core gene” phylogenetic tree (see Fig. 2) and clade names are shown to the left; to aid orientation, the branches and the clade column are marked alternately in pink and blue, while black marks single-species clades. The second and third columns indicate pathogenicity and growth rate, respectively, according to the colour key. The types of IS elements and the classification of predicted phage-derived sequences are indicated at the top. The different colors represent the number of IS elements and the percentage of phage DNA per genome (see color key to the left in the figure). Plots of the total number of predicted IS elements and the percentage of phage DNA per genome are shown to the right of the respective heat maps.
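Fig. 3 reports phage DNA as a percentage of each genome. Assuming this means the combined length of the predicted phage regions divided by genome length, with overlapping predictions counted only once (the caption does not say how overlaps are handled), the calculation can be sketched as follows. The (start, end) coordinate list is a hypothetical input format; this is not a parser for Phaster output:

```python
def percent_covered(regions: list[tuple[int, int]], genome_length: int) -> float:
    """Percentage of a genome covered by (start, end) regions, 1-based inclusive.

    Overlapping or adjacent regions are merged first so no base is counted twice.
    """
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    covered = 0
    current_start = current_end = None
    for start, end in sorted(regions):
        if start > end:
            raise ValueError(f"invalid region ({start}, {end})")
        if current_end is not None and start <= current_end + 1:
            # Region overlaps or touches the current block: extend it.
            current_end = max(current_end, end)
        else:
            # Close the previous block (if any) and start a new one.
            if current_end is not None:
                covered += current_end - current_start + 1
            current_start, current_end = start, end
    if current_end is not None:
        covered += current_end - current_start + 1
    return 100.0 * covered / genome_length
```

Merging before summing keeps the result at or below 100% even when two predicted prophage regions overlap.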
Fig. 4
Distribution of tRNAs in mycobacteria. Heat map showing the presence of tRNAs in 244 mycobacteria. The “387 core gene” phylogenetic tree (see Fig. 2) and clade names are shown to the left, and the colouring scheme is the same as in Fig. 3. Core tRNAs are present in most of the mycobacteria, while auxiliary tRNAs are present in a minority. The presence or absence of each tRNA isoacceptor is marked in green or gray, respectively. The total number of predicted tRNAs is shown as indicated. To the right, the presence (green) and absence (gray) of the HNH endonuclease and GOLLD ncRNA genes are shown; for details, see main text and Supplementary information.
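The caption distinguishes core tRNAs (present in most mycobacteria) from auxiliary tRNAs (present in a minority) without giving exact cut-offs. A minimal sketch of such a split on a presence/absence table follows; the 90% core threshold, the 50% “minority” boundary and the feature names in any input are illustrative assumptions, not values from the study:

```python
from collections import Counter


def classify_features(presence, core_fraction=0.9):
    """Split features (e.g. tRNA isoacceptors) into core, auxiliary and intermediate sets.

    presence maps each genome name to the features detected in it.
    core: present in at least `core_fraction` of genomes (hypothetical cut-off).
    auxiliary: present in fewer than half of the genomes (a minority).
    intermediate: everything in between.
    """
    if not presence:
        raise ValueError("presence must contain at least one genome")
    if not 0.5 < core_fraction <= 1.0:
        raise ValueError("core_fraction must be in (0.5, 1.0]")
    n_genomes = len(presence)
    # set() guards against a feature being listed twice for the same genome.
    counts = Counter(f for features in presence.values() for f in set(features))
    core, auxiliary, intermediate = set(), set(), set()
    for feature, n_present in counts.items():
        share = n_present / n_genomes
        if share >= core_fraction:
            core.add(feature)
        elif share < 0.5:
            auxiliary.add(feature)
        else:
            intermediate.add(feature)
    return core, auxiliary, intermediate
```

Keeping an explicit intermediate set avoids forcing features found in roughly half of the genomes into either category.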
Fig. 5
Distribution of aminoacyl-tRNA synthetases in mycobacteria. Heat map showing the presence of aminoacyl-tRNA synthetases (AARS), gatABC and tilS in 244 mycobacteria. The “387 core gene” phylogenetic tree (see Fig. 2) and clade names are shown to the left, and the coloring scheme is the same as in Fig. 3. The total number of predicted AARS is shown to the right of the heat map. * indicates that LysRS includes both the regular LysRS and the bifunctional lysylphosphatidylglycerol biosynthesis protein LysX (see Supplementary information).
Fig. 6
Distribution of non-coding RNAs in mycobacteria. Heat map showing the presence of non-coding RNAs in 244 mycobacteria, predicted using the Rfam v13.0 database [–43] and INFERNAL v1.1.2 [44]. The “387 core gene” phylogenetic tree (see Fig. 2) and clade names are shown to the left, and the coloring scheme is the same as in Fig. 3. The presence and number of non-coding RNAs are indicated according to the color legend. ncRNAs marked in red correspond to ncRNAs also predicted using the “M. tuberculosis H37Rv ncRNA data set” [40] (see Supplementary Fig. S6). A plot of the total number of predicted non-coding RNAs is shown to the right.

References

    1. Primm TP, Lucero CA, Falkinham JO. Health impacts of environmental mycobacteria. Clin Microbiol Rev. 2004;17:98–106. doi: 10.1128/CMR.17.1.98-106.2004.
    2. Vaerewijck MJM, Huys G, Palomino JC, Swings J, Portaels F. Mycobacteria in drinking water distribution systems: ecology and significance for human health. FEMS Microbiol Rev. 2005;29:911–934. doi: 10.1016/j.femsre.2005.02.001.
    3. Goodfellow M, Kämpfer P, Busse H-J, Trujillo ME, Suzuki K, Ludwig W, Whitman WB. Bergey's manual of systematic bacteriology. 2nd ed. New York: Springer; 2012.
    4. Hatfull GF, Jacobs WR. Molecular genetics of mycobacteria. 2nd ed. Washington, DC: ASM Press; 2014.
    5. Tortoli E. Microbiological features and clinical relevance of new species of the genus Mycobacterium. Clin Microbiol Rev. 2014;27:727–752. doi: 10.1128/CMR.00035-14.