J Bacteriol. 2009 Jan;191(1):91-9.
doi: 10.1128/JB.01202-08. Epub 2008 Oct 31.

A genomic distance based on MUM indicates discontinuity between most bacterial species and genera

Marc Deloger et al. J Bacteriol. 2009 Jan.

Abstract

The fundamental unit of biological diversity is the species. However, genome sequencing has revealed a remarkable extent of intraspecies diversity in bacteria, which shows the need for clear criteria to group strains within a species. Two main types of analysis are used to quantify intraspecies variation at the genome level: the average nucleotide identity (ANI), which measures the DNA conservation of the core genome, and the DNA content, which calculates the proportion of DNA shared by two genomes. Both estimates rely on BLAST alignments to define the DNA sequences common to a genome pair. Interestingly, however, the results of these two methods are not well correlated for intraspecies pairs. This prompted us to develop a genomic-distance index that takes both criteria of diversity into account and is based on the DNA maximal unique matches (MUM) shared by two genomes. The resulting values, called MUMi (for MUM index), correlate better with the ANI than with the DNA content. Moreover, the MUMi groups strains in a way that is congruent with routinely used multilocus sequence typing (MLST) trees, as well as with ANI-based trees. We used the MUMi to determine the relatedness of all available genome pairs at the species and genus levels. Our analysis reveals a certain consistency in the current notion of bacterial species, in that the bulk of intraspecies and intragenus values are clearly separable. It also confirms that some species are much more diverse than most. Because the MUMi is fast to calculate, it makes it possible to measure genome distances across the whole database of available genomes.
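The abstract does not state the MUMi formula. As a minimal sketch, assume the index is 1 - L_MUM / L_av, where L_MUM is the length of the first genome covered by MUMs of at least k bases (overlapping MUMs merged) and L_av is the mean length of the two genomes; this definition is an assumption for illustration, and the paper's exact handling of overlaps may differ. A MUM is taken here to be an exact match that cannot be extended to the left or right and whose sequence occurs exactly once in each genome. The search below is a naive quadratic stand-in for the suffix-tree approach used by tools such as MUMmer, covers the forward strand only, and is only suitable for toy sequences. The names `find_mums` and `mumi` are hypothetical.

```python
def count_occurrences(text: str, pattern: str) -> int:
    """Count possibly overlapping occurrences of pattern in text."""
    count, start = 0, text.find(pattern)
    while start != -1:
        count += 1
        start = text.find(pattern, start + 1)
    return count


def find_mums(a: str, b: str, k: int) -> list[tuple[int, int, int]]:
    """Naively list maximal unique matches (start_a, start_b, length) with length >= k.

    Forward strand only; quadratic in sequence length, so toy inputs only.
    """
    mums = []
    for i in range(len(a)):
        for j in range(len(b)):
            # Skip mismatches and matches that could be extended one base to the left.
            if a[i] != b[j] or (i > 0 and j > 0 and a[i - 1] == b[j - 1]):
                continue
            length = 0
            # Extend to the right as far as the sequences agree (right-maximal).
            while i + length < len(a) and j + length < len(b) and a[i + length] == b[j + length]:
                length += 1
            if length < k:
                continue
            match = a[i:i + length]
            # Unique: the matched string occurs exactly once in each sequence.
            if count_occurrences(a, match) == 1 and count_occurrences(b, match) == 1:
                mums.append((i, j, length))
    return mums


def mumi(a: str, b: str, k: int) -> float:
    """Hypothetical MUMi-style distance: 1 - (MUM-covered length / mean genome length)."""
    covered = set()
    for start_a, _, length in find_mums(a, b, k):
        # Merge overlapping MUMs by counting each covered position of `a` once.
        covered.update(range(start_a, start_a + length))
    mean_length = (len(a) + len(b)) / 2
    # Clamp at 0 in case MUMs overlapping on `b` push coverage past the mean length.
    return max(0.0, 1 - len(covered) / mean_length)
```

For example, `find_mums("GATTACA", "TTGATTAC", 3)` finds the single shared block `GATTAC`, giving a value of about 0.2; identical sequences give 0 and sequences with no MUM give 1, so under this assumed definition the value behaves as a distance.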

Figures

FIG. 1.
MUMi values as a function of k (the minimal size of a MUM) for six strain pairs involving E. coli MG1655 (MG) (accession number U00096). This genome was compared to those of E. coli W3110 (AP009048), Shigella dysenteriae (CP000034), E. coli CFT073 (AE014075), S. enterica (strain LT2, AE006468), Pectobacterium atrosepticum (formerly Erwinia carotovora, NC_004547), and B. aphidicola (AE013218). The arrow shows where the points of intraspecies and interspecies MUMi are the most widely separated.
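The caption does not say how the best-separating k was chosen. One possible criterion, assumed here purely for illustration, is the gap between the smallest interspecies MUMi and the largest intraspecies MUMi at each k; the sketch picks the k that maximizes that gap. The function name `best_k` and the input layout (a mapping from k to the intraspecies and interspecies values) are hypothetical, and the numbers in the usage line are invented toy values, not data from the paper.

```python
def best_k(curves: dict[int, tuple[list[float], list[float]]]) -> tuple[int, float]:
    """Return (k, gap) maximizing min(interspecies MUMi) - max(intraspecies MUMi).

    `curves` maps each minimal MUM size k to (intraspecies values, interspecies values).
    """
    best = None
    for k, (intra, inter) in sorted(curves.items()):
        gap = min(inter) - max(intra)
        if best is None or gap > best[1]:
            best = (k, gap)
    if best is None:
        raise ValueError("no k values supplied")
    return best


# Toy values for illustration only.
print(best_k({10: ([0.1, 0.2], [0.3, 0.9]), 20: ([0.05, 0.1], [0.5, 0.95])}))
```

A positive gap means the two groups of points do not overlap at that k; a more robust variant could compare medians or quantiles instead of the extremes.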
FIG. 2.
Correlations of MUMi with other genomic distances. (Left) Correlation between conserved-gene values and MUMi values. Conserved-gene values were available for 48 intraspecies pairs (17) (see the supplemental material); the MUMi was calculated for the same pairs (listed in Table S1 in the supplemental material), and both values are plotted. (Right) Correlation between ANI and MUMi values for the same pairs as in the left panel.
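The caption does not name the correlation statistic. Assuming a Pearson coefficient, the sketch below shows how paired values (for example MUMi and ANI for the same genome pairs) could be compared. Because MUMi is a distance while ANI is an identity, good agreement between them would appear as a strongly negative coefficient. The function name `pearson_r` is hypothetical.

```python
import math


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two paired samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Covariance and variances, all left unnormalized since the factors cancel.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)
```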
FIG. 3.
SplitsTree representations of the MLST (A), MUMi (B), and ANI (C) distance matrices involving 10 E. coli strains, MG1655 (U00096), W3110 (W) (AP009048), HS (CP000802), 24377A (CP000800), Shigella sonnei (S. son) (CP000038), EDL933 (AE005174), CFT073 (CFT) (AE014075), APEC01 (CP000468), UTI89 (CP000243), and 536 (CP000247). Ambiguities are shown as rectangles.
FIG. 4.
SplitsTree representations of the MLST (A), MUMi (B), and ANI (C) distance matrices involving 11 S. aureus strains, Mu50 (Mu) (NC_002758), N315 (NC_002745), MW2 (NC_003923), MRSA (NC_002952), RF122 (NC_007622), COL (NC_002951), JH1 (NC_009632), JH9 (NC_009487), NCTC8325 (NC_007795), USA300 (NC_007793), and Newman (NC_009641). Ambiguities are shown as rectangles.
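Figures 3 and 4 are built from pairwise distance matrices. As a sketch, the code below fills a symmetric matrix from any pairwise distance function and writes it as a NEXUS TAXA plus DISTANCES block, on the assumption that SplitsTree accepts a standard block of this shape (worth checking against the SplitsTree manual). The function names `pairwise_matrix` and `to_nexus_distances` are hypothetical, and the distance function passed in (MUMi, ANI-derived, or MLST-derived) is left to the caller.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def pairwise_matrix(items: Sequence[T], distance: Callable[[T, T], float]) -> list[list[float]]:
    """Symmetric distance matrix with zeros on the diagonal."""
    n = len(items)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            matrix[i][j] = matrix[j][i] = distance(items[i], items[j])
    return matrix


def to_nexus_distances(labels: list[str], dist: list[list[float]]) -> str:
    """Render an n x n distance matrix as NEXUS TAXA and DISTANCES blocks."""
    n = len(labels)
    if len(dist) != n or any(len(row) != n for row in dist):
        raise ValueError("distance matrix must be n x n")
    # NEXUS quotes labels with single quotes and escapes ' by doubling it.
    quoted = ["'" + name.replace("'", "''") + "'" for name in labels]
    lines = ["#NEXUS", "BEGIN TAXA;", f"DIMENSIONS NTAX={n};", "TAXLABELS", *quoted, ";", "END;"]
    lines += ["BEGIN DISTANCES;", f"DIMENSIONS NTAX={n};",
              "FORMAT LABELS=LEFT DIAGONAL TRIANGLE=BOTH;", "MATRIX"]
    for name, row in zip(quoted, dist):
        lines.append(name + " " + " ".join(f"{d:.6f}" for d in row))
    lines += [";", "END;"]
    return "\n".join(lines) + "\n"
```

Writing the full square matrix (`TRIANGLE=BOTH`) keeps the file easy to inspect by eye, at the cost of duplicating each value.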
FIG. 5.
Distribution of all maximal MUMi values per species.
FIG. 6.
Distribution of all minimal MUMi values per genus.
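Figures 5 and 6 summarize, for each species, the largest intraspecies MUMi and, for each genus, the smallest MUMi. The sketch below computes both from a list of scored genome pairs, assuming that species names follow the "Genus species" form and that the per-genus minimum is taken over pairs of different species within the same genus; both assumptions are for illustration only. The function name `species_max_and_genus_min` is hypothetical.

```python
from typing import Iterable


def species_max_and_genus_min(
    pairs: Iterable[tuple[str, str, float]],
) -> tuple[dict[str, float], dict[str, float]]:
    """pairs: (species_a, species_b, mumi) tuples with 'Genus species' names."""
    species_max: dict[str, float] = {}
    genus_min: dict[str, float] = {}
    for sp_a, sp_b, value in pairs:
        genus_a, genus_b = sp_a.split()[0], sp_b.split()[0]
        if sp_a == sp_b:
            # Intraspecies pair: keep the largest (most distant) value per species.
            species_max[sp_a] = max(value, species_max.get(sp_a, value))
        elif genus_a == genus_b:
            # Intragenus, interspecies pair: keep the smallest (closest) value per genus.
            genus_min[genus_a] = min(value, genus_min.get(genus_a, value))
        # Pairs from different genera are ignored here.
    return species_max, genus_min
```

Plotting the values of the two dictionaries as histograms gives one way to look at how well intraspecies and intragenus MUMi values separate.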

References

    1. Auch, A. F., S. R. Henz, B. R. Holland, and M. Goker. 2006. Genome BLAST distance phylogenies inferred from whole plastid and whole mitochondrion genome sequences. BMC Bioinform. 7:350.
    2. Canchaya, C., M. J. Claesson, G. F. Fitzgerald, D. van Sinderen, and P. W. O'Toole. 2006. Diversity of the genus Lactobacillus revealed by comparative genomics of five species. Microbiology 152:3185-3196.
    3. Chain, P. S., E. Carniel, F. W. Larimer, J. Lamerdin, P. O. Stoutland, W. M. Regala, A. M. Georgescu, L. M. Vergez, M. L. Land, V. L. Motin, R. R. Brubaker, J. Fowler, J. Hinnebusch, M. Marceau, C. Medigue, M. Simonet, V. Chenal-Francisque, B. Souza, D. Dacheux, J. M. Elliott, A. Derbise, L. J. Hauser, and E. Garcia. 2004. Insights into the evolution of Yersinia pestis through whole-genome comparison with Yersinia pseudotuberculosis. Proc. Natl. Acad. Sci. USA 101:13826-13831.
    4. Chen, X., M. Li, B. Ma, and J. Tromp. 2002. DNACompress: fast and effective DNA sequence compression. Bioinformatics 18:1696-1698.
    5. Chiapello, H., I. Bourgait, F. Sourivong, G. Heuclin, A. Gendrault-Jacquemard, M. A. Petit, and M. El Karoui. 2005. Systematic determination of the mosaic structure of bacterial genomes: species backbone versus strain-specific loops. BMC Bioinform. 6:171.