Benchmarking of methods for genomic taxonomy

doi:10.1128/JCM.02981-13

J Clin Microbiol. 2014 May;52(5):1529-39.

doi: 10.1128/JCM.02981-13. Epub 2014 Feb 26.

Mette V Larsen¹, Salvatore Cosentino, Oksana Lukjancenko, Dhany Saputra, Simon Rasmussen, Henrik Hasman, Thomas Sicheritz-Pontén, Frank M Aarestrup, David W Ussery, Ole Lund

PMID: 24574292
PMCID: PMC3993634
DOI: 10.1128/JCM.02981-13

Mette V Larsen et al. J Clin Microbiol. 2014 May.

Affiliation

¹ Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, Kongens Lyngby, Denmark.

Abstract

One of the first questions that arises when a prokaryotic organism of interest is encountered is what it is, that is, which species it is. The 16S rRNA gene formed the basis of the first method for sequence-based taxonomy and has had a tremendous impact on the field of microbiology. Nevertheless, the method has been found to have a number of shortcomings. In the current study, we trained and benchmarked five methods for whole-genome sequence-based prokaryotic species identification on a common data set of complete genomes: (i) SpeciesFinder, which is based on the complete 16S rRNA gene; (ii) Reads2Type, which searches for species-specific 50-mers in either the 16S rRNA gene or the gyrB gene (for the Enterobacteriaceae family); (iii) the ribosomal multilocus sequence typing (rMLST) method, which samples up to 53 ribosomal genes; (iv) TaxonomyFinder, which is based on species-specific functional protein domain profiles; and finally (v) KmerFinder, which examines the number of co-occurring k-mers (substrings of k nucleotides in DNA sequence data). The performance of the methods was subsequently evaluated on three data sets of short sequence reads or draft genomes from public databases. In total, the evaluation sets comprised sequence data from more than 11,000 isolates covering 159 genera and 243 species. Our results indicate that methods that sample only chromosomal core genes have difficulty distinguishing closely related species that diverged only recently. The KmerFinder method had the highest overall accuracy and correctly identified from 93% to 97% of the isolates in the evaluation sets.
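The idea of counting co-occurring k-mers can be illustrated with a minimal sketch. This is not the KmerFinder implementation: the function names, the toy sequences, the choice of k = 4, and the rule of picking the reference with the most shared k-mers are all assumptions made for illustration only.

```python
from __future__ import annotations


def kmers(seq: str, k: int) -> set[str]:
    """Return the set of distinct substrings of length k in seq."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(references: dict[str, str], k: int) -> dict[str, set[str]]:
    """Map each (hypothetical) reference species to its k-mer set."""
    return {species: kmers(seq, k) for species, seq in references.items()}


def predict_species(query: str, index: dict[str, set[str]], k: int):
    """Score each reference by the number of k-mers shared with the query.

    Returns (best_species_or_None, scores). Taking the highest count is an
    assumed, simplified decision rule.
    """
    query_kmers = kmers(query, k)
    scores = {species: len(query_kmers & ref) for species, ref in index.items()}
    if not scores or max(scores.values()) == 0:
        return None, scores
    return max(scores, key=scores.get), scores


# Toy data, invented for this example (not real genomes).
references = {
    "species_A": "ACGTACGTTAGC",
    "species_B": "TTGACCGGTACA",
}
index = build_index(references, k=4)
best, scores = predict_species("CGTACGTT", index, k=4)
print(best, scores)
```

In this toy case the query shares more 4-mers with `species_A` than with `species_B`, so `species_A` is returned. Real genomes would call for a much larger k and a memory-efficient index.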

Figures

**FIG 1**
Performance of the five methods for species identification on the indicated data sets. The rMLST and TaxonomyFinder methods take only draft or complete genomes as input, while Reads2Type works only for short reads. Correct (genus and species), predicted genus and species are in accordance with the annotation; only genus correct, the predicted genus is in accordance with the annotation, but the species is not; not even genus correct, neither predicted genus nor species is in accordance with the annotation.
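The three outcome categories in this caption can be expressed as a small comparison of predicted and annotated binomial names. The sketch below is illustrative only: the function names are hypothetical, names are assumed to be plain "Genus species" strings, and the extra "no prediction" category is an assumption for isolates where a method returns nothing.

```python
from collections import Counter
from typing import Optional


def split_binomial(name: str) -> tuple:
    """Split 'Genus species ...' into lowercase (genus, species)."""
    parts = name.strip().split()
    genus = parts[0].lower() if parts else ""
    species = parts[1].lower() if len(parts) > 1 else ""
    return genus, species


def categorize(predicted: Optional[str], annotated: str) -> str:
    """Assign one isolate to an outcome category from the FIG 1 caption."""
    if not predicted:
        return "no prediction"  # assumed extra category, not in the caption
    pred_genus, pred_species = split_binomial(predicted)
    ann_genus, ann_species = split_binomial(annotated)
    if pred_genus != ann_genus:
        return "not even genus correct"
    if pred_species != ann_species:
        return "only genus correct"
    return "correct (genus and species)"


# Invented example pairs of (predicted, annotated) names.
pairs = [
    ("Escherichia coli", "Escherichia coli"),
    ("Escherichia fergusonii", "Escherichia coli"),
    ("Shigella sonnei", "Escherichia coli"),
    (None, "Escherichia coli"),
]
tally = Counter(categorize(p, a) for p, a in pairs)
print(tally)
```

Dividing each count in `tally` by the number of isolates would give the kind of per-category fractions that a figure like this reports.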

**FIG 2**
Overlap in predictions by the five methods for species identification. Numbers written in regular font indicate the number of isolates for which the predicted species corresponds to the annotated species. Numbers written in italics indicate the number of isolates for which the predicted and annotated species differ. The methods used and data sets evaluated are indicated.

**FIG 3**
Predictions for the most common species of the NCBI_drafts set. For each method, indicated at the top of each panel, the results for a given species are only shown if the method made a prediction for five or more isolates annotated as this species (e.g., if there are five isolates annotated as species A in the data set, but the method was not able to make a prediction for one of the isolates, the species is not shown) or if two or more isolates are predicted as this species (e.g., if there are no isolates annotated as species B in the data set but two isolates annotated as species C are predicted to be species B, then species B is shown).

**FIG 4**
Predictions for the most common species in the SRA_drafts data set. For each method, indicated at the top of each panel, the results for a given species are shown only if the method made a prediction for 10 or more isolates annotated as this species or if two or more isolates are predicted as this species.

Cited by

Emergence and clonal expansion of Aeromonas hydrophila ST1172 that simultaneously produces MOX-13 and OXA-724.
Chen X, Lu M, Wang Y, Zhang H, Jia X, Jia P, Yang W, Chen J, Song G, Zhang J, Xu Y. Antimicrob Resist Infect Control. 2024 Mar 3;13(1):28. doi: 10.1186/s13756-023-01339-4. PMID: 38433212.
Draft Genome Sequence of Enterobacter cloacae 3D9 (Phylum Proteobacteria).
Dumigan CR, Perry GE, Pauls KP, Raizada MN. Microbiol Resour Announc. 2018 Oct 25;7(16):e00902-18. doi: 10.1128/MRA.00902-18. eCollection 2018 Oct. PMID: 30533747.
Draft Genome Sequence of Enterobacter cloacae 3F11 (Phylum Proteobacteria).
Dumigan CR, Perry GE, Pauls KP, Raizada MN. Microbiol Resour Announc. 2018 Aug 16;7(6):e00846-18. doi: 10.1128/MRA.00846-18. eCollection 2018 Aug. PMID: 30533901.
The Distribution of Campylobacter jejuni Virulence Genes in Genomes Worldwide Derived from the NCBI Pathogen Detection Database.
Panzenhagen P, Portes AB, Dos Santos AMP, Duque SDS, Conte Junior CA. Genes (Basel). 2021 Sep 28;12(10):1538. doi: 10.3390/genes12101538. PMID: 34680933.
Genome Sequences of Clinical Isolates of NDM-1-Producing Klebsiella quasipneumoniae subsp. similipneumoniae and KPC-2-Producing Klebsiella quasipneumoniae subsp. quasipneumoniae from Brazil.
Fuga B, Cerdeira L, Andrade F, Zaccariotto T, Esposito F, Cardoso B, Rodrigues L, Neves I, Levy CE, Lincopan N. Microbiol Resour Announc. 2020 Mar 5;9(10):e00089-20. doi: 10.1128/MRA.00089-20. PMID: 32139569.
