PLoS One. 2015 Feb 3;10(2):e0116106.
doi: 10.1371/journal.pone.0116106. eCollection 2015.

16S classifier: a tool for fast and accurate taxonomic classification of 16S rRNA hypervariable regions in metagenomic datasets

Nikhil Chaudhary et al. PLoS One.

Abstract

The diversity of microbial species in a metagenomic study is commonly assessed using 16S rRNA gene sequencing. With rapid developments in genome sequencing technologies, the focus has shifted from sequencing the full-length gene towards sequencing the hypervariable regions of the 16S rRNA gene. 16S Classifier was therefore developed using a machine learning method, Random Forest, for fast and accurate taxonomic classification of short hypervariable regions of the 16S rRNA sequence. It displayed precision values of up to 0.91 on training datasets and up to 0.98 on the test dataset. On real metagenomic datasets, it showed up to 99.7% accuracy at the phylum level and up to 99.0% accuracy at the genus level. 16S Classifier is freely available at http://metagenomics.iiserb.ac.in/16Sclassifier and http://metabiosys.iiserb.ac.in/16Sclassifier.


Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1. Optimization of parameters using hypervariable region V3.
(a) Out-of-bag (OOB) error at different mtry values for the 2-mer, 3-mer, 4-mer, 5-mer and 6-mer models; (b) effect of k-mer size on the time required for the calculation; (c) size of the input file used for training at different k-mer sizes. Panel (a) shows that the OOB error was higher for the 2-mer and 3-mer models than for the 4-mer, 5-mer and 6-mer models. Panels (b) and (c) show that the time taken and the training data size were several-fold higher for the 5-mer and 6-mer models than for the 4-mer model.
Figure 2. OOB error shows a slight increase on removing variables.
The optimization was carried out using hypervariable region V3, 4-mers as input and mtry = 8 (the values of these parameters were selected from Fig. 1).
Figure 3. A decrease in OOB error was observed on increasing the number of trees (ntree) at mtry = 8.
This optimization was carried out using hypervariable region V3, 4-mers as input variables, mtry = 8 and 256 variables.
Figure 4. OOB error decreases on increasing the number of trees (ntree) at optimum mtry for different HVRs.
For each hypervariable region, the mtry value was optimized separately (using 4-mers as input) and used to construct the model at ntree = 1000. V2_mtry8 denotes hypervariable region V2 at its optimum mtry of 8; the other hypervariable regions are labeled in the same way.
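
The figure captions refer to 4-mers as input with 256 variables, which matches the 4^4 = 256 possible 4-mers over the bases A, C, G and T. The following is a minimal sketch of how such a feature vector might be computed. It is not the authors' implementation: the function name `kmer_frequencies`, the normalization to relative frequencies, and the skipping of windows that contain ambiguous bases are all assumptions made for illustration.

```python
from itertools import product

ALPHABET = "ACGT"  # assumed alphabet; ambiguous bases such as N are skipped


def kmer_frequencies(sequence, k=4):
    """Return relative k-mer frequencies in fixed lexicographic order.

    Hypothetical helper: the source does not state whether raw counts or
    normalized frequencies were used as input variables.
    """
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0] * len(kmers)
    seq = sequence.upper()
    total = 0
    # Slide a window of width k across the sequence, one base at a time.
    for i in range(len(seq) - k + 1):
        pos = index.get(seq[i:i + k])
        if pos is not None:  # ignore windows containing non-ACGT characters
            counts[pos] += 1
            total += 1
    if total == 0:
        return [0.0] * len(kmers)
    return [c / total for c in counts]


features = kmer_frequencies("ACGTACGT")
print(len(features))  # number of variables for k = 4
```

A vector of this kind, one per sequence, could serve as the input to a Random Forest model, where mtry is the number of variables tried at each split and ntree is the number of trees.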

