J Oral Microbiol. 2015 Sep 29:7:28934.
doi: 10.3402/jom.v7.28934. eCollection 2015.

Robust species taxonomy assignment algorithm for 16S rRNA NGS reads: application to oral carcinoma samples

Nezar Noor Al-Hebshi et al. J Oral Microbiol. 2015.

Abstract

Background: The usefulness of next-generation sequencing (NGS) in assessing bacteria associated with oral squamous cell carcinoma (OSCC) has been undermined by the inability to classify reads to the species level.

Objective: The purpose of this study was to develop a robust algorithm for species-level classification of NGS reads from oral samples and to pilot test it for profiling bacteria within OSCC tissues.

Methods: Bacterial 16S V1-V3 libraries were prepared from three OSCC DNA samples and sequenced using 454's FLX chemistry. High-quality, well-aligned, and non-chimeric reads ≥350 bp were classified using a novel, multi-stage algorithm that involves matching reads to reference sequences in revised versions of the Human Oral Microbiome Database (HOMD), HOMD extended (HOMDEXT), and Greengene Gold (GGG) at alignment coverage and percentage identity ≥98%, followed by assignment to species level based on top hit reference sequences. Priority was given to hits in HOMD, then HOMDEXT and finally GGG. Unmatched reads were subject to operational taxonomic unit analysis.
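The prioritized matching step described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that alignment hits have already been parsed into records, that ranking by bit score defines the "top hit", and that tied top hits from different species are merged into one combined label; the `Hit` record, the `classify_read` function, and the short database tags are hypothetical names introduced for illustration. Only the 98% thresholds and the HOMD, HOMDEXT, GGG priority order come from the text.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple

# Thresholds and priority order stated in the Methods.
MIN_IDENTITY = 98.0   # percentage identity
MIN_COVERAGE = 98.0   # alignment coverage (%)
DB_PRIORITY = ("HOMD", "HOMDEXT", "GGG")


@dataclass(frozen=True)
class Hit:
    """One alignment hit for a read (hypothetical record layout)."""
    read_id: str
    database: str
    species: str
    identity: float
    coverage: float
    bit_score: float  # assumption: bit score is used to rank hits


def classify_read(hits: Iterable[Hit]) -> Optional[Tuple[str, str]]:
    """Return (database, species label) for one read, or None if unmatched.

    Databases are tried in priority order; the first one with a hit that
    passes both thresholds decides the assignment.
    """
    hits = list(hits)
    for db in DB_PRIORITY:
        passing = [
            h for h in hits
            if h.database == db
            and h.identity >= MIN_IDENTITY
            and h.coverage >= MIN_COVERAGE
        ]
        if not passing:
            continue
        best = max(h.bit_score for h in passing)
        # Assumption: tied top hits from several species give a combined label.
        species = sorted({h.species for h in passing if h.bit_score == best})
        return db, "/".join(species)
    # Unmatched reads would go on to operational taxonomic unit analysis.
    return None
```

With this sketch, a read that has a passing hit in HOMD is assigned from HOMD even if a GGG hit scores higher, which mirrors the stated priority given to the human oral reference set.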

Results: Nearly 92.8% of the reads were matched to updated-HOMD 13.2, 1.83% to trusted-HOMDEXT, and 1.36% to modified-GGG. Of all matched reads, 99.6% were classified to species level. A total of 228 species-level taxa were identified, representing 11 phyla; the most abundant were Proteobacteria, Bacteroidetes, Firmicutes, Fusobacteria, and Actinobacteria. Thirty-five species-level taxa were detected in all samples. On average, Prevotella oris, Neisseria flava, Neisseria flavescens/subflava, Fusobacterium nucleatum ss polymorphum, Aggregatibacter segnis, Streptococcus mitis, and Fusobacterium periodonticum were the most abundant. Bacteroides fragilis, a species rarely isolated from the oral cavity, was detected in two samples.
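The summary statistics reported above (taxa detected in all samples, and average abundance) can be sketched as follows. This is an illustrative sketch only: it assumes that "on average" means the mean of per-sample relative abundances, which the abstract does not state, and the function names and the input layout (sample name mapped to per-taxon read counts) are hypothetical.

```python
from typing import Dict, Set

Counts = Dict[str, int]  # taxon -> number of reads assigned to it


def relative_abundance(counts: Counts) -> Dict[str, float]:
    """Convert read counts for one sample into percentages of its reads."""
    total = sum(counts.values())
    return {taxon: 100.0 * n / total for taxon, n in counts.items()}


def core_taxa(samples: Dict[str, Counts]) -> Set[str]:
    """Taxa with at least one read in every sample."""
    taxa_sets = [
        {taxon for taxon, n in counts.items() if n > 0}
        for counts in samples.values()
    ]
    return set.intersection(*taxa_sets)


def mean_relative_abundance(samples: Dict[str, Counts]) -> Dict[str, float]:
    """Mean of per-sample percentages; a taxon absent from a sample counts as 0."""
    per_sample = [relative_abundance(c) for c in samples.values()]
    all_taxa = set().union(*per_sample)
    return {
        taxon: sum(p.get(taxon, 0.0) for p in per_sample) / len(per_sample)
        for taxon in all_taxa
    }
```

Averaging percentages rather than pooled counts keeps a deeply sequenced sample from dominating the mean, although a single sample with an unusually high level of one taxon can still raise that taxon's average.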

Conclusion: This multi-stage algorithm maximizes the fraction of reads classified to the species level while ensuring reliable classification by giving priority to the human, oral reference set. Applying the algorithm to OSCC samples revealed high diversity. In addition to oral taxa, a number of human, non-oral taxa were also identified, some of which are rarely detected in the oral cavity.

Keywords: OSCC; bacteria; cancer; next-generation sequencing; pyrosequencing; taxonomy.

Figures

Fig. 1
Prioritized, multi-stage, BLASTN search algorithm used for taxonomic assignment of the reads. Refer to the text for a description. Updated-HOMD 13.2, Human Oral Microbiome Database version 13.2 updated by removal of potential chimeric sequences and addition of new taxa; trusted-HOMDEXT, HOMD extended after clearing chimeric and redundant sequences; modified-GGG, Greengene Gold collection after removing unaligned and redundant sequences.
Fig. 2
Relative abundance (%) of 11 phyla detected in the OSCC samples. GN02, Synergistetes, and Tenericutes were found in single samples.
Fig. 3
Distribution of the detected phyla in each of the study OSCC samples.
Fig. 4
Relative abundance (%) of 29 genera detected in all OSCC samples. Abundance of Haemophilus was inflated by the presence of a high level of H. influenzae in the sample from case 3 (see Fig. 5).
Fig. 5
Distribution of 16 genera accounting for >80% of the reads in each of the OSCC samples. Profiles of cases 1 and 2 are comparable, while that of case 3 deviates significantly due to high levels of Haemophilus.
Fig. 6
Subject-level distribution of 35 species-level taxa identified in all three OSCC samples. *Veillonella parvula group: V. parvula, V. dispar and V. rogosae.
