Front Microbiol. 2025 Mar 28;16:1553124.
doi: 10.3389/fmicb.2025.1553124. eCollection 2025.

A species-level identification pipeline for human gut microbiota based on the V3-V4 regions of 16S rRNA


Min Wang et al., Front Microbiol.

Abstract

16S rRNA gene sequencing is pivotal for identifying bacterial species in microbiome studies, particularly when the V3-V4 hypervariable regions are targeted. Species-level identification often applies a fixed 98.5% similarity threshold, but because appropriate thresholds vary among species, this approach can cause misclassification. To address this, our study integrated data from the SILVA, NCBI, and LPSN databases, extracted the V3-V4 region sequences, and supplemented them with 16S rRNA sequences from 1,082 human gut samples. The result is a non-redundant database of amplicon sequence variants (ASVs) specific to the V3-V4 regions (positions 341-806). Using this database, we identified flexible classification thresholds for 674 families, 3,661 genera, and 15,735 species, and found clear thresholds for 87.09% of families and 98.38% of genera. For the 896 most common human gut species, we established precise taxonomic thresholds. To apply these findings, we developed the asvtax pipeline, which uses flexible thresholds for more accurate taxonomic classification and notably improves the identification of new ASVs. The asvtax pipeline not only enhances the precision of species-level classification but also provides a robust framework for analyzing complex microbial communities, supporting more reliable ecological and functional interpretations in microbiome research.
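The difference between a fixed and a flexible threshold can be illustrated with a minimal Python sketch. It assumes the best database hit and its percent identity to the query are already known (for example, from an aligner) and assigns the deepest rank whose threshold that identity meets. The taxon names, the per-taxon thresholds, and the 94.5% genus fallback are hypothetical values chosen for illustration; this is not the asvtax implementation.

```python
from typing import Optional

# Fixed cut-offs used when a taxon has no tabulated threshold.
# 98.5 is the fixed species threshold discussed in the text; 94.5 is an assumed genus default.
DEFAULT_SPECIES = 98.5
DEFAULT_GENUS = 94.5


def classify(species: str, genus: str, identity: float,
             species_thresholds: dict[str, float],
             genus_thresholds: dict[str, float]) -> tuple[str, Optional[str]]:
    """Return the deepest rank whose threshold the best hit's identity meets.

    species/genus: taxonomy of the best database hit.
    identity: query-to-hit percent identity (assumed to come from an aligner).
    Passing empty threshold dicts reproduces a purely fixed-threshold classifier.
    """
    if identity >= species_thresholds.get(species, DEFAULT_SPECIES):
        return ("species", species)
    if identity >= genus_thresholds.get(genus, DEFAULT_GENUS):
        return ("genus", genus)
    return ("unclassified", None)


# Hypothetical per-taxon thresholds, NOT values reported in the paper
flexible_species = {"Species_A": 99.3, "Species_B": 97.8}
flexible_genus = {"Genus_X": 95.0}

for sp, ident in [("Species_A", 99.0), ("Species_B", 98.0)]:
    fixed = classify(sp, "Genus_X", ident, {}, {})
    flexible = classify(sp, "Genus_X", ident, flexible_species, flexible_genus)
    print(f"{sp} at {ident}%: fixed={fixed}, flexible={flexible}")
```

With these made-up values, a 99.0% match to Species_A is called at species level by the fixed rule but only at genus level by the flexible one, while a 98.0% match to Species_B moves the other way.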

Keywords: 16S rRNA; database abbreviations; microbiota; species-level identification; taxonomic thresholds.


Conflict of interest statement

WL was employed by Uniteomics Tianjin Biotechnology Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

Figure 1
Composition and taxonomic distribution of the HGMAD database. (A) Venn diagram of ASV sources. The HGMAD database contains 56,743 unique ASVs, categorized by origin: 8,103 ASVs found only in LPSN (blue), 2,028 only in NCBI (pink), 35,552 only in SILVA (yellow), and 3,602 novel sequences derived from full-length 16S rRNA sequencing of gut bacteria from 120 healthy individuals (green). Overlapping regions represent 7,458 ASVs shared by two or more databases. (B) Taxonomic relationships at the family level within the database; colors denote phyla. The bar chart surrounding the circular plot shows the relative abundance of species. (C) Word cloud of species-level data, in which larger species names indicate more ASVs in the database.
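Because every ASV in panel (A) is either exclusive to one source or shared by two or more, the exclusive counts plus the overlap should equal the total. The Python sketch below shows one way to derive such counts with sets; the toy sequences and source labels are hypothetical. The final line checks the reported figures: 8,103 + 2,028 + 35,552 + 3,602 + 7,458 = 56,743.

```python
def venn_counts(sources: dict[str, set[str]]) -> dict[str, int]:
    """Count ASVs unique to each source, ASVs shared by two or more sources, and the union."""
    union = set().union(*sources.values())
    counts = {}
    for name, asvs in sources.items():
        # Everything contributed by the other sources
        others = set().union(*(s for n, s in sources.items() if n != name))
        counts[f"only_{name}"] = len(asvs - others)
    # An ASV is "shared" if it occurs in at least two sources
    counts["shared"] = sum(1 for a in union if sum(a in s for s in sources.values()) >= 2)
    counts["total"] = len(union)
    return counts


# Hypothetical toy input: short strings stand in for V3-V4 ASV sequences
toy = {
    "LPSN": {"AAA", "CCC"},
    "NCBI": {"CCC", "GGG"},
    "SILVA": {"TTT", "ACG"},
    "Novel": {"GTA"},
}
print(venn_counts(toy))
# → {'only_LPSN': 1, 'only_NCBI': 1, 'only_SILVA': 2, 'only_Novel': 1, 'shared': 1, 'total': 6}

# Reported Figure 1A counts: exclusive sets plus the overlap give the database total
assert 8_103 + 2_028 + 35_552 + 3_602 + 7_458 == 56_743
```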
Figure 2
(A) Types of ASVs in different species. Red bars represent known ASVs, blue bars novel ASVs within known species, and green bars novel ASVs within unknown species; the bar chart on the far right shows the number of ASVs per species in the database. (B) Evolutionary relationships between novel and known ASVs of the family Lachnospiraceae. Novel ASVs are shown in green; other colors denote different genera within Lachnospiraceae. Novel ASVs are previously uncatalogued taxonomic units identified by full-length 16S rRNA gene sequencing of gut microbiota samples from 120 healthy individuals; they are phylogenetically distinct and share low sequence homology with taxa classified in existing public databases.
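The three ASV categories in panel (A) could be assigned with a rule like the one sketched below in Python. The rule is an assumption for illustration, not the authors' stated procedure: an exact match to a reference ASV is "known", a new sequence whose best hit meets that species' identity threshold is a novel ASV within a known species, and anything else is a novel ASV within an unknown species. The function name, the 98.5% fallback, and the placeholder sequences and identities are hypothetical.

```python
from typing import Optional

DEFAULT_SPECIES_THRESHOLD = 98.5  # hypothetical fallback for species without a tabulated threshold


def categorize_asv(seq: str,
                   reference: dict[str, str],
                   best_hit_species: Optional[str],
                   identity: float,
                   species_thresholds: dict[str, float]) -> str:
    """Assign one of the three Figure 2A categories under an assumed rule.

    reference: exact reference ASV sequences mapped to their species.
    best_hit_species / identity: closest reference species and the percent
    identity to it (None if there is no hit).
    """
    if seq in reference:
        return "known"
    if best_hit_species is not None:
        threshold = species_thresholds.get(best_hit_species, DEFAULT_SPECIES_THRESHOLD)
        if identity >= threshold:
            return "novel ASV, known species"
    return "novel ASV, unknown species"


# Placeholder strings stand in for full V3-V4 sequences; identities are made up
reference = {"seq_known": "Species_A"}
print(categorize_asv("seq_known", reference, "Species_A", 100.0, {}))  # known
print(categorize_asv("seq_new_1", reference, "Species_A", 99.0, {"Species_A": 98.8}))  # novel ASV, known species
print(categorize_asv("seq_new_2", reference, "Species_A", 96.0, {}))  # novel ASV, unknown species
```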
Figure 3
Determination of taxonomic thresholds for common intestinal species. (A) Method and workflow for determining taxonomic thresholds. (B) Classification threshold determination for Lactobacillus plantarum. (C) Classification thresholds for all 896 species; blue bars show the identity values of the taxonomic thresholds, and colors on the right denote phyla. (D) Violin plots at the family, genus, and species levels.
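Panel (A) is not detailed in this summary, so the Python sketch below shows only one plausible way to pick a threshold, under stated assumptions. Given percent identities between sequences of the same taxon (intra) and to sequences of other taxa (inter), it scans the observed identity values and returns the lowest threshold t that maximizes the accuracy of the rule "identity >= t means same taxon". Treating an accuracy of 1.0 as a "clear" threshold is likewise an assumption, and the input values are made up.

```python
def best_threshold(intra: list[float], inter: list[float]) -> tuple[float, float]:
    """Return (threshold, accuracy) for the rule 'identity >= threshold means same taxon'.

    intra: identities (%) between sequences of the same taxon.
    inter: identities (%) between this taxon and other taxa.
    Candidate thresholds are the observed identity values, scanned in ascending
    order, so ties resolve to the lowest threshold reaching the best accuracy.
    """
    pairs = [(x, True) for x in intra] + [(x, False) for x in inter]
    if not pairs:
        raise ValueError("need at least one identity value")
    best_t, best_acc = 0.0, -1.0
    for t in sorted(set(intra) | set(inter)):
        correct = sum((identity >= t) == same for identity, same in pairs)
        acc = correct / len(pairs)
        if acc > best_acc:  # strict '>' keeps the lowest threshold on ties
            best_t, best_acc = t, acc
    return best_t, best_acc


# Hypothetical, cleanly separated taxon: accuracy 1.0 ("clear" threshold, by assumption)
print(best_threshold([99.1, 99.6, 100.0], [97.0, 98.2, 98.9]))  # → (99.1, 1.0)

# Hypothetical overlapping taxon: no threshold separates the two groups perfectly
print(best_threshold([98.0, 99.5], [98.5]))
```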


