A species-level identification pipeline for human gut microbiota based on the V3-V4 regions of 16S rRNA
- PMID: 40226098
- PMCID: PMC11985812
- DOI: 10.3389/fmicb.2025.1553124
Abstract
16S rRNA gene sequencing is pivotal for identifying bacterial species in microbiome studies, especially using the V3-V4 hypervariable regions. A fixed 98.5% similarity threshold is often applied for species-level identification, but this approach can cause misclassification because the appropriate threshold varies among species. To address this, our study integrated data from the SILVA, NCBI, and LPSN databases, extracting V3-V4 region sequences and supplementing them with 16S rRNA sequences from 1,082 human gut samples. This resulted in a non-redundant amplicon sequence variant (ASV) database specific to the V3-V4 regions (positions 341-806). Using this database, we identified flexible classification thresholds for 674 families, 3,661 genera, and 15,735 species, finding clear thresholds for 87.09% of families and 98.38% of genera. For the 896 most common human gut species, we established precise taxonomic thresholds. To leverage these findings, we developed the asvtax pipeline, which applies flexible thresholds for more accurate taxonomic classification, notably improving the identification of new ASVs. The asvtax pipeline not only enhances the precision of species-level classification but also provides a robust framework for analyzing complex microbial communities, facilitating more reliable ecological and functional interpretations in microbiome research.
Keywords: 16S rRNA; database abbreviations; microbiota; species-level identification; taxonomic thresholds.
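The contrast the abstract draws between a fixed 98.5% cutoff and per-taxon flexible thresholds can be illustrated with a minimal Python sketch. This is not the asvtax implementation: the `Hit` type, the `classify` function, the placeholder species names, and the threshold values are all hypothetical and chosen only to show the idea. The sketch assumes percent identities to reference sequences have already been computed by some alignment step.

```python
from dataclasses import dataclass
from typing import Mapping, Optional, Sequence

# Fixed species-level cutoff mentioned in the abstract (percent identity).
DEFAULT_SPECIES_THRESHOLD = 98.5


@dataclass(frozen=True)
class Hit:
    """One reference match for an ASV (hypothetical structure)."""
    species: str
    identity: float  # percent identity to the reference, 0-100


def classify(
    hits: Sequence[Hit],
    thresholds: Mapping[str, float],
    default: float = DEFAULT_SPECIES_THRESHOLD,
) -> Optional[str]:
    """Assign a species only if the best hit meets that species' own cutoff.

    Species without an entry in `thresholds` fall back to `default`.
    Returns None when there are no hits or the best hit is below its cutoff.
    """
    if not hits:
        return None
    best = max(hits, key=lambda h: h.identity)
    cutoff = thresholds.get(best.species, default)
    return best.species if best.identity >= cutoff else None


# Illustrative, made-up thresholds; real values would come from the database.
example_thresholds = {"species_a": 99.2, "species_b": 97.0}

# species_a needs 99.2%, so a 98.7% hit is rejected even though it passes 98.5%.
strict = classify([Hit("species_a", 98.7)], example_thresholds)
# species_b needs only 97.0%, so a 97.5% hit is accepted despite failing 98.5%.
lenient = classify([Hit("species_b", 97.5)], example_thresholds)
# species_c has no specific threshold and falls back to the fixed cutoff.
fallback = classify([Hit("species_c", 98.6)], example_thresholds)
```

Under these made-up values, a fixed cutoff would accept the first hit and reject the second, while the per-species lookup does the opposite; that difference is the kind of misclassification the flexible thresholds are meant to address.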
Copyright © 2025 Wang, Yuan, Chen, Yang, Pu, Lin, Dong, Zhang, Yuan, Zheng, Sun and Xu.
Conflict of interest statement
WL was employed by Uniteomics Tianjin Biotechnology Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Similar articles
- GSR-DB: a manually curated and optimized taxonomical database for 16S rRNA amplicon analysis. mSystems. 2024 Feb 20;9(2):e0095023. doi: 10.1128/msystems.00950-23. Epub 2024 Jan 8. PMID: 38189256. Free PMC article.
- Primer, Pipelines, Parameters: Issues in 16S rRNA Gene Sequencing. mSphere. 2021 Feb 24;6(1):e01202-20. doi: 10.1128/mSphere.01202-20. PMID: 33627512. Free PMC article.
- Metataxonomic insights in the distribution of Lactobacillaceae in foods and food environments. Int J Food Microbiol. 2023 Apr 16;391-393:110124. doi: 10.1016/j.ijfoodmicro.2023.110124. Epub 2023 Feb 21. PMID: 36841075.
- Determining the most accurate 16S rRNA hypervariable region for taxonomic identification from respiratory samples. Sci Rep. 2023 Mar 9;13(1):3974. doi: 10.1038/s41598-023-30764-z. PMID: 36894603. Free PMC article.
- Optimal 16S rRNA gene amplicon sequencing analysis for oral microbiota to avoid the potential bias introduced by trimming length, primer, and database. Microbiol Spectr. 2024 Oct 22;12(12):e0351223. doi: 10.1128/spectrum.03512-23. Online ahead of print. PMID: 39436127. Free PMC article.
Cited by
- Beneficial bacteria-based bioformulations as potential biocontrol and biocleaning solutions for stone heritage conservation. World J Microbiol Biotechnol. 2025 Jun 14;41(6):200. doi: 10.1007/s11274-025-04446-z. PMID: 40514573.