Front Microbiol. 2024 Nov 25;15:1485073.
doi: 10.3389/fmicb.2024.1485073. eCollection 2024.

Optimizing microbiome reference databases with PacBio full-length 16S rRNA sequencing for enhanced taxonomic classification and biomarker discovery

Hyejung Han et al. Front Microbiol. 2024.

Abstract

Background: The study of the human microbiome is crucial for understanding disease mechanisms, identifying biomarkers, and guiding preventive measures. Advances in sequencing platforms, particularly 16S rRNA sequencing, have revolutionized microbiome research. Despite the benefits, large microbiome reference databases (DBs) pose challenges, including computational demands and potential inaccuracies. This study aimed to determine if full-length 16S rRNA sequencing data produced by PacBio could be used to optimize reference DBs and be applied to Illumina V3-V4 targeted sequencing data for microbial study.

Methods: Oral and gut microbiome data (PRJNA1049979) were retrieved from NCBI. DADA2 was applied to full-length 16S rRNA PacBio data to obtain amplicon sequence variants (ASVs). Taxonomy was assigned to the ASVs using the RDP reference DB, and the annotated ASVs were then used as a reference DB to train the classifier. QIIME2 was used to analyze the V3-V4 targeted Illumina data. BLAST was used to compute alignment statistics. Linear discriminant analysis Effect Size (LEfSe) was employed for discriminant analysis.

Results: ASVs produced by PacBio showed coverage of the oral microbiome similar to the Human Oral Microbiome Database. A phylogenetic tree was trimmed at various thresholds to obtain an optimized reference DB. This established method was then applied to gut microbiome data, and the optimized gut microbiome reference DB provided improved taxa classification and biomarker discovery efficiency.
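The abstract does not say how the tree was trimmed, so the following is only a minimal sketch of the general idea of thinning a reference set at a distance threshold. It assumes that pairwise distances between ASVs are already available and uses a greedy filter as a stand-in for tree trimming. The function name `prune_by_distance`, the ASV identifiers, and the distance values are hypothetical; only the thresholds 0.0005, 0.001, and 0.002 are taken from the Figure 3 caption.

```python
from typing import Dict, List, Tuple


def prune_by_distance(
    ids: List[str],
    dist: Dict[Tuple[str, str], float],
    threshold: float,
) -> List[str]:
    """Greedy thinning: keep an ASV only if it is at least `threshold`
    away from every ASV that has already been kept.

    This is an illustrative stand-in, not the authors' procedure.
    """

    def d(a: str, b: str) -> float:
        # Distances are stored once per unordered pair.
        return dist[(a, b)] if (a, b) in dist else dist[(b, a)]

    kept: List[str] = []
    for asv in ids:
        if all(d(asv, k) >= threshold for k in kept):
            kept.append(asv)
    return kept


# Toy pairwise distances (hypothetical values, not data from the study).
toy_ids = ["asv1", "asv2", "asv3", "asv4"]
toy_dist = {
    ("asv1", "asv2"): 0.0004,
    ("asv1", "asv3"): 0.0015,
    ("asv1", "asv4"): 0.0300,
    ("asv2", "asv3"): 0.0012,
    ("asv2", "asv4"): 0.0310,
    ("asv3", "asv4"): 0.0290,
}

for t in (0.0005, 0.001, 0.002):
    print(t, prune_by_distance(toy_ids, toy_dist, t))
```

In this toy example a larger threshold removes more near-duplicate sequences, which is the trade-off between reference DB size and resolution that the Results describe. The outcome of a greedy filter depends on input order, so a real implementation would need a defined ordering (for example, by ASV abundance).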

Conclusion: Full-length 16S rRNA sequencing data produced by PacBio can be used to construct a microbiome reference DB. Utilizing an optimized reference DB can increase the accuracy of microbiome classification and enhance biomarker discovery.

Keywords: Illumina; PacBio; gut microbiome; next generation sequencing; oral microbiome; reference database.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

Figure 1
(A) Rarefaction curve for each oral sample. (B) Rarefaction curve for randomly combined oral samples. (C) BLAST search results on Illumina V3-V4 oral microbiome data using various reference databases.
Figure 2
Phylogenetic tree of eHOMD reference sequences combined with ASVs obtained from PacBio. In the inner ring, colors represent phyla assigned by eHOMD. In the outer ring, bar height represents the number of oral samples in which the corresponding ASV is present.
Figure 3
Phylogenetic trees of gut ASVs obtained from PacBio reads, trimmed at various thresholds. (A) Total ASVs, (B) threshold 0.0005, (C) threshold 0.001, (D) threshold 0.002, (E) threshold 0.001, (F) BLAST search results on Illumina V3-V4 gut microbiome data using ASVs trimmed at various thresholds as the reference DB.
Figure 4
Bacterial community comparisons among gut sampling sites. Alpha diversity was used to describe the microbial richness and evenness within samples using the (A) Chao1 and (B) Shannon index. (C) Beta diversity of gut microbiome depending on sampling sites. Principal coordinate analysis (PCoA) of the Bray-Curtis distance was performed to determine the microbial community structure. *p < 0.05, **p < 0.01, ***p < 0.001.
Figure 5
Average relative abundance of the microbiome by reference database. (A) Genus level. (B) Bacteroides at the species level. (C) Prevotella at the species level.
Figure 6
Taxa that differed significantly among gut sampling sites, by reference database. (A) Greengenes, (B) PacBio ASVs, (C) SILVA. The analysis was performed using linear discriminant analysis (LDA) and effect size analysis. Taxa with an LDA score > 3.0 are displayed.
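For readers unfamiliar with the diversity measures named in the Figure 4 caption, the following sketch shows how the Shannon index, the Chao1 estimator, and the Bray-Curtis dissimilarity can be computed from per-taxon read counts. It is illustrative only: the function names and count vectors are hypothetical, the Shannon index is assumed to use the natural logarithm (some tools use base 2), and Chao1 is assumed to be the bias-corrected form. The study itself used QIIME2, whose exact settings are not given in the abstract.

```python
import math
from typing import Sequence


def shannon(counts: Sequence[int]) -> float:
    """Shannon index H = -sum(p_i * ln p_i) over taxa with nonzero counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def chao1(counts: Sequence[int]) -> float:
    """Bias-corrected Chao1 richness: S_obs + f1 * (f1 - 1) / (2 * (f2 + 1)),
    where f1 and f2 are the numbers of singleton and doubleton taxa."""
    s_obs = sum(1 for c in counts if c > 0)
    f1 = sum(1 for c in counts if c == 1)
    f2 = sum(1 for c in counts if c == 2)
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


def bray_curtis(a: Sequence[int], b: Sequence[int]) -> float:
    """Bray-Curtis dissimilarity: sum|a_i - b_i| / (sum(a) + sum(b))."""
    if len(a) != len(b):
        raise ValueError("count vectors must cover the same taxa")
    denom = sum(a) + sum(b)
    if denom == 0:
        return 0.0
    return sum(abs(x - y) for x, y in zip(a, b)) / denom


# Hypothetical count vectors for two samples over the same six taxa.
site_a = [10, 5, 1, 1, 2, 0]
site_b = [0, 5, 3, 1, 2, 9]

print("Shannon (site_a):", shannon(site_a))
print("Chao1 (site_a):", chao1(site_a))
print("Bray-Curtis (site_a vs site_b):", bray_curtis(site_a, site_b))
```

A matrix of pairwise Bray-Curtis values across all samples is the usual input to a principal coordinate analysis such as the one shown in panel (C) of Figure 4.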
