Gigascience. 2018 Mar 1;7(3):1-7.
doi: 10.1093/gigascience/gix129.

Hybrid-denovo: a de novo OTU-picking pipeline integrating single-end and paired-end 16S sequence tags

Xianfeng Chen et al. Gigascience. 2018.

Abstract

Background: Illumina paired-end sequencing has become increasingly popular for 16S rRNA gene-based microbiota profiling. It provides higher phylogenetic resolution than single-end reads because of the longer read length. However, the reverse read (R2) often has significantly low base quality, and a large proportion of R2s are discarded after quality control, resulting in a mixture of paired-end and single-end reads. A typical 16S analysis pipeline processes either paired-end or single-end reads but not a mixture. Quantification accuracy and statistical power are therefore reduced by the loss of a large number of reads. As a result, rare taxa may not be detectable with the paired-end approach, whereas the single-end approach yields low taxonomic resolution.

Results: To obtain both the higher phylogenetic resolution provided by paired-end reads and the higher sequence coverage provided by single-end reads, we propose a novel OTU-picking pipeline, hybrid-denovo, that can process a hybrid of single-end and paired-end reads. Using high-quality paired-end reads as a gold standard, we show that hybrid-denovo achieved the highest correlation with the gold standard and performed better than approaches based on paired-end or single-end reads alone in quantifying microbial diversity and taxonomic abundances. Applying our method to a rheumatoid arthritis (RA) data set, we demonstrated that hybrid-denovo captured more microbial diversity and identified more RA-associated taxa than a paired-end or single-end approach.

Conclusions: Hybrid-denovo uses both paired-end and single-end 16S sequencing reads and is recommended for 16S rRNA gene-targeted paired-end sequencing data.
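
The Background describes how quality control leaves a mixture of paired-end and single-end reads once low-quality R2s are discarded. The sketch below illustrates that partitioning step only. It is not the authors' quality-control procedure: the function name `split_by_r2_quality`, the mean-Phred criterion, and the threshold of 20 are all assumptions chosen for illustration.

```python
from statistics import mean


def split_by_r2_quality(pairs, min_mean_q=20):
    """Partition read pairs into paired-end and single-end (R1-only) sets.

    `pairs` is an iterable of (read_id, r1_quals, r2_quals), where the
    quality lists hold non-empty per-base Phred scores. The mean-quality
    rule and the default threshold are hypothetical, for illustration.
    """
    paired, single_end = [], []
    for read_id, r1_quals, r2_quals in pairs:
        if mean(r1_quals) < min_mean_q:
            # R1 itself fails QC, so the whole pair is dropped.
            continue
        if mean(r2_quals) >= min_mean_q:
            paired.append(read_id)      # both reads usable
        else:
            single_end.append(read_id)  # keep R1, discard R2
    return paired, single_end


if __name__ == "__main__":
    demo = [
        ("read_a", [30, 32, 31], [29, 28, 30]),  # good R1, good R2
        ("read_b", [30, 32, 31], [10, 8, 5]),    # good R1, poor R2
        ("read_c", [6, 5, 7], [30, 30, 30]),     # poor R1
    ]
    print(split_by_r2_quality(demo))
```

A paired-end-only pipeline would keep just the first list and a single-end-only pipeline would ignore every R2; a hybrid approach aims to use both lists.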

Figures

Figure 1:
Overview and evaluation of the hybrid-denovo approach. A, hybrid-denovo illustration. B, Mantel correlation of β-diversity distance matrices (unweighted UniFrac, weighted UniFrac, and Bray-Curtis distance) with the gold standard for the 3 approaches at different percentages of good-quality R2 reads. Error bars represent standard errors of the estimate based on 100 bootstrap samples. C, Boxplot of correlations of the relative abundances of 56 prevalent genera with the gold standard.
Figure 2:
Comparison of mothur, QIIME, and hybrid-denovo on genus-level profiles. Hybrid-denovo is run on data sets with different percentages of good-quality R2 reads (100%, 75%, 50%, and 25%). Each column represents the microbiota profile of an individual averaged over all replicates. The overlaps of detected genera between the 3 pipelines are shown in the Venn diagram.
Figure 3:
Comparison of mothur, QIIME, and hybrid-denovo on intra-class correlation coefficients (ICCs) of the core genera (A) and OTUs (B). ICCs are calculated based on the technical replicates for 6 different fecal collection methods. Hybrid-denovo is run on data sets with different percentages of good-quality R2 reads (100%, 75%, 50%, and 25%).
Figure 4:
Comparison of the R1, Paired, and Hybrid approaches on the RA data set. A, Number of detected OTUs (red) and genera (blue). B, Number of significant OTUs (red) and genera (blue) from differential abundance analysis (FDR ≤ 0.01). C, Venn diagram of the genera detected. D, Venn diagram of significant genera from differential abundance analysis.
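
Figure 1B compares β-diversity distance matrices with the gold standard using Mantel correlation. As a rough illustration of that kind of comparison, the sketch below computes Bray-Curtis dissimilarities from a sample-by-taxon count table and the Pearson Mantel correlation between two distance matrices. It uses only NumPy. The function names, the toy data, and the permutation p-value are assumptions for illustration and are not taken from the paper, which reports bootstrap standard errors instead.

```python
import numpy as np


def bray_curtis(counts):
    """Pairwise Bray-Curtis dissimilarity between rows (samples)."""
    counts = np.asarray(counts, dtype=float)
    n = counts.shape[0]
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            num = np.abs(counts[i] - counts[j]).sum()
            den = (counts[i] + counts[j]).sum()
            dist[i, j] = dist[j, i] = num / den if den > 0 else 0.0
    return dist


def mantel(d1, d2, permutations=999, seed=0):
    """Pearson Mantel correlation with a one-sided permutation p-value."""
    d1 = np.asarray(d1, dtype=float)
    d2 = np.asarray(d2, dtype=float)
    upper = np.triu_indices_from(d1, k=1)  # each sample pair counted once
    x = d1[upper]
    r_obs = np.corrcoef(x, d2[upper])[0, 1]
    rng = np.random.default_rng(seed)
    n = d1.shape[0]
    hits = 0
    for _ in range(permutations):
        p = rng.permutation(n)
        # Permute rows and columns of d2 together to keep it a valid matrix.
        r_perm = np.corrcoef(x, d2[np.ix_(p, p)][upper])[0, 1]
        if r_perm >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (permutations + 1)


if __name__ == "__main__":
    # Toy count tables (rows: samples, columns: taxa); values are made up.
    gold = [[10, 0, 5], [0, 8, 2], [3, 3, 3], [7, 1, 0]]
    test = [[9, 1, 4], [1, 7, 2], [3, 2, 4], [6, 1, 1]]
    r, p = mantel(bray_curtis(gold), bray_curtis(test))
    print(f"Mantel r = {r:.3f}, permutation p = {p:.3f}")
```

UniFrac distances additionally require a phylogenetic tree, so they are left out of this sketch.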
