Microbiome. 2014 Apr 7;2:11.
doi: 10.1186/2049-2618-2-11. eCollection 2014.

CopyRighter: a rapid tool for improving the accuracy of microbial community profiles through lineage-specific gene copy number correction

Florent E Angly et al. Microbiome. 2014.

Abstract

Background: Culture-independent molecular surveys targeting conserved marker genes, most notably 16S rRNA, to assess microbial diversity remain semi-quantitative due to variations in the number of gene copies between species.

Results: Based on 2,900 sequenced reference genomes, we show that 16S rRNA gene copy number (GCN) is strongly linked to microbial phylogenetic taxonomy, potentially under-representing Archaea in amplicon microbial profiles. Using this relationship, we inferred the GCN of all bacterial and archaeal lineages in the Greengenes database within a phylogenetic framework. We created CopyRighter, new software which uses these estimates to correct 16S rRNA amplicon microbial profiles and associated quantitative (q)PCR total abundance. CopyRighter parses microbial profiles and, because GCN estimates are pre-computed for all taxa in the reference taxonomy, rapidly corrects GCN bias. Software validation with in silico and in vitro mock communities indicated that GCN correction results in more accurate estimates of microbial relative abundance and improves the agreement between metagenomic and amplicon profiles. Analyses of human-associated and anaerobic digester microbiomes illustrate that correction makes tangible changes to estimates of qPCR total abundance, α and β diversity, and can significantly change biological interpretation. For example, human gut microbiomes from twins were reclassified into three rather than two enterotypes after GCN correction.
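The general idea behind GCN correction of a relative-abundance profile can be sketched as follows. This is a minimal illustration, not CopyRighter's actual implementation: the function name, taxon names, and GCN values are hypothetical, and it assumes each taxon's amplicon share is simply divided by its estimated GCN before renormalizing.

```python
def correct_relative_abundance(profile, gcn):
    """Divide each taxon's amplicon share by its GCN, then renormalize to sum to 1.

    profile: dict mapping taxon -> observed relative abundance (16S read share)
    gcn:     dict mapping taxon -> estimated 16S rRNA gene copy number
    Both inputs are hypothetical; a real tool would look GCN up in a
    pre-computed table keyed by the reference taxonomy.
    """
    weighted = {taxon: share / gcn[taxon] for taxon, share in profile.items()}
    total = sum(weighted.values())
    return {taxon: value / total for taxon, value in weighted.items()}


# Made-up example: taxon_A carries 4 copies per genome, taxon_B carries 1.
observed = {"taxon_A": 0.8, "taxon_B": 0.2}
copies = {"taxon_A": 4.0, "taxon_B": 1.0}
corrected = correct_relative_abundance(observed, copies)
```

In this made-up case the two taxa come out equally abundant after correction, even though taxon_A accounts for 80% of the amplicon reads.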

Conclusions: The CopyRighter bioinformatic tool permits rapid correction of GCN in microbial surveys, resulting in improved estimates of microbial abundance and of α and β diversity.


Figures

Figure 1
CopyRighter flowchart for the correction of microbial amplicon datasets. (A) Pre-computation of gene copy number (GCN) based on a tree-based taxonomy and reference genomes. (B) The processing of microbial data through off-the-shelf programs. (C) The correction of microbial data to estimate relative abundance, absolute abundance and average GCN in given samples. OTU, operational taxonomic unit; qPCR, quantitative polymerase chain reaction; rRNA, ribosomal RNA.
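The quantities named in panel (C) relate to each other roughly as sketched below. This is an illustrative assumption about the arithmetic, not code taken from CopyRighter: the function names and all numbers are hypothetical, and it assumes the sample's average GCN is the abundance-weighted mean of per-taxon GCN and that a qPCR count of 16S gene copies is divided by that average to estimate cell numbers.

```python
def average_gcn(cell_fractions, gcn):
    """Abundance-weighted mean GCN of a sample.

    cell_fractions: dict taxon -> GCN-corrected relative abundance (sums to 1)
    gcn:            dict taxon -> estimated 16S rRNA gene copy number
    """
    return sum(fraction * gcn[taxon] for taxon, fraction in cell_fractions.items())


def correct_total_abundance(qpcr_gene_copies, cell_fractions, gcn):
    """Convert a qPCR count of 16S gene copies into an estimated cell count."""
    return qpcr_gene_copies / average_gcn(cell_fractions, gcn)


# Made-up sample: two taxa at equal cell abundance, with 4 and 1 copies each.
cell_fractions = {"taxon_A": 0.5, "taxon_B": 0.5}
copies = {"taxon_A": 4.0, "taxon_B": 1.0}
mean_gcn = average_gcn(cell_fractions, copies)
cells = correct_total_abundance(1000.0, cell_fractions, copies)
```

Here the average GCN is 2.5, so 1,000 measured gene copies correspond to an estimated 400 cells.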
Figure 2
Estimated gene copy number (GCN) of 274 Greengenes taxa represented by over 2,900 genomes. (A) Density plot of GCN. (B) Distribution of GCN from phylum to order level.
Figure 3
Boxplot of the accuracy of CopyRighter's correction based on the composition of 16S rRNA gene amplicon mock datasets at the genus level. (A-C) In silico uneven Grinder datasets of varying richness, and (D,E) published in vitro mock datasets. The boxes represent the minimum, maximum, median and interquartile range. The smaller the distance, the closer the observed profile is to the expected profile. Corrected profiles with a significantly lower distance than the corresponding uncorrected profiles (unilateral exact Mann–Whitney test, P < 0.05) are marked with a star.
Figure 4
Phylum-level effects of gene copy number bias correction in 280 human gut microbiomes. (A) Uncorrected, (B) after phylogenetic-level correction, and (C) difference in Berger-Parker α diversity index between the corrected and non-corrected samples. Samples in all panels were sorted by increasing Berger-Parker difference.
Figure 5
Optimal number of enterotypes based on partition around medoid clustering of microbial profiles of the twin cohort at the genus level. (A) Non-corrected samples, and (B) samples processed through CopyRighter. The optimal number of enterotypes is shaded and represents the number of clusters with the largest average silhouette width (top panels) and Calinski-Harabasz index (bottom panels).
Figure 6
Abundance of microbial orders in replicate anaerobic digesters. (A) Before and (B) after phylogenetic-level correction of relative and total abundance. P values from unilateral paired t-tests are indicated, and marked with a star when significant (P < 0.05).
