PLoS One. 2017 Jan 4;12(1):e0169563.
doi: 10.1371/journal.pone.0169563. eCollection 2017.

Assessment of Common and Emerging Bioinformatics Pipelines for Targeted Metagenomics

Léa Siegwald et al. PLoS One. 2017.

Abstract

Targeted metagenomics, also known as metagenetics, is a high-throughput sequencing application focusing on a nucleotide target in a microbiome to describe its taxonomic content. A wide range of bioinformatics pipelines are available to analyze sequencing outputs, and the choice of an appropriate tool is crucial and not trivial. No standard evaluation method exists for estimating the accuracy of a pipeline for targeted metagenomics analyses. This article proposes an evaluation protocol containing real and simulated targeted metagenomics datasets, and adequate metrics allowing us to study the impact of different variables on the biological interpretation of results. This protocol was used to compare six different bioinformatics pipelines in the basic user context: three common ones (mothur, QIIME and BMP) based on a clustering-first approach and three emerging ones (Kraken, CLARK and One Codex) using an assignment-first approach. This study surprisingly reveals that sequencing errors have a bigger impact on the results than the choice of amplified region. Moreover, increasing sequencing throughput increases richness overestimation, even more so for microbiota of high complexity. Finally, the choice of the reference database has a bigger impact on richness estimation for clustering-first pipelines, and on correct taxa identification for assignment-first pipelines. Using emerging assignment-first pipelines is a valid approach for targeted metagenomics analyses, with a quality of results comparable to popular clustering-first pipelines, even with an error-prone sequencing technology like Ion Torrent. However, those pipelines are highly sensitive to the quality of databases and their annotations, which makes clustering-first pipelines still the only reliable approach for studying microbiomes that are not well described.
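
The "correct taxa identification" mentioned above is reported in the figures as an F-measure. The abstract does not give the formula, so the following is only a minimal sketch that assumes the standard definition: the harmonic mean of precision and recall, computed over the set of taxa a pipeline reports versus the set known to be in a simulated community. The function name and the taxon names in the usage note are hypothetical.

```python
def f_measure(expected_taxa, observed_taxa):
    """Harmonic mean of precision and recall over taxon names.

    Assumption: taxa are compared as plain sets of names at one rank
    (e.g. family or genus); abundances are ignored.
    """
    expected = set(expected_taxa)
    observed = set(observed_taxa)

    # Taxa that are both truly present and reported by the pipeline.
    true_positives = len(expected & observed)
    if true_positives == 0:
        return 0.0

    precision = true_positives / len(observed)  # share of reported taxa that are real
    recall = true_positives / len(expected)     # share of real taxa that were reported
    return 2 * precision * recall / (precision + recall)
```

For instance, with a simulated community of four families and a pipeline that reports two of them plus one spurious family, `f_measure({"A", "B", "C", "D"}, {"A", "B", "E"})` gives precision 2/3, recall 1/2 and an F-measure of 4/7.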

Conflict of interest statement

This work was supported by the CIFRE grant n°2013/0920 from the Association Nationale de la Recherche et de la Technologie to LS. The commercial company Genes Diffusion provided support in the form of salaries for authors LS and CA. Funders did not have any additional role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript. This does not affect or alter the authors’ adherence to all of PLOS ONE's policies on sharing data and materials.

Figures

Fig 1. Distinctions between clustering-first and assignment-first approaches.
A question mark indicates an unclassified read and/or taxon.
Fig 2. Schematic overview of the evaluation protocol.
Fig 3. Comparison of F-measures between the 200(V3) and 400(V4-V5) amplicons at the family level (left) and at the genus level (right) on the HC 50k dataset with error simulation.
Fig 4. Comparison of F-measures (top) and richness error (bottom) in the error-free and error-prone sequencing models on the 200(V3) HC 50k dataset.
Fig 5. Comparison of the Chao1 error percentage on the 200(V3) HC dataset with sequencing errors simulation considering 25k, 50k and 100k reads, at the family and genus level, after taxonomic merging.
Fig 6. Proportions of the top 10 families per pipeline on the LC, MC and HC 50k 200(V3) with error simulation datasets, and their matching 1-NID clustering indexes (computed after taxonomic merging) at the genus and family levels.
Fig 7. F-measure and richness index error percentage after taxonomic merging for each pipeline on the 200(V3) 50k HC dataset with error simulation, when using different databases (the recommended database for each pipeline is marked with *).
Fig 8. Proportions of the top 10 families per pipeline on a real dataset, and their matching Chao1 diversity indexes (computed after taxonomic merging) at the family level.
Below, average linkage hierarchical clustering of all pipelines based on a Euclidean distance calculation on the amount of all reads per family per pipeline (excluding unclassified reads). Pipelines are marked with a * when executed with their default database.
Fig 9. Histogram of wall time (colored) and CPU time (white), and peak memory usage (red crosses) for each standalone pipeline on three different datasets.
Fig 10. Performance summary of each pipeline (default databases) when varying different parameters.
Colored disks represent how each pipeline handles specific variables (red cross = bad, green check = good, no disk = no major impact).
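
Several captions above refer to a Chao1 richness index and its error percentage. As a rough illustration, the sketch below uses the widely used bias-corrected Chao1 estimator, S_obs + F1(F1 - 1) / (2(F2 + 1)), where F1 and F2 are the numbers of taxa seen exactly once and exactly twice. The exact variant used in the study is not stated here, and the signed error relative to the expected richness is likewise an assumption; both function names are hypothetical.

```python
def chao1(read_counts):
    """Bias-corrected Chao1 richness estimate from per-taxon read counts."""
    counts = [c for c in read_counts if c > 0]
    observed_richness = len(counts)
    singletons = sum(1 for c in counts if c == 1)  # taxa seen exactly once (F1)
    doubletons = sum(1 for c in counts if c == 2)  # taxa seen exactly twice (F2)
    return observed_richness + singletons * (singletons - 1) / (2 * (doubletons + 1))


def richness_error_pct(estimated, expected):
    """Signed error in percent; positive values mean richness is overestimated.

    Assumption: the error is expressed relative to the expected richness.
    """
    return 100.0 * (estimated - expected) / expected
```

With per-taxon counts `[10, 5, 1, 1, 1, 2]`, six taxa are observed, three are singletons and one is a doubleton, so `chao1` returns 6 + 3·2/(2·2) = 7.5. Against a true richness of 5, `richness_error_pct(7.5, 5)` is 50.0, an overestimate of the kind that spurious low-count taxa from sequencing errors can produce.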
